Supported Metrics

Utility

Utility metrics are defined by the literal fairical.metrics.UtilityMetricsType. The column Objective indicates if the value needs to be maximized (the higher, the better) or minimized (the lower, the better). The column Threshold indicates if the metric needs to be probed on a specific threshold (yes, requires thresholding; no, does not). The column Exclusive indicates if this utility metric can be used alone in system evaluations (yes, can be used exclusively; no, cannot). If it cannot, pick two or more metrics that together account for both negative and positive samples.

| Metric | Range | Objective | Threshold | Exclusive | Description |
|---|---|---|---|---|---|
| fpr | \([0,1]\) | | | | False positive rate |
| tpr | \([0,1]\) | | | | True positive rate (a.k.a. Recall) |
| tnr | \([0,1]\) | | | | True negative rate |
| fnr | \([0,1]\) | | | | False negative rate |
| roc_auc | \([0,1]\) | | | | Area under the ROC curve (FPR x TPR) |
| prec | \([0,1]\) | | | | Precision |
| rec | \([0,1]\) | | | | Recall (a.k.a. True positive rate) |
| avg_prec | \([0,1]\) | | | | Average Precision (Area under the PR curve) |
| f1 | \([0,1]\) | | | | F1-score |
| acc | \([0,1]\) | | | | Accuracy |
| bal_acc | \([0,1]\) | | | | Balanced Accuracy |

The implementation of these metrics relies mostly on the scikit-learn toolkit (Pedregosa et al. [PVG+11]).
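To illustrate what probing a metric "on a specific threshold" means, the sketch below computes the thresholded utility metrics from raw scores using only the Python standard library. It is not the fairical or scikit-learn implementation: the function names confusion_counts and thresholded_metrics are hypothetical, and it assumes binary labels (1 for positive, 0 for negative) and that a score at or above the threshold counts as a positive prediction. Returning 0.0 for an undefined ratio is a choice made for this sketch only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the scores.

    Assumption of this sketch: score >= threshold means "predicted positive".
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _safe_div(num: float, den: float) -> float:
    # Sketch-only convention: an undefined ratio is reported as 0.0.
    return num / den if den else 0.0


def thresholded_metrics(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute the utility metrics that depend on a decision threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    tpr = _safe_div(tp, tp + fn)  # recall
    tnr = _safe_div(tn, tn + fp)
    prec = _safe_div(tp, tp + fp)
    return {
        "fpr": _safe_div(fp, fp + tn),
        "tpr": tpr,
        "tnr": tnr,
        "fnr": _safe_div(fn, fn + tp),
        "prec": prec,
        "rec": tpr,
        "f1": _safe_div(2 * prec * tpr, prec + tpr),
        "acc": _safe_div(tp + tn, tp + fp + tn + fn),
        "bal_acc": (tpr + tnr) / 2,
    }


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.7]
    for name, value in thresholded_metrics(labels, scores, 0.5).items():
        print(f"{name}: {value:.3f}")
```

The sketch leaves out roc_auc and avg_prec because they summarise a whole curve over all thresholds rather than a single operating point.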

Fairness

Fairness metrics are defined by the literals fairical.metrics.FairnessMetricsType and fairical.metrics.MinMaxFairnessMetricsType. The first literal type covers fairness metrics that are parameterised only by a protected attribute (such as age or gender). The second covers min-max criteria, which compare a specific utility metric (see above) between protected groups. It therefore requires two parameters: the utility metric (as per the table above) and a protected attribute. Separate the metric name and each of its parameters using the + (plus sign). Examples are provided in the table below. The column Threshold indicates if the metric needs to be probed on a specific threshold (the value depends is attributed to fairness metrics in which thresholding depends on the chosen utility metric). In such cases, use the table above to determine if thresholding of scores is necessary. The column Objective indicates if the value needs to be maximized (the higher, the better) or minimized (the lower, the better).

| Metric | Parameters | Example | Range | Objective | Threshold | Description |
|---|---|---|---|---|---|---|
| dpd | attribute | dpd+age | \([0,1]\) | | | Demographic parity difference |
| dpr | attribute | dpr+age | \([0,1]\) | | | Demographic parity ratio |
| eod | attribute | eod+age | \([0,1]\) | | | Equalized odds difference |
| eor | attribute | eor+age | \([0,1]\) | | | Equalized odds ratio |
| minmaxd | util., attr. | minmaxd+acc+age | \([0,1]\) | | | Min-Max difference |
| minmaxr | util., attr. | minmaxr+acc+age | \([0,1]\) | | | Min-Max ratio |

The implementation of these metrics relies on the fairlearn toolkit (Weerts et al. [WDE+23]).
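The sketch below shows one way to split a plus-separated metric name such as minmaxd+acc+age, then evaluate it per protected group. It uses only the Python standard library and is not the fairical or fairlearn code: every function name here (parse_spec, by_group, evaluate) is hypothetical. It assumes the usual definitions, under which a "difference" is the largest per-group value minus the smallest, a "ratio" is the smallest divided by the largest, and demographic parity compares per-group selection rates. Only acc is wired in as a utility metric, to keep the example short.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Metric = Callable[[Sequence[int], Sequence[int]], float]


def parse_spec(spec: str) -> Tuple[str, List[str]]:
    """Split e.g. "minmaxd+acc+age" into ("minmaxd", ["acc", "age"])."""
    name, *params = spec.split("+")
    return name, params


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def selection_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    # Fraction of samples predicted positive; the labels are not used.
    return sum(preds) / len(preds)


def by_group(metric: Metric, labels, preds, groups) -> Dict[str, float]:
    """Evaluate `metric` separately on each protected group."""
    buckets = defaultdict(lambda: ([], []))
    for y, p, g in zip(labels, preds, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(p)
    return {g: metric(ys, ps) for g, (ys, ps) in buckets.items()}


def difference(values: Iterable[float]) -> float:
    values = list(values)
    return max(values) - min(values)


def ratio(values: Iterable[float]) -> float:
    values = list(values)
    # Sketch-only convention: all groups at zero counts as perfectly equal.
    return min(values) / max(values) if max(values) else 1.0


UTILITY: Dict[str, Metric] = {"acc": accuracy}  # hypothetical registry


def evaluate(spec: str, labels, preds, attributes: Dict[str, Sequence[str]]) -> float:
    name, params = parse_spec(spec)
    if name in ("dpd", "dpr"):
        (attribute,) = params
        metric = selection_rate
    elif name in ("minmaxd", "minmaxr"):
        utility, attribute = params
        metric = UTILITY[utility]
    else:
        raise ValueError(f"metric not covered by this sketch: {name}")
    values = by_group(metric, labels, preds, attributes[attribute]).values()
    # Names ending in "d" are differences, names ending in "r" are ratios.
    return difference(values) if name.endswith("d") else ratio(values)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 1, 0, 1, 1]  # already thresholded predictions
    attributes = {"age": ["young"] * 4 + ["old"] * 4}
    for spec in ("dpd+age", "dpr+age", "minmaxd+acc+age", "minmaxr+acc+age"):
        print(spec, round(evaluate(spec, labels, preds, attributes), 3))
```

In this toy data the two age groups have accuracies of 0.75 and 0.5, so minmaxd+acc+age evaluates to 0.25. A difference of 0 or a ratio of 1 would mean the groups are treated identically under the chosen metric.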