Supported Metrics

Utility

Utility metrics are defined by the literal fairical.metrics.UtilityMetricsType. The column Objective indicates whether the value should be maximized (higher is better) or minimized (lower is better). The column Threshold indicates whether the metric must be evaluated at a specific score threshold (yes) or not (no). The column Exclusive indicates whether the utility metric can be used on its own in system evaluations (yes) or not (no). If it cannot, pick two (or more) metrics that together cover both negative and positive samples.
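To see why a metric such as `fpr` cannot be used exclusively: it is computed on negative samples only, while `tpr` is computed on positive samples only, so each can be made "perfect" by a trivial threshold. The following minimal plain-Python sketch uses made-up toy scores and is not part of fairical's API.

```python
# Toy scores and binary labels (1 = positive); values are made up for illustration.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]


def rates(threshold):
    """Return (fpr, tpr) when predicting positive for scores >= threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return fp / n_neg, tp / n_pos  # FPR uses negatives only, TPR positives only


# A threshold above every score rejects everything: FPR is a "perfect" 0 ...
print(rates(1.0))  # (0.0, 0.0): ... but TPR collapses to 0 as well
# A threshold below every score accepts everything: perfect TPR, worst FPR.
print(rates(0.0))  # (1.0, 1.0)
# Only looking at both rates together reveals a useful operating point.
print(rates(0.5))
```

Pairing a negative-sample metric with a positive-sample metric (for example `fpr` with `tpr`) rules out these degenerate thresholds.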

Metric	Range	Description
`fpr`	\([0,1]\)	False positive rate
`tpr`	\([0,1]\)	True positive rate (a.k.a. Recall)
`tnr`	\([0,1]\)	True negative rate
`fnr`	\([0,1]\)	False negative rate
`roc_auc`	\([0,1]\)	Area under the ROC curve (TPR vs. FPR)
`prec`	\([0,1]\)	Precision
`rec`	\([0,1]\)	Recall (a.k.a. True positive rate)
`avg_prec`	\([0,1]\)	Average Precision (Area under the PR curve)
`f1`	\([0,1]\)	F1-score
`acc`	\([0,1]\)	Accuracy
`bal_acc`	\([0,1]\)	Balanced Accuracy
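The thresholded metrics in the table above can all be derived from the four cells of a binary confusion matrix. The following is a minimal plain-Python sketch, not fairical's implementation; it assumes 0/1 labels, that a sample is predicted positive when its score is greater than or equal to the threshold, and that an empty denominator yields 0.0. These conventions may differ from the library's.

```python
def thresholded_metrics(labels, scores, threshold):
    """Sketch of the thresholded utility metrics from a binary confusion matrix.

    Assumptions: labels are 0/1, and a sample is predicted positive when
    score >= threshold.
    """
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)

    def safe_div(num, den):
        # Assumed convention: 0.0 when the denominator is empty
        # (e.g. no predicted positives when computing precision).
        return num / den if den else 0.0

    tpr = safe_div(tp, tp + fn)
    tnr = safe_div(tn, tn + fp)
    prec = safe_div(tp, tp + fp)
    return {
        "fpr": safe_div(fp, fp + tn),
        "tpr": tpr,
        "tnr": tnr,
        "fnr": safe_div(fn, fn + tp),
        "prec": prec,
        "rec": tpr,  # recall is the same quantity as TPR
        "f1": safe_div(2 * prec * tpr, prec + tpr),
        "acc": safe_div(tp + tn, len(labels)),
        "bal_acc": (tpr + tnr) / 2,
    }


m = thresholded_metrics([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.6, 0.4, 0.3, 0.1], 0.5)
print(m["acc"], m["bal_acc"], m["f1"])
```

Note that `fpr` + `tnr` = 1 and `tpr` + `fnr` = 1, which is why each pair carries the same information.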

Implementations of these metrics rely mostly on the scikit-learn toolkit (Pedregosa et al. [PVG+11]).
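The two threshold-free entries, `roc_auc` and `avg_prec`, summarise performance over all thresholds. In scikit-learn they correspond to `sklearn.metrics.roc_auc_score` and `sklearn.metrics.average_precision_score`. The sketch below reimplements both in plain Python for illustration only; it is quadratic in the number of samples and is not how fairical invokes scikit-learn.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AP: sum over distinct thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(labels)
    ap, prev_recall = 0.0, 0.0
    # Sweep thresholds from the highest score down; predict positive if score >= t.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        predicted = sum(1 for s in scores if s >= t)
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / predicted)
        prev_recall = recall
    return ap


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))
```

Because neither function takes a threshold, these metrics do not require choosing an operating point.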

Fairness

Fairness metrics are defined by the literals fairical.metrics.FairnessMetricsType and fairical.metrics.MinMaxFairnessMetricsType. The first literal type includes fairness metrics that are parameterised only by a protected attribute (such as age or gender). The second class of fairness metrics corresponds to min-max criteria that compare a specific utility metric (see above) across protected groups. These therefore require two parameters: the utility metric (as per the table above) and a protected attribute. Separate the parameters of a metric with a + (plus sign); examples are provided in the table below. The column Threshold indicates whether the metric must be evaluated at a specific score threshold; the value depends is assigned to fairness metrics whose thresholding depends on the chosen utility metric, in which case use the table above to determine whether scores must be thresholded. The column Objective indicates whether the value should be maximized (higher is better) or minimized (lower is better).
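The plus-separated naming scheme can be illustrated with a small parser. This is a hypothetical sketch: `parse_fairness_spec` and `FairnessSpec` are illustrative names rather than fairical functions, and the name sets below are copied from the tables in this section, whereas the literal types in fairical.metrics remain the authoritative source.

```python
from typing import NamedTuple, Optional

# Hypothetical name sets mirroring the tables in this section.
UTILITY = {"fpr", "tpr", "tnr", "fnr", "roc_auc", "prec", "rec",
           "avg_prec", "f1", "acc", "bal_acc"}
ATTRIBUTE_ONLY = {"dpd", "dpr", "eod", "eor"}
MINMAX = {"minmaxd", "minmaxr"}


class FairnessSpec(NamedTuple):
    metric: str
    utility: Optional[str]  # only set for min-max metrics
    attribute: str


def parse_fairness_spec(spec: str) -> FairnessSpec:
    """Split a '+'-separated fairness metric name into its parts."""
    parts = spec.split("+")
    if all(parts):  # reject empty pieces such as "dpd+"
        name = parts[0]
        if name in ATTRIBUTE_ONLY and len(parts) == 2:
            return FairnessSpec(name, None, parts[1])
        # Min-max metrics: utility metric first, then the protected attribute.
        if name in MINMAX and len(parts) == 3 and parts[1] in UTILITY:
            return FairnessSpec(name, parts[1], parts[2])
    raise ValueError(f"cannot parse fairness metric {spec!r}")


print(parse_fairness_spec("dpd+age"))
print(parse_fairness_spec("minmaxd+acc+age"))
```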

Metric	Parameters	Example	Range	Description
`dpd`	attribute	`dpd+age`	\([0,1]\)	Demographic parity difference
`dpr`	attribute	`dpr+age`	\([0,1]\)	Demographic parity ratio
`eod`	attribute	`eod+age`	\([0,1]\)	Equalized odds difference
`eor`	attribute	`eor+age`	\([0,1]\)	Equalized odds ratio
`minmaxd`	util., attr.	`minmaxd+acc+age`	\([0,1]\)	Min-Max difference
`minmaxr`	util., attr.	`minmaxr+acc+age`	\([0,1]\)	Min-Max ratio
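For binary predictions, the attribute-only metrics reduce to group-wise rates. The sketch below follows the definitions documented by fairlearn: demographic parity compares selection rates across groups, while equalized odds takes the worse of the TPR and FPR disparities (largest difference, smallest ratio). It is plain Python for illustration, does not call fairical or fairlearn, and assumes every group contains both positive and negative samples.

```python
def _rate(preds, mask):
    """Mean of the 0/1 predictions over the samples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    return sum(selected) / len(selected)


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR for binary labels/predictions."""
    out = {}
    for g in sorted(set(groups)):
        in_g = [x == g for x in groups]
        out[g] = {
            "sel": _rate(y_pred, in_g),
            "tpr": _rate(y_pred, [m and y == 1 for m, y in zip(in_g, y_true)]),
            "fpr": _rate(y_pred, [m and y == 0 for m, y in zip(in_g, y_true)]),
        }
    return out


def diff(values):
    return max(values) - min(values)


def ratio(values):
    # Assumption: all-zero rates are treated as perfect parity.
    return min(values) / max(values) if max(values) else 1.0


def fairness_metrics(y_true, y_pred, groups):
    r = group_rates(y_true, y_pred, groups)
    sel = [v["sel"] for v in r.values()]
    tpr = [v["tpr"] for v in r.values()]
    fpr = [v["fpr"] for v in r.values()]
    return {
        "dpd": diff(sel),
        "dpr": ratio(sel),
        "eod": max(diff(tpr), diff(fpr)),
        "eor": min(ratio(tpr), ratio(fpr)),
    }


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4  # e.g. two age bands
print(fairness_metrics(y_true, y_pred, groups))
```

Under these definitions, a difference of 0 and a ratio of 1 both indicate parity between groups.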

Implementations of these metrics rely on the fairlearn toolkit (Weerts et al. [WDE+23]).
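The min-max metrics evaluate a utility metric separately on each protected group and then compare the best and worst groups. Below is a hypothetical sketch of `minmaxd+acc+age` and `minmaxr+acc+age`. It assumes that min-max difference means the largest minus the smallest group value and that min-max ratio means the smallest over the largest, which is the between-group aggregation fairlearn's `MetricFrame.difference()` and `MetricFrame.ratio()` provide. The helper names are illustrative only.

```python
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_group(metric, y_true, y_pred, groups) -> Dict[Hashable, float]:
    """Evaluate a utility metric separately on each protected group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


def minmax_difference(metric, y_true, y_pred, groups) -> float:
    values = list(per_group(metric, y_true, y_pred, groups).values())
    return max(values) - min(values)


def minmax_ratio(metric, y_true, y_pred, groups) -> float:
    values = list(per_group(metric, y_true, y_pred, groups).values())
    return min(values) / max(values)


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
age = ["young"] * 4 + ["old"] * 4  # toy protected attribute
print(per_group(accuracy, y_true, y_pred, age))
print(minmax_difference(accuracy, y_true, y_pred, age))
print(minmax_ratio(accuracy, y_true, y_pred, age))
```

Swapping `accuracy` for another utility function gives the other `minmaxd+<util>+<attr>` and `minmaxr+<util>+<attr>` combinations.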