fairical.scores
Data model organizing scores of ML systems under multi-objective constraints.
The model describes a data structure containing machine learning model scores, ground-truth labels, and sensitive attributes (e.g., race, gender).
Classes
Scores | Data model representing raw machine learning score outputs.
- class fairical.scores.Scores(**data)
Bases: BaseModel
Data model representing raw machine learning score outputs.
It is composed of a set of scores for one or more operating points (e.g., preference rays, or ratios between various optimisation objectives), the ground truth for the task being analysed, and extra protected attributes that are relevant to, at least, demographic fairness analysis.
In the JSON representation, scores, ground truth, and demographic attributes may be inlined or stored in an external file from which the data structure is loaded. Relative paths are resolved with respect to the location of the current (JSON) file.
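The JSON layout described above can be illustrated with a minimal sketch. The field names are the ones documented for this class below; the file names, the assumption that paths appear as plain JSON strings, and the helper resolve_relative are hypothetical and only show how a relative path would be anchored at the directory of the JSON file.

```python
import json
from pathlib import PurePosixPath


def resolve_relative(entry: str, json_file: PurePosixPath) -> PurePosixPath:
    """Resolve a path found in a JSON document w.r.t. that document's directory."""
    path = PurePosixPath(entry)
    # Absolute paths are kept as-is; relative ones are anchored at the JSON file's folder.
    return path if path.is_absolute() else json_file.parent / path


# Hypothetical document: system "a" is inlined, while the scores of system "b"
# and the ground truth live in external files (names are illustrative only).
document = json.loads("""
{
  "scores": [[0.1, 0.8, 0.6], "system-b-scores.json"],
  "identifiers": ["a", "b"],
  "ground_truth": "labels/ground-truth.json",
  "attributes": {"gender": ["f", "m", "f"], "age": [0, 1, 2]}
}
""")

json_file = PurePosixPath("/data/experiment/scores.json")
for entry in (document["scores"][1], document["ground_truth"]):
    print(resolve_relative(entry, json_file))
# prints /data/experiment/system-b-scores.json
# prints /data/experiment/labels/ground-truth.json
```

PurePosixPath is used so the printed paths are the same on every operating system.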
- scores: list[list[Annotated[float, Field(ge=0.0, le=1.0)]] | Path]
Inline scores data or file paths, with one entry per system. Each score must be a floating-point number between 0 and 1 inclusive.
- identifiers: list[str] | None
Optional inline identifiers corresponding to each system in scores. Each identifier must be a string.
- ground_truth: list[Annotated[int, Field(ge=0)]] | Path
Inline ground-truth data or a single file path. Each ground-truth label must be a non-negative integer.
- attributes: dict[str, list[str | Annotated[int, Field(ge=0)]]] | Path
Inline attributes data or a single file path. It is set up as a dictionary mapping attribute names to lists of demographic data, whose values can be strings or non-negative integers.
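In the class itself these per-value constraints are enforced by pydantic through the Ge and Le constraints on the annotated types. The plain-Python function below is only a hypothetical stand-in that spells out the same rules for inline data, treating a string in place of inline values as a file path; it does not reproduce pydantic's error messages.

```python
from typing import Any


def field_errors(data: dict[str, Any]) -> list[str]:
    """Report inline values that violate the documented per-value constraints."""
    errors = []
    for i, system in enumerate(data["scores"]):
        if isinstance(system, str):  # file path: values are not inlined
            continue
        for score in system:
            if not 0.0 <= score <= 1.0:
                errors.append(f"scores[{i}]: {score} is outside [0, 1]")
    labels = data["ground_truth"]
    if not isinstance(labels, str):
        for label in labels:
            if not isinstance(label, int) or label < 0:
                errors.append(f"ground_truth: {label!r} is not a non-negative int")
    attributes = data["attributes"]
    if not isinstance(attributes, str):
        for name, values in attributes.items():
            for value in values:
                if isinstance(value, str):
                    continue  # strings are always accepted
                if not isinstance(value, int) or value < 0:
                    errors.append(f"attributes[{name!r}]: {value!r} is invalid")
    return errors


example = {
    "scores": [[0.2, 1.0], [1.5, 0.0]],
    "ground_truth": [0, -1],
    "attributes": {"age": [3, -2], "gender": ["f", "m"]},
}
for message in field_errors(example):
    print(message)
```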
- check_consistent_num_samples()
Ensure all sample-level lists have the same length.
- Return type:
Self
- check_identifiers_specified()
Generate a default identifier for each system if none are specified.
- Return type:
Self
- check_consistent_num_subsystems()
Ensure all subsystem-level lists have the same length.
- Return type:
Self
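The three validators can be pictured with the following sketch. It assumes every field is inlined (no file paths), that the sample-level lists are each system's scores, the ground truth and every attribute list, and that the subsystem-level lists are scores and identifiers. The function check_model and the "system-<i>" default identifiers are invented for illustration; the library's own defaults may differ.

```python
from typing import Any


def check_model(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical plain-Python version of the three validators, for inline data only."""
    scores = data["scores"]
    # check_consistent_num_samples: every sample-level list must share one length.
    lengths = {len(system) for system in scores}
    lengths.add(len(data["ground_truth"]))
    lengths.update(len(values) for values in data["attributes"].values())
    if len(lengths) != 1:
        raise ValueError(f"inconsistent number of samples: {sorted(lengths)}")
    # check_identifiers_specified: one default identifier per system if missing.
    # The "system-<i>" pattern is an assumption made for this sketch.
    if data.get("identifiers") is None:
        data["identifiers"] = [f"system-{i}" for i in range(len(scores))]
    # check_consistent_num_subsystems: identifiers and scores must pair up.
    if len(data["identifiers"]) != len(scores):
        raise ValueError("identifiers and scores describe different numbers of systems")
    return data


checked = check_model({
    "scores": [[0.1, 0.9], [0.4, 0.6]],
    "ground_truth": [0, 1],
    "attributes": {"gender": ["f", "m"]},
})
print(checked["identifiers"])  # prints ['system-0', 'system-1']
```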
- classmethod load(source)
Validate and load a JSON file into a raw data object.
This method validates and loads input in JSON format. It reads the given source (a file path or an open text stream), parses its JSON content, and validates it against the data model.
- Parameters:
source (Path | str | TextIO) – Source from which to read the JSON input.
- Return type:
Self
- Returns:
Parsed and validated content as a Scores instance.
- Raises:
pydantic_core.ValidationError – If the input contains invalid data.
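Only the input-handling part of load is sketched here, following the documented parameter types. The helper name read_json_source is hypothetical, and the step in which the parsed dictionary is validated against the model (with pydantic this is typically done through BaseModel.model_validate) is left out, since the documentation does not say how load performs it.

```python
import io
import json
from pathlib import Path
from typing import Any, TextIO, Union


def read_json_source(source: Union[Path, str, TextIO]) -> Any:
    """Parse JSON from a file path (Path or str) or from an open text stream."""
    if isinstance(source, (Path, str)):
        with open(source, encoding="utf-8") as stream:
            return json.load(stream)
    return json.load(source)  # any object with a read() method works


stream = io.StringIO('{"ground_truth": [0, 1, 1]}')
print(read_json_source(stream))  # prints {'ground_truth': [0, 1, 1]}
```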
- save(dest, **args)
Save contents to an external file.
- Parameters:
dest (Path | str | TextIO) – Destination where the contents are saved. If it is not a Path or str, it is assumed to have a write method accepting strings.
args – Additional parameters passed down to pydantic.BaseModel.model_dump_json().
- Return type:
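The dispatch on dest described for save (paths versus objects with a write method) can be sketched as follows. The helper write_json_dest is hypothetical and uses json.dumps in place of pydantic.BaseModel.model_dump_json(), so the keyword arguments it forwards are those of json.dumps; the two functions share some options, such as indent, but not all.

```python
import io
import json
from pathlib import Path
from typing import Any, TextIO, Union


def write_json_dest(content: Any, dest: Union[Path, str, TextIO], **args: Any) -> None:
    """Serialise content and write it to a path or to an object with write(str)."""
    # Stand-in for model_dump_json(**args) on a pydantic model.
    text = json.dumps(content, **args)
    if isinstance(dest, (Path, str)):
        Path(dest).write_text(text, encoding="utf-8")
    else:
        dest.write(text)  # assumed to accept a str, as documented for save


buffer = io.StringIO()
write_json_dest({"ground_truth": [0, 1]}, buffer)
print(buffer.getvalue())  # prints {"ground_truth": [0, 1]}
```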
- solutions_a_posteriori(metrics, thresholds=None)
Calculate all solutions of a system a posteriori, given metrics and thresholds.
This method retrieves the solutions that systems can implement. For each set of scores in self.scores, it calculates all solutions of the system being analysed through simple thresholding, and then aggregates them to construct all possible sets of solutions a system can implement.
- Parameters:
metrics (Sequence[str]) – Metric types to consider when evaluating solutions, for example eod+age, eod+gender, or acc.
thresholds (list[float] | None) – Thresholds to apply, given as values within the interval [0, 1]. If not provided, scikit-learn is used to extract meaningful thresholds from the system's scores.
- Return type:
- Returns:
All known solutions for the input system.
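The thresholding step described for solutions_a_posteriori can be illustrated with a self-contained sketch. Several pieces are assumptions rather than the library's behaviour: acc is taken as accuracy, eod+<attribute> is read as equal opportunity difference (the largest gap in true-positive rate between the groups of that attribute), a sample counts as positive when its score is at least the threshold, and the scikit-learn default is replaced by "every distinct score". The helper names are hypothetical, and the sketch only pools all (system, threshold) solutions instead of building the sets of solutions the method constructs.

```python
from typing import Optional, Sequence


def accuracy(pred: list[int], truth: list[int]) -> float:
    """Fraction of samples whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def eod(pred: list[int], truth: list[int], groups: list) -> float:
    """Largest gap in true-positive rate between groups (one common EOD variant)."""
    rates = []
    for group in sorted(set(groups), key=str):
        hits = [p for p, t, g in zip(pred, truth, groups) if g == group and t == 1]
        if hits:  # groups without positive samples have no defined TPR
            rates.append(sum(hits) / len(hits))
    return max(rates) - min(rates) if rates else 0.0


def solutions_by_thresholding(
    scores: list[list[float]],
    truth: list[int],
    attributes: dict[str, list],
    metrics: Sequence[str],
    thresholds: Optional[list[float]] = None,
) -> list[tuple[int, float, tuple[float, ...]]]:
    """Return (system index, threshold, metric values) for every system/threshold pair."""
    solutions = []
    for system, system_scores in enumerate(scores):
        # Stand-in for the scikit-learn default: every distinct score is a candidate.
        candidates = thresholds if thresholds is not None else sorted(set(system_scores))
        for threshold in candidates:
            pred = [int(s >= threshold) for s in system_scores]
            values = []
            for metric in metrics:
                if metric == "acc":
                    values.append(accuracy(pred, truth))
                elif metric.startswith("eod+"):
                    values.append(eod(pred, truth, attributes[metric[len("eod+"):]]))
                else:
                    raise ValueError(f"metric not covered by this sketch: {metric}")
            solutions.append((system, threshold, tuple(values)))
    return solutions


scores = [[0.2, 0.7, 0.9, 0.4]]
truth = [0, 1, 1, 0]
attributes = {"gender": ["f", "m", "f", "m"]}
print(solutions_by_thresholding(scores, truth, attributes, ["acc", "eod+gender"], [0.5, 0.8]))
# prints [(0, 0.5, (1.0, 0.0)), (0, 0.8, (0.75, 1.0))]
```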
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
- solutions_a_priori(metrics, prior_solutions, dominated=None)
Calculate all solutions of a system whose settings (system and threshold) are picked a priori.
This method retrieves the solutions that systems can implement. For each set of scores in self.scores, it calculates all solutions of the system being analysed through simple thresholding, and then aggregates them to construct all possible sets of solutions a system can implement.
- Parameters:
metrics (Sequence[str]) – Metric types to consider when evaluating solutions, for example eod+age, eod+gender, or acc.
prior_solutions (Solutions) – Solutions used as the basis for calculating solutions whose threshold and system are picked a priori.
dominated – A tri-state boolean flag that defines which subset of prior_solutions to apply to the scores:
None: applies all solutions (default)
True: applies only dominated solutions
False: applies only non-dominated solutions
- Return type:
- Returns:
All a priori known solutions for the input system.
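The dominated flag can be read as a Pareto-dominance filter over the objective values of prior_solutions. The sketch below shows only that filtering step on plain tuples; it assumes every objective is to be minimised (so accuracy would enter as 1 - accuracy), and the helper names dominates and select are hypothetical. Applying the selected systems and thresholds to self.scores is not shown.

```python
from typing import Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    # Lower is better on every objective (e.g. 1 - accuracy, eod).
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def select(
    solutions: list[tuple[float, ...]], dominated: Optional[bool] = None
) -> list[tuple[float, ...]]:
    """Apply the tri-state flag: None keeps all, True dominated, False non-dominated."""
    if dominated is None:
        return list(solutions)
    # A solution never dominates itself, so comparing against the full list is safe.
    flags = [any(dominates(other, s) for other in solutions) for s in solutions]
    return [s for s, is_dominated in zip(solutions, flags) if is_dominated == dominated]


# Hypothetical (1 - accuracy, eod) pairs for three prior solutions.
prior = [(0.1, 0.3), (0.2, 0.2), (0.3, 0.4)]
print(select(prior, dominated=False))  # prints [(0.1, 0.3), (0.2, 0.2)]
print(select(prior, dominated=True))   # prints [(0.3, 0.4)]
```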