fairical.solutions

Define the basic solution data model and functionality.

Classes

Solutions(**data)

Data model representing system solutions (or utility/fairness trade-offs).

class fairical.solutions.Solutions(**data)[source]

Bases: BaseModel

Data model representing system solutions (or utility/fairness trade-offs).

Objects of this type carry information about two or more performance metrics (utility or fairness) for each operating mode (utility/fairness trade-off) of the analysed ML system.

It is a dictionary whose keys are the utility or fairness metrics calculated for the whole system. Each value is a list with one entry per operating mode (utility/fairness trade-off) that the analysed system can potentially implement, so entries at the same index across keys describe the same operating mode.
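The layout described above can be illustrated with a plain dictionary. This is only a sketch of the data shape, not of the class itself: the metric names `acc` and `eod` and all values are made-up placeholders and are not necessarily names the library accepts.

```python
# Hypothetical example data: two metrics, three operating modes.
# Keys are metric names; index i in every list refers to operating mode i.
points = {
    "acc": [0.91, 0.88, 0.80],  # placeholder utility metric
    "eod": [0.60, 0.75, 0.95],  # placeholder fairness metric
}

n_metrics = len(points)
n_solutions = len(next(iter(points.values())))

# All metric vectors share one length, and every value lies in [0, 1].
assert all(len(values) == n_solutions for values in points.values())
assert all(0.0 <= x <= 1.0 for values in points.values() for x in values)

# Reading one operating mode means taking the same index from every metric.
mode_1 = {name: values[1] for name, values in points.items()}
```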

points: dict[str, list[Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Ge(ge=0.0), Le(le=1.0)])]]]
metadata: dict[str, Any]
n_metrics()

Return the number of metrics stored in the model.

Return type:

int

Returns:

The count of metric keys in the model.

n_solutions()[source]

Return the number of solutions stored in the model, across all metrics.

Return type:

int

Returns:

The number of solutions in the model, across all metrics.

items()[source]

Return a view of metric keys and their associated solution vectors.

Return type:

ItemsView[str, list[float]]

Returns:

A set-like view of (metric, vector) pairs.

keys()[source]

Return a view of the metric keys.

Return type:

KeysView[str]

Returns:

A set-like view of metric names.

values()[source]

Return a view of all solution vectors.

Return type:

ValuesView[list[float]]

Returns:

A view of all metric solution vectors in the model.

classmethod fromarray(points, metrics, thresholds, identifiers=None, identifier_names=None)[source]

Create a new instance from an array and names of metrics.

Parameters:
  • points (TypeAliasType) – 2-D array-like object with floating-point numbers organized as (n_solutions, n_metrics).

  • metrics (Sequence[str]) – A sequence of valid, supported metric names, one for each column of the input points array.

  • thresholds (Sequence[float]) – A list of thresholds that were used to compute the points.

  • identifiers (Optional[Sequence[int]]) – Optional list of identifiers, indicating which model each point corresponds to. Each identifier indexes into identifier_names. If not specified, all points will have the same identifier.

  • identifier_names (Optional[Sequence[str]]) – Optional list of identifier names.

Return type:

Self

Returns:

A newly created and validated object.

Raises:

AssertionError – If the number of columns in the input array-like object differs from the number of listed metrics.
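As a rough illustration of the (n_solutions, n_metrics) layout and the column-count check, the following sketch transposes rows into per-metric lists. The helper `rows_to_columns` is hypothetical and is not part of the library; it ignores thresholds and identifiers entirely.

```python
from collections.abc import Sequence


def rows_to_columns(
    rows: Sequence[Sequence[float]], metrics: Sequence[str]
) -> dict[str, list[float]]:
    """Turn (n_solutions, n_metrics) rows into one list per metric name."""
    for row in rows:
        # Mirrors the documented check: one column per listed metric.
        assert len(row) == len(metrics), "column count must match metric count"
    return {name: [float(row[j]) for row in rows] for j, name in enumerate(metrics)}


# Two solutions (rows) described by two placeholder metrics (columns).
columns = rows_to_columns([[0.9, 0.6], [0.8, 0.9]], ["m1", "m2"])
```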

classmethod load(source)[source]

Validate and load a JSON file into a solution data object.

This function is intended to validate and load the input in JSON format. It opens the given file path, parses its JSON content, and validates it against the defined model.

Parameters:

source (Path | str | TextIO) – File path or open text stream to read the JSON from.

Return type:

Self

Returns:

Parsed and validated content as a Solutions instance.

Raises:

pydantic_core.ValidationError – If the file contains invalid data.
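A minimal sketch of how a loader might accept either a path or an open text stream is shown below. It assumes that a `str` argument is a file path, and the helper `read_json` is hypothetical; the model validation step performed by the real method is not reproduced here.

```python
import io
import json
from pathlib import Path
from typing import Any, TextIO, Union


def read_json(source: Union[Path, str, TextIO]) -> Any:
    """Parse JSON from a file path or from an already open text stream."""
    if isinstance(source, (str, Path)):
        # Assumption: strings are treated as file paths, not as JSON text.
        with open(source, encoding="utf-8") as handle:
            return json.load(handle)
    return json.load(source)


# Reading from an in-memory stream with made-up content.
raw = read_json(io.StringIO('{"points": {"m1": [0.5, 0.7]}, "metadata": {}}'))
```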

save(dest, **args)[source]

Save contents to an external file.

Return type:

None

check_metrics_validity()[source]

Ensure all metrics are valid.

Return type:

Self

check_identifiers_specified()[source]

Ensure solutions contain identifiers.

Return type:

Self

check_consistent_lengths()[source]

Ensure all solution lists have the same length.

Return type:

Self

filter_metadata_by_indices(indices)[source]

Filter metadata by the provided indices.

Parameters:

indices – The indices used for filtering.

Returns:

Filtered metadata.

deduplicate(eps=1e-06)[source]

Filter solutions to remove duplicates within a certain epsilon.

Remove points in these solutions that lie within eps of another by clustering with sklearn.cluster.DBSCAN (min_samples=1) and keeping the first point in each cluster.

Parameters:

eps (float) – Maximum distance between points in the same cluster.

Return type:

Self

Returns:

Filtered solutions without duplicates, as a new object.
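With min_samples=1 every point is a core point, so DBSCAN clusters are the connected components of the graph linking points that lie within eps of each other. The sketch below reproduces that behaviour in pure Python with a union-find structure, assuming Euclidean distance (DBSCAN's default metric). The function `deduplicate_rows` is a hypothetical stand-in, not the library's implementation.

```python
import math


def deduplicate_rows(rows, eps=1e-6):
    """Return indices of the first point of each eps-connected cluster."""
    parent = list(range(len(rows)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if math.dist(rows[i], rows[j]) <= eps:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smallest index as root, so the root is
                    # always the first point of the cluster.
                    parent[max(ri, rj)] = min(ri, rj)

    return sorted({find(i) for i in range(len(rows))})


kept = deduplicate_rows([(0.5, 0.5), (0.5, 0.5 + 1e-9), (0.9, 0.1), (0.5, 0.5)])
# Clusters can chain: neighbours within eps link points farther apart than eps.
chained = deduplicate_rows([(0.0,), (0.08,), (0.16,)], eps=0.1)
```

Note the chaining effect in the second call: the first and last points are more than eps apart, yet they end up in one cluster because each is within eps of the middle point.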

non_dominated_solutions()[source]

Filter the system's solutions, separating those that are non-dominated from those that are dominated.

This is a thin wrapper around pymoo.util.nds.NonDominatedSorting that extracts the rank‑0 solutions (those that are not dominated by any other).

Definition: A point p is dominated if at least one other point is no worse than p in every objective and strictly better in at least one.

Parameters:

solutions – All solutions available in the current system.

Return type:

tuple[Self, Self]

Returns:

A tuple containing non-dominated and dominated solutions respectively. By definition, the sets are guaranteed to not overlap.
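The dominance definition can be sketched with a direct pairwise check. This example assumes that higher is better for every metric, which may differ from the convention the library passes to pymoo; `dominates` and `split_non_dominated` are hypothetical helpers, not library functions.

```python
def dominates(q, p):
    """True if q is no worse than p everywhere and strictly better somewhere.

    Assumption: every objective is maximised (higher is better).
    """
    no_worse = all(a >= b for a, b in zip(q, p))
    strictly_better = any(a > b for a, b in zip(q, p))
    return no_worse and strictly_better


def split_non_dominated(rows):
    """Return (non-dominated indices, dominated indices) for the given rows."""
    nds, ds = [], []
    for i, p in enumerate(rows):
        if any(dominates(q, p) for j, q in enumerate(rows) if j != i):
            ds.append(i)
        else:
            nds.append(i)
    return nds, ds


# (0.7, 0.5) is dominated by (0.9, 0.6); the other points are mutually
# incomparable, so they form the estimated front.
nds, ds = split_non_dominated([(0.9, 0.6), (0.8, 0.9), (0.7, 0.5), (0.6, 0.95)])
```

The pairwise check is quadratic in the number of solutions, which is acceptable for a sketch but slower than a dedicated sorting routine on large sets.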

indicators()[source]

Assess utility-fairness trade-off systems based on characteristics of the estimated Pareto front.

This method evaluates the trade-off between utility and fairness of adjustable systems using multi-objective performance indicators. It first estimates the set of non-dominated solutions.

Return type:

dict[Literal['hv', 'ud', 'os', 'as', 'onvg', 'onvgr', 'relative-onvg', 'area'], float]

Returns:

A dictionary that characterizes the (estimated Pareto) front composed of non-dominated solutions in nds. The dictionary contains the following keys:

  • hv: The hypervolume of the front.

    Higher is better. This indicator evaluates how the solution set covers the metric space in terms of diversity and proximity to the ideal. HV is formulated as:

    \[\begin{split}HV(S) = VOL\left(\bigcup_{\substack{x \in S \\ x \prec r}} \prod_{i=1}^{N}[x^{i},r^{i}]\right)\end{split}\]

    Where \(S\) is the solution set, \(x\) is a solution in \(S\), and \(r\) is the Nadir point.

  • ud: Uniformity of the distribution of nds points on the front.

    This indicator evaluates how uniformly the solution set is spread over the metric space, based on an upper-bound distance, \(\sigma\). UD is formulated as:

    \[UD(S,\sigma)=\frac{1}{1+D_{nc}(S, \sigma)}\]

    Where

    \[D_{nc}(S,\sigma)=\sqrt{\frac{1}{|X_n|-1} \sum_{i=1}^{|X_n|} \left(nc(x^i,\sigma)-\mu_{nc(x,\sigma)}\right)^2}\]

    and

    \[nc(x^i,\sigma)=|\{x \in X_n, \|x-x^i\|<\sigma\}|-1\]

    \(\sigma\) is the niche radius that is problem dependent and can be adjusted based on the distribution of the candidate solution in the space. \(\mu_{nc(x,\sigma)}\) is the mean of the niche counts, \(nc\), calculated as \(\mu_{nc(x,\sigma)}=\frac{1}{|X_n|} \sum_{j=1}^{|X_n|} nc(x^j,\sigma)\).

  • os: Overall spread of nds points with respect to extremities of the front.

    This indicator assesses how well the points of the candidate set spread towards the extremities of the optimal Pareto front. OS is formulated as:

    \[OS(S,\mathcal{P})=\prod_{i=1}^{N}\left|\frac{\max\limits_{s \in S}s_i-\min\limits_{s \in S}s_i}{\max\limits_{p \in \mathcal{P}}p_{i}-\min\limits_{p \in \mathcal{P}}p_{i}}\right|\]

    Where the numerator and denominator are the absolute differences between the worst and best values of each metric for the candidate solution set \(S\) and the Pareto optimal set \(\mathcal{P}\), respectively.

  • as: Average spread of nds points with respect to extremities of the front.

    This indicator assesses how well the points of the candidate set spread towards the extremities of the optimal Pareto front, averaged over the metrics. AS is formulated as:

    \[AS(S,\mathcal{P})=\frac{1}{N}\sum_{i=1}^{N}\left|\frac{\max\limits_{s \in S}s_i-\min\limits_{s \in S}s_i}{\max\limits_{p \in \mathcal{P}}p_{i}-\min\limits_{p \in \mathcal{P}}p_{i}}\right|\]

    Where the numerator and denominator are the absolute differences between the worst and best values of each metric for the candidate solution set \(S\) and the Pareto optimal set \(\mathcal{P}\), respectively.

  • onvg: Overall Nondominated Vector Generation (ONVG) in the front (nds).

    Higher is better. This indicator evaluates how many optimal solutions are generated by the system. ONVG is formulated as:

    \[ONVG(S) = |X_n|\]

    Where \(|\cdot|\) denotes set cardinality and \(X_n\) is the set of non-dominated solutions in the metric space.

  • onvgr: Ratio between number of solutions in nds and nds + ds.

    Higher is better. This indicator assesses the proportion of optimal solutions generated by the system. ONVGR is formulated as:

    \[ONVGR(S) = \frac{|X_n|}{|S|}\]

    Where \(|\cdot|\) denotes set cardinality, \(X_n\) is the set of non-dominated solutions (nds), and \(S\) is the full set of solutions (nds + ds).
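The counting-based indicators above (UD, ONVG, ONVGR) can be sketched directly from their formulas. The helper names are hypothetical, the value of \(\sigma\) is chosen arbitrarily for the example, and returning 1.0 for a front with fewer than two points is an assumption, since the formula divides by \(|X_n|-1\).

```python
import math
import statistics


def niche_counts(front, sigma):
    """nc(x_i, sigma): number of other front points closer than sigma to x_i."""
    return [sum(1 for x in front if math.dist(x, xi) < sigma) - 1 for xi in front]


def uniform_distribution(front, sigma):
    """UD = 1 / (1 + sample standard deviation of the niche counts)."""
    if len(front) < 2:
        return 1.0  # assumption: the formula is undefined for a single point
    # statistics.stdev uses the (n - 1) denominator, matching D_nc above.
    return 1.0 / (1.0 + statistics.stdev(niche_counts(front, sigma)))


def onvg(front):
    """Number of non-dominated solutions."""
    return len(front)


def onvgr(front, all_solutions):
    """Fraction of all solutions that are non-dominated."""
    return len(front) / len(all_solutions)


# Made-up front of three points plus one dominated point.
front = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
everything = front + [(0.2, 0.2)]
counts = niche_counts(front, sigma=0.8)
ud = uniform_distribution(front, sigma=0.8)
ratio = onvgr(front, everything)
```

With a larger \(\sigma\) that covers the whole front, all niche counts become equal, the standard deviation drops to zero, and UD reaches its maximum of 1.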

tabulate()[source]

Generate a table containing the given solutions.

Each table row contains the values for each metric, the threshold used to compute them, and the corresponding identifier name.

Return type:

str

Returns:

The generated table as a string.

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.