credible.curves

Curve plotting support.

This module expresses classic curves for system performance evaluation as sets of coordinates. Use the plot module to render them graphically.

Functions

`area_under_the_curve`(curve)	Calculate the area under a curve using a trapezoidal rule.
`average_metric`(curve)	Calculate the area under a curve using a rectangle rule.
`curve_ci`(y_true, y_score, axes, ci_functor, ...)	Calculate points and confidence intervals of an arbitrary performance curve.
`curve_ci_hull`(curve[, extrapolate_from_origin])	Calculate lower and upper confidence intervals of a curve.
`estimated_ci_coverage`(ci_functor, rng[, n])	Return the approximate coverage of a credible region or confidence interval estimator.

Classes

AxisType(key, fullname, abbreviation)

Supported types for metrics that have binomial distributions.

class credible.curves.AxisType(key, fullname, abbreviation)

Bases: Enum

Supported types for metrics that have binomial distributions.

This enumeration lists metrics (rates) that have binomial distributions. For each entry, we define the successes (k) and failures (l) such that the metric can be calculated as:

\[metric = \frac{k}{k+l}\]

Keys with the same (integer) value represent synonyms.

Definitions are taken from https://en.wikipedia.org/wiki/Confusion_matrix.

Parameters:

key (int) –
One of the integer keys for supported measures:
- True positive rate, recall, sensitivity: 1
- True negative rate, specificity, selectivity: 2
- False negative rate: 3
- False positive rate: 4
- Precision, positive predictive value: 5
- Negative predictive value: 6
fullname (str) – Full name of the measure.
abbreviation (str) – The abbreviation for the measure (three or four letters, lower-case).

TPR = (1, 'true positive rate', 'tpr')

REC = (1, 'recall', 'rec')

SEN = (1, 'sensitivity', 'sen')

TNR = (2, 'true negative rate', 'tnr')

SPEC = (2, 'specificity', 'spec')

SEL = (2, 'selectivity', 'sel')

FNR = (3, 'false negative rate', 'fnr')

FPR = (4, 'false positive rate', 'fpr')

PREC = (5, 'precision', 'prec')

PPV = (5, 'positive predictive value', 'ppv')

NPV = (6, 'negative predictive value', 'npv')
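
The relation metric = k / (k + l) can be illustrated with a minimal sketch. The helper names `binomial_counts` and `rate` are hypothetical and not part of this package; the (k, l) pairs follow the standard confusion-matrix definitions referenced in the class description.

```python
# Hypothetical helpers (not part of credible): map each rate to its
# (successes k, failures l) pair taken from a binary confusion matrix.
def binomial_counts(tp: int, fp: int, tn: int, fn: int) -> dict[str, tuple[int, int]]:
    return {
        "tpr": (tp, fn),   # true positive rate, recall, sensitivity
        "tnr": (tn, fp),   # true negative rate, specificity, selectivity
        "fnr": (fn, tp),   # false negative rate
        "fpr": (fp, tn),   # false positive rate
        "prec": (tp, fp),  # precision, positive predictive value
        "npv": (tn, fn),   # negative predictive value
    }


def rate(k: int, l: int) -> float:
    """Evaluate metric = k / (k + l)."""
    return k / (k + l)


counts = binomial_counts(tp=40, fp=5, tn=45, fn=10)
rates = {name: rate(k, l) for name, (k, l) in counts.items()}
print(rates["tpr"], rates["fpr"])  # 0.8 0.1
```

Note that TPR and FNR swap k and l, which is why they always sum to 1.0.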

make_cm_functor(ci_functor)

Return callable to treat binary confusion matrices and produce curve and confidence intervals.

This method wraps a confidence-interval functor in a complete confusion-matrix functor that produces the metric estimate, then the lower and upper confidence bounds, in this order.

Parameters: ci_functor (Callable[[int, int], tuple[float, float, float]]) – Functor to evaluate the rate, lower and upper confidence intervals from (binomial) successes and failures.
Return type: Callable[[GenericAlias[int64]], tuple[float, float, float]]
Returns: A functor that can operate directly from the confusion matrix outputs (instead of successes and failures), with the same return values as the input functor.

credible.curves.curve_ci(y_true, y_score, axes, ci_functor, skl_functor_name)

Calculate points and confidence intervals of an arbitrary performance curve.

This function calculates the rates and confidence intervals of a user-configurable ROC-style curve.

Parameters:

y_true (Iterable[int]) – Ground truth (correct) labels.
y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).
axes (tuple[AxisType, AxisType]) – Which axes to calculate the curve for. Note not all combinations make sense (no checks are performed). You should avoid computing a curve, for example, of TPR against FNR as these rates are complementary to 1.0.
ci_functor (Callable[[int, int], tuple[float, float, float]]) – A callable used to calculate the estimate, lower, and upper bounds for the measures of interest on each of the axes. This callable receives (binomial, integer) successes and failures, and produces the rate, lower and upper bounds, respectively.
skl_functor_name (Literal['roc', 'det', 'pr']) –
The name of a callable from scikit-learn to calculate basic thresholds for the target curve. Use one of:
- roc: sklearn.metrics.roc_curve()
- det: sklearn.metrics.det_curve()
- pr: sklearn.metrics.precision_recall_curve()

Returns:

Seven 1-D floating point arrays corresponding to:

The selected metric for the first axis
The selected metric for the second axis
The thresholds used to evaluate the selected metrics, in decreasing order
The lower confidence interval for the selected metric for the first axis
The lower confidence interval for the selected metric for the second axis
The upper confidence interval for the selected metric for the first axis
The upper confidence interval for the selected metric for the second axis

Return type:

tuple[GenericAlias[float64], GenericAlias[float64], GenericAlias[float64], GenericAlias[float64], GenericAlias[float64], GenericAlias[float64], GenericAlias[float64]]
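
To make the ci_functor contract and the seven-sequence return layout concrete, here is a minimal pure-Python sketch for a ROC-style curve (FPR on the first axis, TPR on the second). It is not this package's implementation: `wilson_ci` and `roc_with_ci` are hypothetical names, the Wilson score interval is chosen only for illustration, and the thresholds are simply the distinct scores, whereas the package delegates threshold selection to scikit-learn.

```python
import math


def wilson_ci(k, l, z=1.96):
    """Hypothetical ci_functor: (successes, failures) -> (rate, lower, upper)."""
    n = k + l
    if n == 0:
        return 0.0, 0.0, 1.0  # no observations: uninformative interval
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


def roc_with_ci(y_true, y_score, ci_functor):
    """Sweep thresholds in decreasing order and return seven sequences."""
    y_true, y_score = list(y_true), list(y_score)
    thresholds = sorted(set(y_score), reverse=True)
    cols = [[] for _ in range(6)]
    for t in thresholds:
        # A sample is predicted positive when its score is >= threshold.
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        fn = sum(1 for y in y_true if y == 1) - tp
        tn = sum(1 for y in y_true if y == 0) - fp
        fpr, fpr_lo, fpr_hi = ci_functor(fp, tn)  # FPR: k=fp, l=tn
        tpr, tpr_lo, tpr_hi = ci_functor(tp, fn)  # TPR: k=tp, l=fn
        for col, value in zip(cols, (fpr, tpr, fpr_lo, tpr_lo, fpr_hi, tpr_hi)):
            col.append(value)
    x, y, x_lo, y_lo, x_hi, y_hi = cols
    return x, y, thresholds, x_lo, y_lo, x_hi, y_hi


curve = roc_with_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], wilson_ci)
print(curve[0])  # [0.0, 0.5, 0.5, 1.0]
print(curve[1])  # [0.5, 0.5, 1.0, 1.0]
```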

credible.curves.curve_ci_hull(curve, extrapolate_from_origin=True)

Calculate lower and upper confidence intervals of a curve.

This function calculates the hulls for 2 curves that are formed from points defining the lower and upper bounds of the curve’s credible/confidence intervals for each measured threshold.

It returns the curve (no changes), as well as the lower and upper bounds of the (central) curve.

To calculate the upper and lower curves, we do not consider the extremities of the upper and lower bounds, as those points would translate to pessimistic estimates of the true confidence interval bounds. Instead, we find the intersection of a straight line from the origin (0,0) with the quarter-ellipse (90-degree sector) inscribed in the appropriate quarter of a rectangle that is centered at the ROC point and extends to its lower and upper CI bounds in both directions (horizontal and vertical). If extrapolate_from_origin is set to False, then intersections are created from (x, y) = (1, 0).

Parameters:

curve (tuple[Iterable[float], Iterable[float], Iterable[float], Iterable[float], Iterable[float], Iterable[float], Iterable[float]]) –
Seven 1-D floating point arrays corresponding to:
- The selected metric for the first axis
- The selected metric for the second axis
- The thresholds (in decreasing order, as produced by scikit-learn)
- The lower confidence interval for the selected metric for the first axis
- The lower confidence interval for the selected metric for the second axis
- The upper confidence interval for the selected metric for the first axis
- The upper confidence interval for the selected metric for the second axis
extrapolate_from_origin (bool) –
To calculate the upper hull, we consider two distinct cases: if extrapolate_from_origin is True (default), then we consider the curve starts (or finishes) at coordinate (x,y) = (1,0) and finishes (or starts) at (x,y) = (0,1). This is the case if the user is plotting TPR against TNR or FPR against FNR. If extrapolate_from_origin is False, then we consider the curve starts (or finishes) at (x,y) = (0,0), and finishes (or starts) at (x,y) = (1,1).

If extrapolate_from_origin is True (default), each point of the curve extrapolates to the right and upper points defined by the upper bounds of the credible/confidence intervals, and to the left and lower points defined by the lower bounds of the intervals.

If extrapolate_from_origin is False, each point of the curve extrapolates to the left and upper points defined by the upper bounds of the credible/confidence intervals, and to the right and lower points defined by the lower bounds of the intervals.

Returns:

Two curves as follows:

lower: Two 1D arrays of floats that express the lower bound of the curve, for the first and second coordinates respectively.
upper: Two 1D arrays of floats that express the upper bound of the curve, for the first and second coordinates respectively.

Return type:

tuple[tuple[GenericAlias[float64], GenericAlias[float64]], tuple[GenericAlias[float64], GenericAlias[float64]]]
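
The line-and-ellipse construction described for curve_ci_hull can be sketched for a single curve point as follows. This is one possible reading of the description for the extrapolate_from_origin=True case only, not the package's code; `hull_points` is a hypothetical name. Points on the ray from the origin through (x, y) are written as (1 + t)(x, y); substituting into the ellipse equation ((X - x)/a)^2 + ((Y - y)/b)^2 = 1, with semi-axes a and b equal to the CI half-widths, gives t = 1 / sqrt((x/a)^2 + (y/b)^2).

```python
import math


def hull_points(x, y, x_lo, y_lo, x_hi, y_hi):
    """Return ((lower_x, lower_y), (upper_x, upper_y)) for one curve point.

    Hypothetical sketch: intersects the ray from (0, 0) through (x, y)
    with the quarter-ellipses spanned by the CI half-widths.
    """

    def step(a, b):
        # Degenerate cases (zero-width interval, or point at the origin):
        # keep the point itself rather than dividing by zero.
        if a <= 0 or b <= 0 or (x == 0 and y == 0):
            return 0.0
        return 1.0 / math.sqrt((x / a) ** 2 + (y / b) ** 2)

    t_up = step(x_hi - x, y_hi - y)  # semi-axes towards the upper bounds
    t_lo = step(x - x_lo, y - y_lo)  # semi-axes towards the lower bounds
    lower = ((1 - t_lo) * x, (1 - t_lo) * y)
    upper = ((1 + t_up) * x, (1 + t_up) * y)
    return lower, upper


lower, upper = hull_points(0.3, 0.4, 0.2, 0.3, 0.4, 0.5)
print(lower, upper)
```

With half-widths of 0.1 on both axes, the point (0.3, 0.4) moves by one fifth of its distance to the origin in each direction, which stays inside the CI rectangle instead of reaching its corners.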

credible.curves.area_under_the_curve(curve)

Calculate the area under a curve using a trapezoidal rule.

Parameters: curve (tuple[Union[Sequence[float], GenericAlias[float64]], Union[Sequence[float], GenericAlias[float64]]]) – A tuple with two 1D sequences of floating point numbers representing the first and second coordinates of the curve whose AUC you want to evaluate.
Return type: float
Returns: The area under the curve (floating point scalar).
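
For reference, the trapezoidal rule itself can be sketched in a few lines of pure Python. `trapezoid_area` is a hypothetical name; the sketch assumes the first coordinate is sorted in increasing order (a decreasing order would flip the sign) and does not claim to match how this package handles ordering.

```python
def trapezoid_area(curve):
    """Integrate the second coordinate over the first with trapezoids."""
    xs, ys = curve
    area = 0.0
    for i in range(1, len(xs)):
        # Width of the segment times the mean height of its two ends.
        area += (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0
    return area


diagonal = ([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
print(trapezoid_area(diagonal))  # 0.5
```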

credible.curves.average_metric(curve)

Calculate the area under a curve using a rectangle rule.

Typically used to calculate the average precision (AP) as in: https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Average_precision

\[\begin{split}\text{AP} &= \sum_n (R_n - R_{n-1}) P_n \\ \text{AP} &= \sum_n (curve1[n] - curve1[n-1]) curve0[n]\end{split}\]

According to the scikit-learn documentation for sklearn.metrics.average_precision_score(), this implementation “is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic”.

Note

This package computes the precision-recall curve differently from sklearn: it does not add an extra (1.0, 0.0) point at the end of the PR curve (see the documentation for sklearn.metrics.precision_recall_curve()). We compensate for this difference here.

Parameters: curve (tuple[Union[Sequence[float], GenericAlias[float64]], Union[Sequence[float], GenericAlias[float64]]]) – A tuple with two 1D sequences of floating point numbers representing the first and second coordinates of the curve whose AUC you want to evaluate.
Return type: float
Returns: The area under the curve (floating point scalar).
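
The rectangle rule in the AP formula can be sketched directly. `rectangle_ap` is a hypothetical name; the sketch assumes curve = (precision, recall) with recall in increasing order, starts the sum at n = 1, and does not reproduce the end-point compensation mentioned in the note. The trapezoidal value is computed alongside only to show why linear interpolation can be more optimistic.

```python
def rectangle_ap(curve):
    """AP = sum_n (R_n - R_{n-1}) * P_n, with curve = (precision, recall)."""
    precision, recall = curve
    return sum(
        (recall[n] - recall[n - 1]) * precision[n]
        for n in range(1, len(recall))
    )


precision = [1.0, 1.0, 0.5]
recall = [0.0, 0.5, 1.0]

ap = rectangle_ap((precision, recall))
# Trapezoidal rule on the same points, for comparison only.
trapezoid = sum(
    (recall[n] - recall[n - 1]) * (precision[n] + precision[n - 1]) / 2.0
    for n in range(1, len(recall))
)
print(ap, trapezoid)  # 0.75 0.875
```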

credible.curves.estimated_ci_coverage(ci_functor, rng, n=100)

Return the approximate coverage of a credible region or confidence interval estimator.

Reference: This blog post.

Parameters:

ci_functor (Callable[[Iterable[int], Iterable[int]], tuple[GenericAlias[float64], GenericAlias[float64], GenericAlias[float64]]]) – A callable that accepts k, the number of successes, and l, the number of failures (both 1D integer numpy.ndarray), to account for in the estimation of the interval/region. This function must return only two float parameters, corresponding to the lower and upper bounds of the credible region or confidence interval being estimated.
rng (Generator) – An initialized numpy random number generator.
n (int) – The number of Bernoulli trials to consider on the binomial distribution. This represents the total number of samples you’d have for your experiment.

Return type:

GenericAlias[float64]

Returns:

The actual coverage curve you can expect. The first row corresponds to the values of p that were probed. The second row is the actual coverage considering a simulated binomial distribution with size n.
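
The idea of coverage estimation can be sketched with the standard library alone. This is an illustrative simulation, not this package's implementation: `wald_ci` and `simulated_coverage` are hypothetical names, the functor is scalar and returns (rate, lower, upper), `random.Random` stands in for the numpy generator, and the probed values of p and the number of repeats are arbitrary choices.

```python
import math
import random


def wald_ci(k, l, z=1.96):
    """Hypothetical functor: normal-approximation interval for k / (k + l)."""
    n = k + l
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)


def simulated_coverage(ci_functor, rng, n=100, repeats=2000,
                       probes=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Return (probed p values, fraction of intervals containing p)."""
    coverage = []
    for p in probes:
        hits = 0
        for _trial in range(repeats):
            # One binomial draw: count successes over n Bernoulli trials.
            k = sum(1 for _draw in range(n) if rng.random() < p)
            _rate, lower, upper = ci_functor(k, n - k)
            hits += lower <= p <= upper
        coverage.append(hits / repeats)
    return list(probes), coverage


probed, coverage = simulated_coverage(wald_ci, random.Random(0))
for p, c in zip(probed, coverage):
    print(f"p={p:.1f} coverage={c:.3f}")
```

A nominal 95% interval whose simulated coverage falls well below 0.95 for some p is under-covering there, which is the kind of behaviour this function is meant to expose.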