credible.bayesian.metrics

Implementation of scikit-learn-compatible measures with Bayesian credible regions.

Module Attributes

NUMBER_MC_SAMPLES

Suggested number of samples to use for Monte Carlo simulations in this package.

Functions

accuracy_score(y_true, y_pred[, lambda_, ...])

Accuracy binary classification score.

average_precision_score(y_true, y_score[, ...])

Compute average precision (AP) from prediction scores.

det_curve(y_true, y_score[, lambda_, coverage])

Compute the Detection Error-Tradeoff (DET) curve.

f1_score(y_true, y_pred, rng[, lambda_, ...])

Return the mean, mode, lower and upper bounds of the credible region of the F1 score.

jaccard_score(y_true, y_pred[, lambda_, ...])

Jaccard binary classification score.

precision_recall_curve(y_true, y_score[, ...])

Compute Precision-Recall (PR) curve.

precision_score(y_true, y_pred[, lambda_, ...])

Precision binary classification score.

recall_score(y_true, y_pred[, lambda_, coverage])

Recall binary classification score.

roc_auc_score(y_true, y_score[, lambda_, ...])

Calculate the area under the ROC (FPR vs TPR) curve.

roc_curve(y_true, y_score[, lambda_, coverage])

Compute Receiver operating characteristic (ROC).

specificity_score(y_true, y_pred[, lambda_, ...])

Specificity binary classification score.

credible.bayesian.metrics.NUMBER_MC_SAMPLES = 100000

Suggested number of samples to use for Monte Carlo simulations in this package.

credible.bayesian.metrics.precision_score(y_true, y_pred, lambda_=1.0, coverage=0.95)

Precision binary classification score.

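AKA positive predictive value (PPV). Returns the precision (as computed by scikit-learn), the mode of its posterior distribution and the bounds of its credible interval. Precision corresponds arithmetically to tp/(tp+fp). This function only supports binary classification problems.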

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior. Changes in this value do not significantly affect the outcome, unless tp or fp are very small (close to 1).

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns:

Tuple with 4 floating-point numbers:

  • The actual precision, as would be returned by scikit-learn

  • The mode of the posterior distribution: this represents the best estimate of the precision a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
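To illustrate how the four returned values relate, here is a minimal sketch, assuming the posterior of the precision is the conjugate Beta(tp + lambda_, fp + lambda_) distribution and estimating the equal-tailed interval from Monte Carlo samples drawn with Python's standard library. The helpers beta_mode and beta_summary are hypothetical illustrations, not part of credible.bayesian.metrics, whose implementation may compute the bounds differently (for example, analytically).

```python
from __future__ import annotations

import math
import random


def beta_mode(a: float, b: float) -> float:
    """Mode of a Beta(a, b) distribution."""
    if a > 1 and b > 1:
        return (a - 1) / (a + b - 2)
    if a <= 1 < b:
        return 0.0  # density is maximal at 0
    if b <= 1 < a:
        return 1.0  # density is maximal at 1
    return math.nan  # a <= 1 and b <= 1: no unique mode


def beta_summary(
    successes: int,
    failures: int,
    lambda_: float = 1.0,
    coverage: float = 0.95,
    nb_samples: int = 100_000,
    seed: int = 42,
) -> tuple[float, float, float, float]:
    """Point estimate, posterior mode and equal-tailed credible interval.

    Hypothetical helper: assumes the posterior is
    Beta(successes + lambda_, failures + lambda_) and takes the interval
    bounds as empirical quantiles of Monte Carlo samples.
    """
    a, b = successes + lambda_, failures + lambda_
    total = successes + failures
    point = successes / total if total > 0 else 0.0
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(nb_samples))
    tail = (1.0 - coverage) / 2.0
    lower = samples[int(tail * (nb_samples - 1))]
    upper = samples[math.ceil((1.0 - tail) * (nb_samples - 1))]
    return point, beta_mode(a, b), lower, upper


# Precision: "successes" are true positives, "failures" are false positives.
tp, fp = 30, 10
print(beta_summary(tp, fp))
```

With the flat prior (lambda_=1), the mode equals tp/(tp+fp) exactly, which is why it tracks the scikit-learn value closely.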

credible.bayesian.metrics.recall_score(y_true, y_pred, lambda_=1.0, coverage=0.95)

Recall binary classification score.

AKA sensitivity, hit rate, or true positive rate (TPR). Returns the recall (as computed by scikit-learn), the mode of its posterior distribution and the bounds of its credible interval. Recall corresponds arithmetically to tp/(tp+fn).

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior. Changes in this value do not significantly affect the outcome, unless tp or fn are very small (close to 1).

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns:

Tuple with 4 floating-point numbers:

  • The actual recall, as would be returned by scikit-learn

  • The mode of the posterior distribution: this represents the best estimate of the recall a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
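The counts behind these measures come from comparing labels element by element. The sketch below, with hypothetical helper names, tallies tp, fp, tn and fn for binary labels and, assuming a conjugate Beta model, returns the posterior parameters Beta(tp + lambda_, fn + lambda_) for the recall.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels in {0, 1}."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("only binary labels 0 and 1 are supported")
        if pred == 1:
            if truth == 1:
                tp += 1
            else:
                fp += 1
        elif truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def recall_beta_parameters(y_true, y_pred, lambda_=1.0):
    """Hypothetical: parameters of an assumed Beta(tp + lambda_, fn + lambda_) posterior."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp + lambda_, fn + lambda_


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("recall:", tp / (tp + fn))
print("posterior parameters:", recall_beta_parameters(y_true, y_pred))
```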

credible.bayesian.metrics.specificity_score(y_true, y_pred, lambda_=1.0, coverage=0.95)

Specificity binary classification score.

AKA selectivity or true negative rate (TNR). Returns the specificity, the mode of its posterior distribution and the bounds of its credible interval. Specificity corresponds arithmetically to tn/(tn+fp).

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior. Changes in this value do not significantly affect the outcome, unless tn or fp are very small (close to 1).

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns:

Tuple with 4 floating-point numbers:

  • The actual specificity, tn/(tn+fp), which scikit-learn computes as recall_score(y_true, y_pred, pos_label=0)

  • The mode of the posterior distribution: this represents the best estimate of the specificity a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
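The effect of lambda_ on the posterior mode can be checked directly. Assuming a Beta(tn + lambda_, fp + lambda_) posterior for the specificity (an assumption about the model; the helper names are hypothetical), the sketch compares the flat and Jeffreys priors for small and large counts.

```python
def beta_mode(a, b):
    """Mode of Beta(a, b); this simple version only handles a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("this sketch only handles a > 1 and b > 1")
    return (a - 1) / (a + b - 2)


def specificity_mode(tn, fp, lambda_=1.0):
    """Hypothetical: mode of an assumed Beta(tn + lambda_, fp + lambda_) posterior."""
    return beta_mode(tn + lambda_, fp + lambda_)


for tn, fp in [(3, 1), (300, 100)]:
    point = tn / (tn + fp)  # the plain ratio tn / (tn + fp)
    flat = specificity_mode(tn, fp, lambda_=1.0)
    jeffreys = specificity_mode(tn, fp, lambda_=0.5)
    print(f"tn={tn} fp={fp} point={point:.4f} flat={flat:.4f} jeffreys={jeffreys:.4f}")
```

With lambda_=1 the mode coincides with tn/(tn+fp). With lambda_=0.5 it moves from 0.75 to about 0.833 for tn=3, fp=1, but only to about 0.751 for tn=300, fp=100, in line with the remark that the prior matters mostly for small counts.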

credible.bayesian.metrics.accuracy_score(y_true, y_pred, lambda_=1.0, coverage=0.95)

Accuracy binary classification score.

See Accuracy. Accuracy is the proportion of correct predictions (both true positives and true negatives) among the total number of samples (e.g. pixels) examined. It corresponds arithmetically to (tp+tn)/(tp+tn+fp+fn). Because this measure includes both true negatives and true positives in the numerator, it is sensitive to data or regions without annotations. Returns the accuracy (as computed by scikit-learn), the mode of its posterior distribution and the bounds of its credible interval.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior. Changes in this value do not significantly affect the outcome, unless the number of correct (tp+tn) or incorrect (fp+fn) predictions is very small (close to 1).

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns:

Tuple with 4 floating-point numbers:

  • The actual accuracy, as would be returned by scikit-learn

  • The mode of the posterior distribution: this represents the best estimate of the accuracy a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
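The sensitivity of accuracy to regions without annotations is easy to see numerically: keeping tp, fp and fn fixed while adding correctly rejected samples (true negatives) inflates the accuracy, whereas the Jaccard index does not change. The helper names are hypothetical, and accuracy_beta_parameters only records an assumed Beta(tp + tn + lambda_, fp + fn + lambda_) posterior.

```python
def accuracy(tp, fp, tn, fn):
    """(tp + tn) / (tp + tn + fp + fn)."""
    return (tp + tn) / (tp + tn + fp + fn)


def jaccard(tp, fp, fn):
    """tp / (tp + fp + fn); true negatives do not appear at all."""
    denominator = tp + fp + fn
    return tp / denominator if denominator > 0 else 0.0


def accuracy_beta_parameters(tp, fp, tn, fn, lambda_=1.0):
    """Hypothetical: parameters of an assumed Beta posterior for the accuracy."""
    return tp + tn + lambda_, fp + fn + lambda_


# Same detections, increasingly large background of correctly rejected samples.
tp, fp, fn = 40, 10, 10
for tn in (40, 400, 4000):
    print(tn, round(accuracy(tp, fp, tn, fn), 4), round(jaccard(tp, fp, fn), 4))
```

Accuracy climbs from 0.8 towards 1 as tn grows, while the Jaccard index stays at 2/3.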

credible.bayesian.metrics.jaccard_score(y_true, y_pred, lambda_=1.0, coverage=0.95)

Jaccard binary classification score.

See Jaccard Index or Similarity. It corresponds arithmetically to tp/(tp+fp+fn). The Jaccard index depends on a TP-only numerator, similarly to the F1 score. For regions where there are no annotations, the Jaccard index will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior. Changes in this value do not significantly affect the outcome, unless the counts involved (tp, fp and fn) are very small (close to 1).

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you’re expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Returns:

Tuple with 4 floating-point numbers:

  • The actual Jaccard score, as would be returned by scikit-learn

  • The mode of the posterior distribution: this represents the best estimate of the Jaccard score a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
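Since the Jaccard index and the F1 score share the same tp-only numerator, one determines the other: F1 = 2J/(1 + J) and J = F1/(2 - F1). The hypothetical helpers below verify this on point estimates. Because the mapping is monotonically increasing, it would also carry equal-tailed quantiles from one measure to the other if both were computed from the same posterior samples; whether this package's two functions agree in that way is an assumption.

```python
def jaccard_from_counts(tp, fp, fn):
    """J = tp / (tp + fp + fn)."""
    return tp / (tp + fp + fn)


def f1_from_counts(tp, fp, fn):
    """F1 = 2 tp / (2 tp + fp + fn)."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_jaccard(j):
    """Monotone map from the Jaccard index to the F1 score."""
    return 2 * j / (1 + j)


def jaccard_from_f1(f1):
    """Inverse map, from the F1 score back to the Jaccard index."""
    return f1 / (2 - f1)


tp, fp, fn = 40, 10, 10
j, f = jaccard_from_counts(tp, fp, fn), f1_from_counts(tp, fp, fn)
print(j, f, f1_from_jaccard(j), jaccard_from_f1(f))
```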

credible.bayesian.metrics.f1_score(y_true, y_pred, rng, lambda_=1.0, coverage=0.95, nb_samples=100000)

Return the mean, mode, lower and upper bounds of the credible region of the F1 score.

See F1-score. It corresponds arithmetically to 2*P*R/(P+R) or 2*tp/(2*tp+fp+fn). The F1 or Dice score depends on a TP-only numerator, similarly to the Jaccard index. For regions where there are no annotations, the F1-score will always be zero, irrespective of the model output. Accuracy may be a better proxy if one needs to consider the true absence of annotations in a region as part of the measure.

This implementation is based on [GOUTTE-2005].

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_pred (Iterable[int]) – Predicted labels, as returned by a classifier.

  • rng (Generator) – An initialized numpy random number generator.

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

  • nb_samples (int) – Number of generated variates for the Monte Carlo simulation.

Returns:

Tuple with 4 floating-point numbers:

  • The actual F1 score, as would be returned by scikit-learn

  • The mode of the posterior distribution: this represents the best estimate of the F1 score a posteriori. It is typically close to the value estimated by scikit-learn.

  • The lower value of the credible region/confidence interval

  • The upper value of the credible region/confidence interval

Return type:

tuple[float, float, float, float]
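The rng and nb_samples parameters indicate that the F1 posterior is explored by Monte Carlo simulation. Under the multinomial/Dirichlet model of [GOUTTE-2005], F1 can be sampled as 2U/(2U + V + W), with independent Gamma variates U, V and W of shapes tp + lambda_, fp + lambda_ and fn + lambda_. The sketch below uses the standard library's random module instead of a numpy Generator and a crude histogram estimate of the mode; both are simplifications, the function names are hypothetical, and the package's exact procedure may differ.

```python
import math
import random
import statistics


def f1_posterior_samples(tp, fp, fn, lambda_=1.0, nb_samples=100_000, seed=0):
    """Draw F1 variates as 2U / (2U + V + W) from independent Gamma variates.

    U ~ Gamma(tp + lambda_), V ~ Gamma(fp + lambda_), W ~ Gamma(fn + lambda_),
    all with unit scale (a common scale cancels out in the ratio).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(nb_samples):
        u = rng.gammavariate(tp + lambda_, 1.0)
        v = rng.gammavariate(fp + lambda_, 1.0)
        w = rng.gammavariate(fn + lambda_, 1.0)
        samples.append(2.0 * u / (2.0 * u + v + w))
    return samples


def summarize(samples, coverage=0.95, bins=100):
    """Mean, histogram-based mode and equal-tailed interval of samples in [0, 1]."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[math.ceil((1.0 - tail) * (n - 1))]
    counts = [0] * bins
    for x in ordered:
        counts[min(int(x * bins), bins - 1)] += 1
    peak = max(range(bins), key=counts.__getitem__)
    mode = (peak + 0.5) / bins  # centre of the most populated bin
    return statistics.fmean(ordered), mode, lower, upper


tp, fp, fn = 40, 10, 10
print("F1:", 2 * tp / (2 * tp + fp + fn))
print(summarize(f1_posterior_samples(tp, fp, fn)))
```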

credible.bayesian.metrics.roc_curve(y_true, y_score, lambda_=1.0, coverage=0.95)

Compute Receiver operating characteristic (ROC).

Approximately follows the API of sklearn.metrics.roc_curve().

Important

The returned credible regions are only meaningful as estimates for individual thresholds; they are not directly usable for plotting or for evaluating the area under the curve. To plot, feed the output of this function to curves.curve_ci_hull() and use the lower and upper estimates provided by that function instead.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Return type:

tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]

Returns:

Seven 1-D floating point arrays corresponding to:

  • FPR (false positive rates)

  • TPR (true positive rates)

  • The thresholds used to evaluate the selected metrics

  • The lower bound of the credible interval for the FPR

  • The lower bound of the credible interval for the TPR

  • The upper bound of the credible interval for the FPR

  • The upper bound of the credible interval for the TPR
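Each point of the curve comes from the confusion counts at one threshold. The sketch below, with a hypothetical roc_points helper, computes FPR and TPR for every distinct score (predicting positive when score >= threshold) and records the Beta parameters from which per-threshold credible intervals could be obtained, assuming TPR ~ Beta(tp + lambda_, fn + lambda_) and FPR ~ Beta(fp + lambda_, tn + lambda_). It assumes both classes are present and, unlike scikit-learn, neither prepends a (0, 0) point nor drops intermediate points.

```python
def roc_points(y_true, y_score, lambda_=1.0):
    """Per-threshold FPR, TPR and assumed Beta posterior parameters.

    A sample is predicted positive when its score is >= the threshold.
    Thresholds are the distinct scores, from highest to lowest.
    """
    y_true, y_score = list(y_true), list(y_score)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    rows = []
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= thr)
        fn, tn = n_pos - tp, n_neg - fp
        rows.append({
            "threshold": thr,
            "fpr": fp / n_neg,
            "tpr": tp / n_pos,
            "fpr_beta": (fp + lambda_, tn + lambda_),
            "tpr_beta": (tp + lambda_, fn + lambda_),
        })
    return rows


for row in roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]):
    print(row)
```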

credible.bayesian.metrics.roc_auc_score(y_true, y_score, lambda_=1.0, coverage=0.95)

Calculate the area under the ROC (FPR vs TPR) curve.

This function mimics the scikit-learn API, except that it also returns lower and upper bounds derived from the credible regions at each threshold.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Return type:

tuple[float, float, float]

Returns:

A tuple with 3 floats:

  • the area under the ROC (FPR vs. TPR) curve

  • the lower bound, considering the credible region defined by the lambda_ and coverage parameters.

  • the upper bound, considering the credible region defined by the lambda_ and coverage parameters.
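For the point estimate, the area under the ROC curve can be computed with the trapezoidal rule; when the curve visits every distinct threshold, this equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The hypothetical helpers below check this on a four-sample example. How this package derives the lower and upper bounds is not covered by the sketch.

```python
def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve, starting from (0, 0)."""
    points = sorted([(0.0, 0.0), *zip(fpr, tpr)])
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def pairwise_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr = [0.0, 0.5, 0.5, 1.0]  # ROC points of this example (thresholds 0.8, 0.4, 0.35, 0.1)
tpr = [0.5, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr), pairwise_auc(y_true, y_score))
```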

credible.bayesian.metrics.det_curve(y_true, y_score, lambda_=1.0, coverage=0.95)

Compute the Detection Error-Tradeoff (DET) curve.

Approximately follows the API of sklearn.metrics.det_curve().

Important

The returned credible regions are only meaningful as estimates for individual thresholds; they are not directly usable for plotting or for evaluating the area under the curve. To plot, feed the output of this function to curves.curve_ci_hull() and use the lower and upper estimates provided by that function instead.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Return type:

tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]

Returns:

Seven 1-D floating point arrays corresponding to:

  • FPR (false positive rates)

  • FNR (false negative rates)

  • The thresholds used to evaluate the selected metrics

  • The lower bound of the credible interval for the FPR

  • The lower bound of the credible interval for the FNR

  • The upper bound of the credible interval for the FPR

  • The upper bound of the credible interval for the FNR
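The DET curve uses the same confusion counts as the ROC curve, with FNR = 1 - TPR at each threshold. DET plots are conventionally drawn on normal-deviate (probit) axes, as scikit-learn's DetCurveDisplay does. The sketch below, with hypothetical helper names, converts ROC points to DET points and applies that transform, clipping probabilities away from 0 and 1 so the inverse CDF stays finite.

```python
from statistics import NormalDist


def det_points(fpr, tpr):
    """FNR is the complement of TPR at every threshold: FNR = 1 - TPR."""
    return list(fpr), [1.0 - t for t in tpr]


def normal_deviate(p, eps=1e-6):
    """Probit transform used for DET axes; p is clipped to [eps, 1 - eps]."""
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


fpr, fnr = det_points([0.0, 0.5, 0.5, 1.0], [0.5, 0.5, 1.0, 1.0])
print(fnr)
print([round(normal_deviate(p), 3) for p in fpr])
```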

credible.bayesian.metrics.precision_recall_curve(y_true, y_score, lambda_=1.0, coverage=0.95)

Compute Precision-Recall (PR) curve.

Approximately follows the API of sklearn.metrics.precision_recall_curve().

Note

This package computes the precision-recall curve slightly differently from scikit-learn: it does not append the extra (precision, recall) = (1.0, 0.0) point at the end of the curve (cf. the documentation for sklearn.metrics.precision_recall_curve()).

Important

The returned credible regions are only meaningful as estimates for individual thresholds; they are not directly usable for plotting or for evaluating the area under the curve. To plot, feed the output of this function to curves.curve_ci_hull() and use the lower and upper estimates provided by that function instead.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Return type:

tuple[ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]]

Returns:

Seven 1-D floating point arrays corresponding to:

  • Precision

  • Recall

  • The thresholds used to evaluate the selected metrics

  • The lower bound of the credible interval for the Precision

  • The lower bound of the credible interval for the Recall

  • The upper bound of the credible interval for the Precision

  • The upper bound of the credible interval for the Recall
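The sketch below computes one precision-recall pair per distinct score, ordered by increasing threshold and without the extra (1.0, 0.0) point that scikit-learn appends. The helper name pr_points is hypothetical; a per-threshold credible interval for the precision would follow from an assumed Beta(tp + lambda_, fp + lambda_) posterior, as for precision_score().

```python
def pr_points(y_true, y_score):
    """Precision and recall per threshold (increasing), predicting 1 when score >= threshold.

    Unlike scikit-learn, no extra (precision=1.0, recall=0.0) point is appended.
    """
    y_true, y_score = list(y_true), list(y_score)
    n_pos = sum(1 for t in y_true if t == 1)
    thresholds = sorted(set(y_score))
    precision, recall = [], []
    for thr in thresholds:
        tp = sum(1 for t, s in zip(y_true, y_score) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, y_score) if t == 0 and s >= thr)
        # tp + fp >= 1: the sample whose score equals thr is always predicted positive
        precision.append(tp / (tp + fp))
        recall.append(tp / n_pos)
    return precision, recall, thresholds


precision, recall, thresholds = pr_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(precision, recall, thresholds)
```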

credible.bayesian.metrics.average_precision_score(y_true, y_score, lambda_=1.0, coverage=0.95)

Compute average precision (AP) from prediction scores.

This function mimics the scikit-learn API, except that it also returns lower and upper bounds derived from the credible regions at each threshold.

Parameters:
  • y_true (Iterable[int]) – Ground truth (correct) labels.

  • y_score (Iterable[float]) – Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions (as returned by “decision_function” on some classifiers).

  • lambda_ (float) – The parameterisation of the Beta prior to consider. Use \(\lambda=1\) for a flat prior. Use \(\lambda=0.5\) for the Jeffreys prior.

  • coverage (float) – A floating-point number between 0 and 1.0 indicating the coverage you are expecting. A value of 0.95 will ensure 95% of the area under the probability density of the posterior is covered by the returned equal-tailed interval.

Return type:

tuple[float, float, float]

Returns:

A tuple with 3 floats:

  • the average precision (AP)

  • the lower bound, considering the credible region defined by the lambda_ and coverage parameters.

  • the upper bound, considering the credible region defined by the lambda_ and coverage parameters.
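scikit-learn defines average precision as the step-wise sum \(\mathrm{AP} = \sum_n (R_n - R_{n-1}) P_n\) over thresholds visited from highest to lowest, with \(R_0 = 0\). The sketch below applies that definition to precision and recall arrays ordered by increasing threshold, without a trailing (1.0, 0.0) point. Whether this package's average_precision_score uses exactly this summation is an assumption, and the helper name is hypothetical.

```python
def average_precision(precision, recall):
    """AP = sum_n (R_n - R_{n-1}) * P_n, visiting thresholds from highest to lowest.

    Expects precision/recall ordered by increasing threshold (recall non-increasing),
    without scikit-learn's extra (1.0, 0.0) end point.
    """
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(reversed(precision), reversed(recall)):
        ap += (r - prev_recall) * p  # weight each precision by the recall gained
        prev_recall = r
    return ap


# Precision/recall for y_true = [0, 0, 1, 1], y_score = [0.1, 0.4, 0.35, 0.8],
# at increasing thresholds 0.1, 0.35, 0.4, 0.8.
precision = [0.5, 2.0 / 3.0, 0.5, 1.0]
recall = [1.0, 1.0, 0.5, 0.5]
print(average_precision(precision, recall))
```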