Continuous Distributions and Measures of Statistical Accuracy for Structured Expert Judgment

This study evaluates five scoring rules for assessing uncertainty estimates, finding that they differ in their sensitivity to biases and in their suitability for performance-based weighting of experts.

Date

May 5, 2025

Publication

Journal Article in Futures & Foresight Science

Abstract

This study evaluates five scoring rules, or measures of statistical accuracy, for assessing uncertainty estimates from expert judgment studies and model forecasts. These rules — the Continuous Ranked Probability Score (CRPS), the Kolmogorov-Smirnov (KS), Cramér-von Mises (CvM), and Anderson-Darling (AD) tests, and the chi-square test — were applied to 6,864 expert uncertainty estimates from 49 Classical Model (CM) studies. We compared their sensitivity to various biases and their ability to serve as performance-based weights for expert estimates. Additionally, the piecewise-uniform and Metalog distributions were evaluated as representations of expert estimates, because four of the five rules require interpolating the experts’ quantile estimates. Simulating biased estimates reveals that the considered test statistics differ in their sensitivity to these biases. Expert weights derived using one measure of statistical accuracy were evaluated with the other measures to assess their performance. The main conclusions are (1) CRPS overlooks important biases, while chi-square and AD behave similarly, as do KS and CvM. (2) All measures except CRPS agree that performance weighting is superior to equal weighting with respect to statistical accuracy. (3) Neither distribution can effectively predict the position of a removed quantile estimate. These insights clarify the behavior of different scoring rules for combining uncertainty estimates from experts or models and extend the knowledge base for best practices.
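To make the measures above concrete, the sketch below (not the study's implementation) interpolates an expert's 5%, 50%, and 95% quantile estimates with a piecewise-uniform distribution, computes the CRPS against a single realization, and applies KS and CvM tests to the probability-integral-transform values of the realizations across calibration questions. The support bounds, toy data, and helper names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def piecewise_uniform_cdf(x, quantiles, probs=(0.05, 0.5, 0.95), support=None):
    """CDF obtained by linear interpolation through the expert's quantiles
    (a piecewise-uniform density). `support` closes the tails; an assumed
    10% overshoot of the quantile range is used when none is given."""
    q = np.asarray(quantiles, dtype=float)
    if support is None:
        span = q[-1] - q[0]
        support = (q[0] - 0.1 * span, q[-1] + 0.1 * span)
    xs = np.concatenate(([support[0]], q, [support[1]]))
    ps = np.concatenate(([0.0], probs, [1.0]))
    return np.interp(x, xs, ps)

def crps(cdf, realization, grid):
    """CRPS = integral over x of (F(x) - 1{x >= realization})^2."""
    F = cdf(grid)
    H = (grid >= realization).astype(float)
    return np.trapz((F - H) ** 2, grid)

# Toy calibration data: one (5%, 50%, 95%) assessment and one realization
# per question. Values are invented for illustration only.
assessments = [(2.0, 5.0, 9.0), (10.0, 14.0, 30.0), (0.5, 1.0, 4.0)]
realizations = [6.0, 12.0, 3.5]

# Probability-integral-transform values: where each realization lands in the
# expert's CDF. A statistically accurate expert yields roughly uniform values
# on [0, 1], which KS, CvM, AD, and chi-square all test in different ways.
pits = np.array([piecewise_uniform_cdf(r, q)
                 for q, r in zip(assessments, realizations)])

ks = stats.kstest(pits, "uniform").statistic           # Kolmogorov-Smirnov
cvm = stats.cramervonmises(pits, "uniform").statistic  # Cramér-von Mises

grid = np.linspace(0.0, 40.0, 4001)
crps_q1 = crps(lambda x: piecewise_uniform_cdf(x, assessments[0]),
               realizations[0], grid)

print(f"KS={ks:.3f}  CvM={cvm:.3f}  CRPS(question 1)={crps_q1:.3f}")
```

An AD or chi-square variant would follow the same pattern on the transform values; note that scipy.stats.anderson does not offer a uniform null distribution, so that test would need a small custom implementation.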
