|
Special Issue on Expert Judgment
Guest Editorial: Special Issue on Expert Judgment | Editors_Intro4EJ_special_issue, 31KB
Roger M. Cooke
Introduction: We are indeed gratified to be able to present eight impressive papers on the subject of expert judgment. It is well known that calls for taking uncertainty more seriously in quantitative decision support, at all levels, are becoming ever more persistent. Quantifying uncertainty means, proximally and for the most part, using structured expert judgment. The qualifier "structured" means that expert judgment is treated as scientific data, albeit scientific data of a new type. Elicitation and representation of uncertainties, processing the expert judgment data, and utilization of results must be subjected to transparent methodological rules grounded in the scientific method itself. The first article announces the availability of the TU Delft expert judgment database to all researchers in this field. Three other articles in this volume illustrate the use of this data. The articles will be mentioned and briefly summarized in the order of their appearance.
TU Delft Expert Judgment Data Base | Cooke_Goossens.pdf, 487KB
Roger M. Cooke
Abstract: We review the applications of structured expert judgment uncertainty quantification using the "classical model" developed at the Delft University of Technology over the last 17 years (Cooke, 1991). These involve 45 expert panels, performed under contract with problem owners who reviewed and approved the results. With a few exceptions, all these applications involved the use of seed variables; that is, variables from the experts' area of expertise for which the true values are available post hoc. Seed variables are used to (1) measure expert performance, (2) enable performance based weighted combination of experts' distributions, and (3) evaluate and hopefully validate the resulting combination or "decision maker". This article reviews the classical model for structured expert judgment and the performance measures, reviews applications, comparing performance based decision makers with "equal weight" decision makers, and collects some lessons learned.
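The role of seed variables in measuring expert performance can be illustrated with a minimal sketch of a calibration score in the style of the classical model (Cooke, 1991). The sketch assumes that each expert states 5%, 50% and 95% quantiles for every seed variable; the function names are hypothetical and are not taken from the article. Weights here are proportional to calibration alone, whereas the classical model also uses an information score, which is omitted for brevity.

```python
import math

# Assumption: experts give 5%, 50% and 95% quantiles, so each realization
# falls in one of four inter-quantile bins with these theoretical masses.
BIN_PROBS = (0.05, 0.45, 0.45, 0.05)


def bin_counts(quantiles, realizations):
    """Count how many realizations fall in each inter-quantile bin."""
    counts = [0, 0, 0, 0]
    for (q05, q50, q95), x in zip(quantiles, realizations):
        if x <= q05:
            counts[0] += 1
        elif x <= q50:
            counts[1] += 1
        elif x <= q95:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts


def chi2_sf_3df(x):
    """Survival function of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def calibration_score(quantiles, realizations):
    """p-value of the hypothesis that the stated quantiles are statistically accurate."""
    n = len(realizations)
    counts = bin_counts(quantiles, realizations)
    # Relative information of the empirical bin frequencies w.r.t. BIN_PROBS.
    rel_info = sum(
        (c / n) * math.log((c / n) / p) for c, p in zip(counts, BIN_PROBS) if c > 0
    )
    return chi2_sf_3df(2 * n * rel_info)


def normalized_weights(scores):
    """Simplified performance weights: scores rescaled to sum to one."""
    total = sum(scores)
    return [s / total for s in scores]
```

An expert whose realizations land in the four bins in roughly the proportions 5%, 45%, 45%, 5% scores close to one; an expert whose realizations mostly fall outside the stated 90% range scores close to zero and receives almost no weight.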
Expert Judgement Combination using Moment Methods | Wisse_Bedford_Quigley.pdf, 213KB
Bram Wisse, Tim Bedford and John Quigley
Abstract: Moment methods have been employed in decision analysis, partly to avoid the computational burden that decision models involving continuous probability distributions can suffer from. In the Bayes linear (BL) methodology prior judgements about uncertain quantities are specified using expectation (rather than probability) as the fundamental notion. BL provides a strong foundation for moment methods, rooted in work of De Finetti and Goldstein. The main objective of this paper is to discuss in what way expert assessments of moments can be combined, in a non-Bayesian way, to construct a prior assessment. We show that the linear pool can be justified in an analogous but technically different way to linear pools for probability assessments, and that this linear pool has a very convenient property: a linear pool of experts' assessments of moments is coherent if each of the experts has given coherent assessments. To determine the weights of the linear pool we give a method of performance based weighting analogous to Cooke's classical model and explore its properties. Finally we compare its performance with the classical model on data gathered in applications of the classical model.
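The coherence property of the linear pool can be illustrated in the simplest case, a single uncertain quantity for which each expert states a mean and a second moment. This is a minimal sketch under that assumption, with hypothetical function names; the paper's notion of coherence is more general. Here a pair is treated as coherent when the implied variance is non-negative, and a convex combination of coherent pairs remains coherent.

```python
def is_coherent(mean, second_moment, tol=1e-12):
    """A (mean, E[X^2]) pair is coherent only if the implied variance is non-negative."""
    return second_moment + tol >= mean ** 2


def linear_pool(assessments, weights):
    """Weighted average of the experts' (mean, second moment) pairs."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    mean = sum(w * m for w, (m, _) in zip(weights, assessments))
    second = sum(w * s for w, (_, s) in zip(weights, assessments))
    return mean, second


# Hypothetical assessments from two experts, each coherent on its own.
experts = [(1.0, 2.0), (3.0, 10.0)]
pooled = linear_pool(experts, [0.25, 0.75])
assert all(is_coherent(m, s) for m, s in experts)
assert is_coherent(*pooled)
```

The pooled pair stays coherent because the weighted average of the second moments is at least the weighted average of the squared means, which in turn is at least the square of the pooled mean.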
Abstract: This article looks at a new approach to expert elicitation that combines basic elements of conventional expert elicitation protocols with formal survey methods and larger, heterogeneous expert panels. This approach is appropriate where the hazard-estimation task requires a wide range of expertise and professional experience. The ability to judge when to rely on alternative data sources often is critical for successful risk management. We show how a large, heterogeneous sample can support internal validation of not only the experts' assessments but also prior information that is based on limited historical data. We illustrate the use of this new approach to expert elicitation by addressing a fundamental problem in U.S. food safety management: obtaining comparable food system-wide estimates of foodborne illness by food-pathogen pair and by food. The only comprehensive basis for food-level hazard analysis throughout the U.S. food supply currently available is outbreak data (i.e., when two or more people become ill from the same food source), but there is good reason to question the portrayal of food risk that outbreak data alone gives. In this paper, we compare food and food-pathogen incidence estimates based on expert judgment with those based on outbreak data, and we demonstrate a suite of uncertainty measures that allow for a fuller understanding of the results.
Oswaldo Morales, Dorota Kurowicka and A. Roelen
Abstract: Causes of uncertainties may be interrelated and may introduce dependencies. Ignoring these dependencies may lead to large errors. A number of graphical models in probability theory such as dependence trees, vines and (continuous) Bayesian Belief Nets ([1], [2], [3], [4], [5], [6]) have been developed to capture dependencies between random variables. The inputs to these models are various marginal distributions and dependence information, usually in the form of conditional rank correlations. Often expert elicitation is required. This paper focuses on dependence representation and dependence elicitation. The techniques presented are illustrated with an application from aviation safety.
A Study of Expert Overconfidence | Lin_Bier.pdf, 246KB
Shi-Woei Lin and Vicki M. Bier
Abstract: Overconfidence is one of the most common (and potentially severe) problems in expert judgment. To assess the extent of expert overconfidence, we analyzed a large data set on expert opinion compiled by Cooke and colleagues at the Technical University of Delft and elsewhere. This data set contains roughly five thousand 90% confidence intervals of uncertain quantities for which the true values are now known. Our analysis assesses the overall extent of overconfidence in the data set. Significant differences in the extent of overconfidence were found among studies, among experts, and among questions within a study. Moreover, replications (multiple realizations for the same question) allowed a preliminary assessment of whether the question effect is due largely to question difficulty, or merely to random noise in the realizations of the uncertain quantities. The results of this analysis suggest that much of the apparent question effect may be due to noise rather than systematic differences in the difficulty of achieving good calibration for different questions. The results support the differential weighting of experts, since there are significant differences in expert calibration within studies.
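The basic overconfidence check described above can be sketched as a hit rate: the share of stated 90% intervals that contain the true value. This is a minimal illustration with a hypothetical function name and made-up numbers, not the authors' analysis. A well-calibrated expert should have a hit rate near 0.9, and a markedly lower value indicates overconfidence.

```python
def interval_hit_rate(intervals, true_values):
    """Fraction of (low, high) intervals that contain the corresponding true value."""
    if not true_values:
        raise ValueError("at least one realization is required")
    hits = sum(1 for (low, high), x in zip(intervals, true_values) if low <= x <= high)
    return hits / len(true_values)


# Hypothetical 90% intervals from one expert and the values later observed.
stated = [(0.0, 10.0), (2.0, 4.0), (100.0, 200.0), (-1.0, 1.0)]
observed = [5.0, 4.5, 250.0, 0.0]
rate = interval_hit_rate(stated, observed)
overconfident = rate < 0.9  # nominal coverage of a 90% interval
```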
A Paired Comparison Experiment for Gathering Expert Judgment for an Aircraft Wiring Risk Assessment | Mazzuchi_Linzey_Brunin.pdf, 129KB
Thomas A. Mazzuchi, William G. Linzey, and Armin Brunin
Abstract: Wire failure in aircraft can be attributed to several factors and the assessment of the risk of wire failure is becoming an increasingly important task. This paper discusses the results of an actual experiment that uses the paired-comparison technique for expert judgment to develop a relationship for the probability of wire failure as a function of influencing factors in an aircraft environment. The reasons for using this technique are two-fold. First, the failure probability depends on many variables, including wire gauge, vibration, environmental conditions, etc. Second, the wire failure data is sparse, and fitting this data to a complex failure function is a nontrivial task that may involve a host of assumptions that may not be provable. We describe a method for using actual failure data and the results from a paired comparison to populate the model parameters. In the approach, paired comparison data from select environments is used to obtain failure rate estimates for the candidate environments. Next, a functional relationship for wire failure as a function of the environments is constructed using a proportional hazards model. A regression model is fit from the failure rate estimates to the environmental variables and is used as an estimate of the failure response surface. This technique is being investigated as a means to generate failure rates for an Electrical Wiring Interconnection System Risk Assessment software tool currently being developed for the FAA Tech Center.
Uncertainty in Mortality Response to Airborne Fine Particulate Matter: Elicitation of European Air Pollution Experts | Tuomisto_Wilson_evans_Tainio.pdf, 283KB
Jouni T. Tuomisto, Andrew Wilson, John S. Evans, and Marko Tainio
Abstract: The authors have performed a structured expert judgment study of the population mortality effects of fine particulate matter (PM2.5) air pollution. The opinions of six European air pollution experts were elicited. The ability of each expert to probabilistically characterize uncertainty was evaluated using 12 calibration questions -- relevant variables whose true values were unknown at the time of elicitation, but available at the time of analysis. The elicited opinions exhibited both uncertainty and disagreement. It emerged that there were significant differences in expert performance. Two combinations of the experts' judgments were computed and evaluated -- one in which each expert's views received equal weight; the other in which the experts' judgments were weighted by their performance on the calibration variables. When the performance of these combinations was evaluated, the equal-weight decision-maker exhibited acceptable performance, but was nonetheless inferior to the performance-based decision-maker.
In general, the experts agreed with published studies for the best estimate of all-cause mortality from PM2.5; however, as would be expected, they gave confidence intervals that were several times broader than the statistical confidence intervals taken directly from the most frequently cited published studies. The experts were rather comfortable with applying epidemiological results from one geographic region to another. However, there was more uncertainty and disagreement about issues of timing of the effect and about the relative toxicity of different constituents of PM2.5. Even so, the experts were in fairly good agreement that an appreciable fraction of the long-term health effects occurs within a few months from the exposure and that combustion-derived particles are more toxic than PM2.5 on average, while secondary sulphates, nitrates and/or crustal materials may be less toxic. These assessments bring very valuable and relevant information to air pollution risk assessment.
On the Performance of Social Network and Likelihood Based Expert Weighting Schemes | Cooke_ElSaadany_Huang.pdf, 286KB
Abstract: Using expert judgment data from the TU Delft's expert judgment data base, we compare the performance of different weighting schemes, namely equal weighting, performance based weighting from the classical model (Cooke, 1991), social network (SN) weighting and likelihood weighting. The picture that emerges with regard to social network weights is rather mixed. SN theory does not provide an alternative to performance based combination of expert judgments, since the statistical accuracy of the SN decision maker is sometimes unacceptably low. On the other hand, it does outperform equal weighting in the majority of cases. The results here, though not overwhelmingly positive, do nonetheless motivate further research into social interaction methods for nominating and weighting experts. Indeed, a full expert judgment study with performance measurement requires an investment in time and effort, with a view to securing external validation. If high confidence in a comparable level of validation can be obtained by less intensive methods, this would be very welcome, and would facilitate the application of structured expert judgment in situations where the resources for a full study are not available. Likelihood weights are just as resource intensive as performance based weights, and the evidence presented here suggests that they are inferior to performance based weights with regard to those scoring variables which are optimized in performance weights (calibration and information). Perhaps surprisingly, they are also inferior with regard to likelihood. Their use is further discouraged by the fact that they constitute a strongly improper scoring rule.
Comments from Professor Tony O'Hagan | OHagen_cmnts_EJspecial_issue, 14KB
- "A Study of Expert Overconfidence" by Shi-Woei Lin and Vicki M. Bier
- "A Paired Comparison Experiment for Gathering Expert Judgment for an Aircraft Wiring Risk Assessment" by Mazzuchi, Linzey and Brunin
- "Eliciting Conditional and Unconditional..." by Morales, Kurowicka and Roelen
- "Expert Judgement Combination using Moment Methods" by Bram Wisse, Tim Bedford and John Quigley
Comments from Professor Robert T. Clemen | Clemen_Cmnts_EJspecial_issue, 98KB
Several of the papers in this special issue are in one way or another linked to Cooke's "classical" method for combining expert probability distributions. This comment focuses on characteristics of that method. In particular, I consider two questions: Does the weighting scheme give the experts a positive incentive to report their beliefs honestly for each variable? How does Cooke's method perform when evaluated out-of-sample?
Comments from Professor Simon French | French_Cmmnts_EJspecial_issue.pdf, 38KB
- "Expert Judgement Combination using Moment Methods" by Bram Wisse, Tim Bedford and John Quigley
- "TU Delft Expert Judgment Data Base" by Roger M. Cooke and Louis L.H.J. Goossens
- "A Study of Expert Overconfidence" by Shi-Woei Lin and Vicki M. Bier
- "On the Performance of Social Network and Likelihood Based Expert Weighting Schemes" by Roger M. Cooke, Sussie ElSaadany and Xinzheng Huang
Response by Bedford et al. to Tony O'Hagan's and Simon French's comments | Response2discussants_TB.pdf, 16KB
Response by Morales et al. to Tony O'Hagan's comments | Morales_Responce2cmnts.pdf, 14KB
Response by Lin and Bier to Tony O'Hagan's and Simon French's comments | Bier_Response2cmnts.pdf, 16KB
Response by Mazzuchi et al. to Tony O'Hagan's comments | Mazzuchi_Response2cmnts.pdf, 22KB
Response by Roger Cooke to Simon French's and Bob Clemen's comments | Cooke_resp2cmnts.pdf, 66KB |