What is a good forecast?
by Tom Pagano (BoM)
My interest in forecasts began in 1997 at the . I was studying how well weather models reproduced the when a nearly unprecedented El Niño developed and captured everyone’s attention. It brought immediacy and focus to our work – about the possibility of floods; water was being released from dams, channels being cleared of debris, sandbags being laid. I was intrigued by the idea that nature would be giving scientists a closed-book exam; forecasts were hypotheses being tested.
A popular cartoon during the 1997-98 El Niño.
Around this time, Allan Murphy, a seminal researcher in the evaluation and use of weather forecasts, passed away. One of Murphy’s most influential essays was “What is a good forecast?” He distinguished three types of ‘goodness’ (paraphrased in ):
- Consistency – the degree to which the forecast corresponds with the forecaster’s best judgment about the situation, based upon his/her knowledge base
- Quality – the degree to which the forecast corresponds to what actually happened
- Value – the degree to which the forecast helps a decision maker to realize some incremental economic and/or other benefit
Originally I was unsure when there would be a situation in which a forecaster’s beliefs differed from the official products. But years later, when I became an operational forecaster, I fielded questions from users along the lines of “Yes, the forecast is X, but what do you think is really going to happen?” Consistency is a great topic, worthy of its own discussion.
Murphy further unpacked Quality, listing attributes such as (lack of) and as the desirable features of a forecast (described further in this ). Allan Bradley and co-authors later put these Quality attributes in a .
However, anyone with a basic understanding of marketing would appreciate that the best designed or most effective products are not always embraced by consumers. Indeed, Wang and Strong used marketing research techniques to study how consumers defined the quality of data and information (using ‘quality’ in a broader sense than Murphy, closer to ‘fitness for use’). They had professionals and business students create and then prioritize a list of 179 Quality attributes:
Word cloud of some of Wang and Strong’s 179 attributes of Quality
Others have created guidelines on measuring the goodness of forecasting services such as the suggesting surveying user perceptions of as well as and ). Sometimes services are evaluated during external audits, such as the or the ).
- Produced in a cost-effective and efficient manner
- Forecasts are reproducible
- Created following professional Standard Operating Procedures whose documentation is available to the user
- Production is operationally resilient (e.g. produced at the same time every day without fail)
- Honest, impartial and unprejudiced
- Created and delivered by professional and responsive staff
- Consistent with other sources or justifies why it is not
- Low false alarm rate and high probability of detection
- Relatively free from unconditional and conditional biases
- Probabilistically reliable with an appropriate spread (narrow but not too narrow)
- Verifiable (provide a time, location, and magnitude, not just one or two out of three)
- Unambiguous and free of contradictions
- Timely, in that it reflects the latest available information (is not stale) and arrives with enough lead time for the user to act
- Available from a consistent source with a consistent and accessible format
- Available with reliable and resilient access (e.g. accessible when power is out)
- Forecasts maintain their message despite re-reporting through various sources such as radio and TV
- Clear and easy to understand
- Complete yet brief and to the point
- Communicates confidence/uncertainty clearly
- Consistent message content (if different from last forecast, provide justification)
- Conveys something that people can visualize (i.e. physical realism)
- Meaningful units/Expressed in the user’s terms
- Has personal meaning for those at risk
- Relevant and specific to user vulnerabilities (e.g. locations, flood thresholds)
- Provides options for action
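A few of the attributes above, such as false alarms, probability of detection and unconditional bias, can be measured directly once forecasts are paired with observations. The following is a minimal Python sketch of how that might look. The function names and the sample data are hypothetical, invented purely for illustration. Note that "false alarm rate" is used in the literature for two different quantities, so the sketch reports both the false alarm ratio (false alarms as a share of all "yes" forecasts) and the false alarm rate (false alarms as a share of all observed non-events).

```python
def contingency_scores(forecast_events, observed_events):
    """Score yes/no event forecasts against yes/no observations."""
    pairs = list(zip(forecast_events, observed_events))
    hits = sum(1 for f, o in pairs if f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    misses = sum(1 for f, o in pairs if not f and o)
    correct_negatives = sum(1 for f, o in pairs if not f and not o)

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a category is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        # Share of observed events that were forecast.
        "pod": ratio(hits, hits + misses),
        # Share of "yes" forecasts that did not verify.
        "false_alarm_ratio": ratio(false_alarms, hits + false_alarms),
        # Share of observed non-events that were wrongly forecast.
        "false_alarm_rate": ratio(false_alarms, false_alarms + correct_negatives),
    }


def mean_bias(forecast_values, observed_values):
    """Unconditional bias: the average of forecast minus observed."""
    errors = [f - o for f, o in zip(forecast_values, observed_values)]
    return sum(errors) / len(errors)


# Hypothetical sample: eight flood / no-flood forecasts and outcomes.
forecast_flood = [True, True, False, True, False, False, True, False]
observed_flood = [True, False, False, True, True, False, True, False]
scores = contingency_scores(forecast_flood, observed_flood)

# Hypothetical sample: three forecast and observed river stages.
stage_bias = mean_bias([10.0, 12.0, 8.0], [9.0, 10.0, 8.0])

print(scores, stage_bias)
```

Scores like these only speak to Quality in Murphy's sense. Most of the other items on the list, such as clarity, relevance or resilience of access, would need surveys or audits rather than arithmetic.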
Do you feel that some under-appreciated attribute of forecast goodness requires more attention by HEPEX and the broader research community? Are there any aspects of goodness that are missing from the lists above? I welcome your feedback and discussion in the comments section below.