What is a good forecast?
by Tom Pagano (BoM)
My interest in forecasts began in 1997 at the University of Arizona. I was studying how well weather models reproduced the effects of El Niño in the southwest US when a nearly unprecedented El Niño developed and captured everyone’s attention. It brought immediacy and focus to our work – people wanted to know about the possibility of floods; water was being released from dams, channels being cleared of debris, sandbags being laid. I was intrigued by the idea that nature would be giving scientists a closed-book exam; forecasts were hypotheses being tested.
A popular cartoon during the 1997-98 El Niño.
Around this time, Allan Murphy, a seminal researcher in the evaluation and use of weather forecasts, passed away. One of Murphy’s most influential essays was “What is a good forecast?” He distinguished three types of ‘goodness’ (paraphrased in Beth Ebert’s verification FAQ):
- Consistency – the degree to which the forecast corresponds with the forecaster’s best judgment about the situation, based upon his/her knowledge base
- Quality – the degree to which the forecast corresponds to what actually happened
- Value – the degree to which the forecast helps a decision maker to realize some incremental economic and/or other benefit
Consistency? Originally I was unsure when there would be a situation in which a forecaster’s beliefs differed from the official products. But years later, when I became an operational forecaster, I fielded questions from users along the lines of “Yes, the forecast is X, but what do you think is really going to happen?” Consistency is a great topic, worthy of its own discussion.
Murphy further unpacked Quality, listing attributes such as Accuracy, (lack of) Bias, Reliability, and Resolution as the desirable features of a forecast (described further in this HEPEX post on verification). Allan Bradley and co-authors later put these Quality attributes in a comprehensive framework for verification of ensemble streamflow forecasts.
The attributes of Quality are necessary but by no means sufficient for good forecasts. Murphy himself said (quoted in a recent HEPEX blog) “… forecasts possess no intrinsic value. They acquire value through their ability to influence the decisions made by users of the forecasts”.
However, anyone with a basic understanding of marketing would appreciate that the best designed or most effective products are not always embraced by consumers. Indeed Wang and Strong (1996) used marketing research techniques to study how consumers defined the quality of data and information (using ‘quality’ in a broader sense than Murphy, encompassing more of a sense of ‘fitness for use’). They had professionals and business students create and then prioritize a list of 179 Quality attributes:
Word cloud of some of Wang and Strong’s 179 attributes of Quality
These were then grouped and prioritized into a subset of categories, such as Accuracy, Relevancy, Interpretability and Accessibility. A surprising result of their study was the importance of aspects such as Believability, Objectivity, and Reputation. If the goal is to have the customer use the information successfully, the customer must first believe that the information is trustworthy. The importance of credibility is illustrated in the emphasis that flood forecasting agencies place on preserving the reputation of their forecasts by, for example, forecasting correctly during fair-weather conditions and avoiding waffles (i.e. inconsistencies, as described in Florian Pappenberger’s article on the topic).
Others have created guidelines for measuring the goodness of forecasting services. The World Meteorological Organization, for example, suggests surveying user perceptions of Accuracy, Timeliness, Ease of Use, Accessibility and Added Value, as well as Staff Responsiveness and Professionalism. Sometimes services are evaluated during external audits, such as the 1999 audit of the Australian Bureau of Meteorology or the Queensland Chief Scientist’s examination of flood warnings.
Across these reports, and across surveys of the Information Quality research literature, five common themes emerge on what makes a good forecast.
Production (How the forecasts are created)
- Produced in a cost-effective and efficient manner
- Forecasts are reproducible
- Created following professional Standard Operating Procedures whose documentation is available to the user
- Production is operationally resilient (e.g. produced at the same time every day without fail)
Credibility (How the forecasts are perceived)
- Honest, impartial and unprejudiced
- Created and delivered by professional and responsive staff
- Consistent with other sources or justifies why it is not
Accuracy (How good the forecasts are in a technical sense)
- Low false alarm rate and high probability of detection
- Relatively free from unconditional and conditional biases
- Probabilistically reliable with an appropriate spread (narrow but not too narrow)
- Verifiable (provide a time, location, and magnitude, not just one or two out of three)
- Unambiguous and free of contradictions
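The first item under Accuracy can be made concrete with a 2x2 contingency table of forecast versus observed events. The sketch below is illustrative only: the function names and the sample data are made up for this example, and "false alarm rate" is interpreted here as the false alarm ratio (false alarms divided by all forecast events), which is only one of the definitions in common use.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ContingencyTable:
    hits: int               # event forecast and observed
    misses: int             # event observed but not forecast
    false_alarms: int       # event forecast but not observed
    correct_negatives: int  # event neither forecast nor observed


def build_table(forecast: Sequence[bool], observed: Sequence[bool]) -> ContingencyTable:
    """Count the four outcomes for paired yes/no forecasts and observations."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        if f and o:
            hits += 1
        elif o:
            misses += 1
        elif f:
            false_alarms += 1
        else:
            correct_negatives += 1
    return ContingencyTable(hits, misses, false_alarms, correct_negatives)


def probability_of_detection(t: ContingencyTable) -> Optional[float]:
    """Fraction of observed events that were forecast (None if no events occurred)."""
    observed_events = t.hits + t.misses
    return t.hits / observed_events if observed_events else None


def false_alarm_ratio(t: ContingencyTable) -> Optional[float]:
    """Fraction of forecast events that did not occur (None if none were forecast)."""
    forecast_events = t.hits + t.false_alarms
    return t.false_alarms / forecast_events if forecast_events else None


# Hypothetical data: eight days of "flood threshold exceeded?" forecasts.
forecast = [True, True, False, False, True, False, False, True]
observed = [True, False, False, True, True, False, False, True]

table = build_table(forecast, observed)
pod = probability_of_detection(table)  # 3 hits / 4 observed events = 0.75
far = false_alarm_ratio(table)         # 1 false alarm / 4 forecast events = 0.25
print(table, pod, far)
```

A forecaster can push the probability of detection toward 1 simply by forecasting the event more often, but only at the cost of a higher false alarm ratio, which is why the two are listed together as a pair.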
Transmission (How the forecasts get to users)
- Timely, in that it reflects the latest available information (is not stale) and arrives with enough lead time for the user to act
- Available from a consistent source with a consistent and accessible format
- Available with reliable and resilient access (e.g. accessible when power is out)
- Forecasts maintain their message despite re-reporting through various channels such as radio and TV
Messaging (How the forecasts are framed for the user)
- Clear and easy to understand
- Complete yet brief and to the point
- Communicates confidence/uncertainty clearly
- Consistent message content (if different from last forecast, provide justification)
- Conveys something that people can visualize (i.e. physical realism)
- Meaningful units/Expressed in the user’s terms
- Has personal meaning for those at risk
- Relevant and specific to user vulnerabilities (e.g. locations, flood thresholds)
- Provides options for action
Clearly scientists can contribute much more to the goodness of forecasts than just the technical aspects of accuracy. For example, Australian social scientists helped craft guides on how to word effective emergency warnings.
Do you feel that some under-appreciated attribute of forecast goodness requires more attention by HEPEX and the broader research community? Are there any aspects of goodness that are missing from the lists above? I welcome your feedback and discussion in the comments section below.
Reproduced from the HEPEX web site, where Tom Pagano is currently guest columnist. Join the discussion around Tom’s article here.