At Excarta, our mission is to help make businesses resilient with our state-of-the-art weather forecasting products.
Thorough evaluations are a key part of our research and development process, giving us the necessary feedback for future model improvements. We evaluate our models using the same procedures and metrics used by premier weather forecasting agencies like NOAA (National Oceanic and Atmospheric Administration) and ECMWF (European Centre for Medium-Range Weather Forecasts), as well as by leading research groups like Google Research.
Compared to the IFS (“Euro”) model, Excarta’s models have up to 10% lower error for temperature, dewpoint, and wind speed forecasts.
Operational forecast quality
Excarta’s AI weather models are trained on decades of historical data. In addition to evaluating on historical data, we continually evaluate the “live” forecasts produced by our operational AI weather models. Here, we report on the quality of our operational AI models: in other words, this is the quality of the weather forecasts you can access via our APIs.
Excarta’s core weather model produces forecasts with up to 5% lower error than the IFS forecast. To further improve our forecast quality, we run the weather model several times with different starting conditions, producing an “ensemble” forecast for the next 14 days. The ensemble forecast has errors up to 10% lower than IFS and also provides an estimate of the forecast uncertainty. Our forecast APIs serve the ensemble forecast, giving users better accuracy than what’s available from purely physics-based models.
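The ensemble idea can be illustrated with a minimal sketch. Here `run_model` is a hypothetical stand-in for a full AI weather model (the real model and its inputs are far richer); the point is how perturbed starting conditions yield both a mean forecast and a spread that estimates uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_model(initial_temp):
    # Hypothetical stand-in for an AI weather model: maps an initial
    # temperature to a 14-day daily forecast (a toy linear trend here).
    return initial_temp + np.linspace(0.0, 2.0, 14)

n_members = 10

# Perturb the starting conditions slightly for each ensemble member,
# then run the model once per member.
members = np.stack([
    run_model(20.0 + rng.normal(0.0, 0.5))
    for _ in range(n_members)
])  # shape: (n_members, 14 days)

ensemble_mean = members.mean(axis=0)    # the forecast that gets served
ensemble_spread = members.std(axis=0)   # an estimate of forecast uncertainty
```

Averaging the members tends to cancel out the errors of individual runs, which is why the ensemble mean typically beats any single forecast.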
How is forecast quality measured?
At a very high level, weather forecasts are evaluated using a straightforward procedure:
- Produce a weather forecast for a specific start time and duration, say, a 1-week forecast starting at midnight on Monday.
- Get ground truth data for what the weather actually was for that week.
- Compare the forecast against the ground truth using well-established metrics.
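The steps above can be sketched in a few lines. The data here is synthetic, standing in for a week of hourly temperatures from a model run and from an analysis; the metric is the RMSE discussed below.

```python
import numpy as np

def rmse(forecast, truth):
    # Root Mean Squared Error over all points in the forecast window.
    return np.sqrt(np.mean((forecast - truth) ** 2))

# Toy stand-ins for one week of hourly 2 m temperatures (168 values).
# In practice, `truth` would come from an analysis such as ERA5 and
# `forecast` from the weather model's output.
truth = 15.0 + 5.0 * np.sin(np.linspace(0.0, 14.0 * np.pi, 168))
forecast = truth + np.random.default_rng(1).normal(0.0, 0.8, size=168)

error = rmse(forecast, truth)  # lower is better
```

A perfect forecast would score zero; comparing two models means running this same procedure on both and seeing which error is lower.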
Getting the ground truth
Ground truth data tells us what actually happened in the real world. This data is collected from multiple sources which measure different things, like observations from weather stations, weather balloons, satellite imagery, radar data, etc.
After rigorous quality checks, observations from all these sources are combined into a single, comprehensive snapshot of the atmosphere, called an analysis. As fresh observations are received, the analysis is updated to reflect how the atmosphere has evolved — giving us an hour-by-hour account of the observed state of the atmosphere.
Government weather forecasting agencies such as NOAA and ECMWF regularly compute and share these analyses as produced by their data processing pipelines. In our evaluations, we use an analysis product provided by ECMWF called ERA5.
By using the reanalysis product provided by ECMWF, we ensure that we are testing ourselves against a high-quality source of data from an independent agency — this also ensures we do not cherry-pick the data to make us look good!
Choosing a baseline
To validate the quality of our forecasts, we compare ourselves against the IFS forecast from ECMWF, commonly known as the “Euro” model. In particular, we measure the quality of our forecasts for temperature, dewpoint, and wind speeds. These variables have an outsized impact on essential industries like energy and agriculture, and serve as a proxy for the overall forecast quality.
Choosing quality metrics
We use the Root Mean Squared Error (RMSE) metric to measure forecast quality. No single metric can suitably capture all aspects of the quality of a weather forecast, but RMSE reflects how close the forecast is to the actual weather and is suitable for many applications. We plan to continually add more metrics reflecting different aspects of forecast quality.
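For a forecast with values f_i and ground-truth values o_i over N points (grid cells, hours, or both), RMSE is defined as:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2}
```

Because the errors are squared before averaging, large misses are penalized more heavily than small ones.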