At Excarta, our mission is to help make businesses resilient with our state-of-the-art weather forecasting products.

Thorough evaluations are a key part of our research and development process, giving us the necessary feedback for future model improvements. We evaluate our models using the same procedures and metrics used by premier weather forecasting agencies like NOAA (National Oceanic and Atmospheric Administration) [1] and ECMWF (European Centre for Medium-Range Weather Forecasts) [2].

Headline scores

Compared to the IFS forecast, one of the best conventional weather models, Excarta's forecasts are more accurate for key weather variables over a 48-hour forecast:

  • Precipitation: 13% lower error (averaged over 48 hours)
  • Wind speed: 7.5% lower error (averaged over 48 hours)
  • Dewpoint: 9% lower error (averaged over 48 hours)
  • Temperature: 4.7% lower error (averaged over 48 hours)

The improved performance of our models makes them a valuable tool for weather-sensitive tasks such as demand and power prediction for renewables, precision agriculture, and industrial optimization.

If you're interested in learning more or speaking with us, please schedule a chat or email us at contact@excarta.io.

Detailed results

For a thorough evaluation, we compare forecasts over the entire year of 2020. Using a full year ensures that the model is tested against all seasons across the globe (a more challenging problem), so it cannot benefit from an evaluation window that happens to be favorable.

For details about our methodology, see the Measuring Forecast Quality section.

6-hour precipitation (mm)

Compared to IFS, Excarta's models show a lower RMSE and higher ACC for up to 48 hours of lead time.

Temperature at 2m height

Compared to IFS, Excarta's models show a lower RMSE and higher ACC for up to 36 hours of lead time.

Dewpoint at 2m height

Compared to IFS, Excarta's models show a lower RMSE and higher ACC for up to 48 hours of lead time.

Eastward wind speed at 10m height

Compared to IFS, Excarta's models show a lower RMSE and higher ACC for up to 36 hours of lead time.

Northward wind speed at 10m height

Compared to IFS, Excarta's models show a lower RMSE and higher ACC for up to 36 hours of lead time.

Measuring forecast quality

No weather forecast is perfect, making it critical to rigorously compare forecasts against what actually happened in the real world. As an example, for a weekly forecast issued at midnight on Monday, how close were the predicted and observed temperatures for midnight Tuesday?

To see how good our weather forecast is, we need to go through the following steps:

  1. Get observations of what actually happened
  2. Identify which variables (e.g., temperature) should be assessed
  3. Use relevant metrics to measure accuracy
  4. Compare ourselves against other leading weather models

Getting observation data: what actually happened

Ground truth data tells us what actually happened in the real world. It is collected from multiple sources that measure different things, such as weather stations, weather balloons, satellite imagery, and radar.

After rigorous quality checks, observations from all these sources are combined into a single, comprehensive snapshot of the atmosphere, called an analysis. As fresh observations are received, the analysis is updated to reflect how the atmosphere has evolved — giving us an hour-by-hour account of the observed state of the atmosphere.

Weather forecasting agencies such as NOAA and ECMWF regularly compute and share these analyses as produced by their data processing pipelines. In our evaluations, we use an analysis product provided by ECMWF called ERA5 [3].

By using the analysis product provided by ECMWF, we ensure that we are testing ourselves against a high-quality source of data from an independent agency. This also ensures we do not cherry-pick the data to make ourselves look good!

Variables of interest

While Excarta's deeptech weather products are capable of predicting many different variables, here we focus on a few key variables:

  • 6-hourly total precipitation, relevant for applications in agriculture, retail, etc.
  • Temperature at 2m height above ground, relevant in modeling heat waves, power demand, etc.
  • Dewpoint at 2m height above ground, relevant in modeling visibility, wet bulb temperature, etc.
  • Eastward wind speed at 10m height above ground, a proxy for wind speeds at windmill height
  • Northward wind speed at 10m height above ground, a proxy for wind speeds at windmill height

These variables are chosen because they have the most direct impact on operations across many industries including renewables, transportation, and agriculture.

Error metrics

Having specified where our ground truth comes from, and which variables we care about, we can now calculate a few quality metrics for our forecast. For consistency, we choose the same metrics as prescribed by premier government forecasting agencies:

  • RMSE: Briefly, Root Mean Squared Error (RMSE) measures how far the predicted value (e.g., for temperature) is from the true value.
    An RMSE of 0.0 indicates the prediction was perfect.
  • ACC: Briefly, Anomaly Correlation Coefficient (ACC) measures how well the forecast correlates with observations, after accounting for local climate. This prevents a model from scoring well by just always predicting the local "average" weather.
    An ACC of 1.0 indicates the forecast perfectly predicts anomalous weather.
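
As a rough illustration of the two metrics above, here is a minimal NumPy sketch. The function names `rmse` and `acc`, the `climatology` array of local long-term averages, and the sample temperatures are all assumptions made for this example, not Excarta's evaluation code. Operational evaluations often also weight grid points by area, which is left out here for brevity.

```python
import numpy as np

def rmse(forecast, truth):
    """Root mean squared error; 0.0 means a perfect forecast."""
    forecast = np.asarray(forecast, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))

def acc(forecast, truth, climatology):
    """Anomaly correlation coefficient (uncentered form).

    Anomalies are departures from the local climatological average, so a
    forecast that always predicts the climate average earns no credit.
    """
    clim = np.asarray(climatology, dtype=float)
    f_anom = np.asarray(forecast, dtype=float) - clim
    t_anom = np.asarray(truth, dtype=float) - clim
    denom = np.sqrt(np.sum(f_anom ** 2) * np.sum(t_anom ** 2))
    if denom == 0.0:
        return float("nan")  # undefined when either anomaly field is all zeros
    return float(np.sum(f_anom * t_anom) / denom)

# Hypothetical 2m temperatures (degrees C) at four locations.
climatology = np.array([10.0, 12.0, 15.0, 20.0])
truth = np.array([12.0, 11.0, 18.0, 19.0])
forecast = np.array([11.5, 11.5, 17.0, 19.5])

print(f"RMSE: {rmse(forecast, truth):.3f}")
print(f"ACC:  {acc(forecast, truth, climatology):.3f}")
# A forecast equal to climatology has no anomaly, so its ACC is undefined (nan).
print(f"ACC of climatology forecast: {acc(climatology, truth, climatology)}")
```

In this sketch a forecast that simply repeats the climatological average has a zero anomaly everywhere and gets no meaningful ACC, which is the behavior the ACC description above is pointing at.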

Benchmarks

We use the Integrated Forecasting System (IFS) weather product from ECMWF as a benchmark. ECMWF is one of the world's leading weather forecasting and research organizations, and IFS is regarded as one of the best conventional weather models available. This makes the IFS weather forecast an ideal benchmark for Excarta's AI-powered weather models.

We evaluate forecasts at lead times of up to 48 hours, where "lead time" is the time elapsed between when a forecast is issued and the moment it predicts. We focus on the first 48 hours because that is the horizon over which most businesses make operational decisions.
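
To make the lead-time comparison concrete, the following sketch computes RMSE separately for each lead time and then a percentage reduction in error relative to a benchmark. The array layout (forecast, lead time, location), the function names, and the synthetic numbers are all assumptions for illustration. Averaging the per-lead-time RMSE before taking the ratio is one reasonable convention, not necessarily the one behind the headline scores.

```python
import numpy as np

def rmse_by_lead_time(forecasts, truth):
    """RMSE per lead time for arrays shaped (n_forecasts, n_lead_times, n_locations)."""
    sq_err = (np.asarray(forecasts, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    # Average over forecasts (axis 0) and locations (axis 2); keep the lead-time axis.
    return np.sqrt(sq_err.mean(axis=(0, 2)))

def percent_lower_error(model_rmse, benchmark_rmse):
    """Percentage by which the model's lead-time-averaged RMSE undercuts the benchmark's."""
    return float(100.0 * (1.0 - np.mean(model_rmse) / np.mean(benchmark_rmse)))

# Synthetic data: 100 forecasts, 8 lead times (6, 12, ..., 48 hours), 50 locations.
lead_hours = np.arange(6, 49, 6)
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, lead_hours.size, 50))

# Errors grow with lead time; the hypothetical model's errors are 10% smaller.
noise = rng.normal(size=truth.shape) * (lead_hours / 48.0)[None, :, None]
benchmark = truth + noise
model = truth + 0.9 * noise

model_rmse = rmse_by_lead_time(model, truth)
benchmark_rmse = rmse_by_lead_time(benchmark, truth)

for hours, m, b in zip(lead_hours, model_rmse, benchmark_rmse):
    print(f"{int(hours):2d} h  model RMSE {m:.3f}  benchmark RMSE {b:.3f}")
print(f"{percent_lower_error(model_rmse, benchmark_rmse):.1f}% lower error")
```

Because the synthetic model error is constructed as 0.9 times the benchmark error, the reported reduction is 10%. With real forecasts the per-lead-time rows would show where the advantage is largest and where it fades.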

References