Our benchmark dataset for data-driven weather forecasting, WeatherBench, is now fully available online. You can
- download all the datasets via the TUM library,
- get the sources (including a Jupyter notebook to get started) on GitHub,
- and check out the preprint of the corresponding paper on arXiv.
In addition, Stephan has a nice overview on his blog.
Here’s the full abstract:
Data-driven approaches, most prominently deep learning, have become powerful tools for prediction in many domains. A natural question to ask is whether data-driven methods could also be used for numerical weather prediction. First studies show promise but the lack of a common dataset and evaluation metrics make inter-comparison between studies difficult. Here we present a benchmark dataset for data-driven medium-range weather forecasting, a topic of high scientific interest for atmospheric and computer scientists alike. We provide data derived from the ERA5 archive that has been processed to facilitate the use in machine learning models. We propose a simple and clear evaluation metric which will enable a direct comparison between different methods. Further, we provide baseline scores from simple linear regression techniques, deep learning models as well as purely physical forecasting models. We hope that this dataset will accelerate research in data-driven weather forecasting.
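To give a feel for the evaluation metric mentioned in the abstract: the paper uses a latitude-weighted RMSE, so that the many small grid cells near the poles do not dominate the score. Below is a minimal NumPy sketch of that idea, not the code from the repository. The function name `lat_weighted_rmse`, the `(time, lat, lon)` array layout, and the toy grid are assumptions made for illustration.

```python
import numpy as np


def lat_weighted_rmse(forecast, truth, lat_deg):
    """Latitude-weighted RMSE for arrays shaped (time, lat, lon).

    Hypothetical helper for illustration; lat_deg holds the latitude
    of each grid row in degrees.
    """
    # Weights proportional to cos(latitude), normalized to have mean 1.
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()

    squared_error = (forecast - truth) ** 2
    # Weighted spatial mean for each time step, then the square root.
    per_time = np.sqrt((squared_error * weights[None, :, None]).mean(axis=(1, 2)))
    # Average the per-forecast RMSE values over time.
    return float(per_time.mean())


# Toy example on an assumed 32 x 64 grid (illustrative values only).
lat = np.linspace(-87.1875, 87.1875, 32)
truth = np.zeros((3, 32, 64))
forecast = truth + 2.0  # constant error of 2 everywhere
score = lat_weighted_rmse(forecast, truth, lat)
```

Because the weights are normalized to mean 1, a uniform error gives the same value as a plain RMSE, while an error confined to a row near the equator counts more than the same error in a row near a pole.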