Dr. Shuxia Zhang

Project Title: 
Leveraging HPC Techniques to Efficiently Develop Weather AI Models

In the past two years, several innovative weather AI models have been released, of which FourCastNet, Pangu, and GraphCast are the best known. They can predict global weather seven to ten days ahead with accuracy comparable to, or even better than, that of traditional numerical models. Although AI-powered models produce forecasts extremely quickly, their usefulness is limited to the scenarios for which they were trained. The most challenging task, and the one essential to success, is training the models on tens of terabytes of relevant weather data using sophisticated algorithms. Model training is highly time-consuming. For example, training FourCastNet took 16 hours of wall-clock time on a cluster of 64 Nvidia A100 GPUs; Pangu took 300 GPU days; GraphCast took roughly four weeks on 32 Cloud TPU v4 devices. These figures do not include the large amount of time needed for data engineering (retrieving, repacking, reformatting, reorganizing) before training begins.

This research focuses on leveraging well-established HPC techniques and practices to address the challenges encountered in weather AI research, from data engineering to model training, given a well-defined purpose. What computing resources are available (CPU/GPU processors, memory capacity, storage, file system, network)? What is an optimal parallel algorithm for the specific research purpose? Where is massively parallel processing most needed, and how should it be implemented? How should the geographic grids be decomposed and distributed across the available compute nodes? Can asynchronous I/O and non-blocking communication help? Can half or single precision, combined with data normalization, reduce the memory footprint? How can code and I/O performance be optimized?
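As a rough illustration of two of these questions (grid decomposition and precision choice), the following is a minimal sketch, not the project's method. It assumes a 0.25-degree global latitude-longitude grid of 721 x 1440 points and a simple split into contiguous latitude bands, one per compute node; the function names `split_rows` and `field_bytes` are hypothetical.

```python
# Sketch only: the grid size and the helper names are assumptions for illustration.

BYTES_PER_VALUE = {"float64": 8, "float32": 4, "float16": 2}


def split_rows(n_rows, n_ranks):
    """Split n_rows latitude rows into n_ranks contiguous (start, stop) bands.

    Band sizes differ by at most one row; the first (n_rows % n_ranks) ranks
    receive the extra row.
    """
    if n_ranks <= 0:
        raise ValueError("n_ranks must be positive")
    base, extra = divmod(n_rows, n_ranks)
    ranges = []
    start = 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def field_bytes(n_lat, n_lon, bytes_per_value):
    """Memory needed to hold one 2D field at the given precision."""
    return n_lat * n_lon * bytes_per_value


if __name__ == "__main__":
    n_lat, n_lon = 721, 1440  # assumed 0.25-degree global grid
    for rank, (start, stop) in enumerate(split_rows(n_lat, 4)):
        print(f"rank {rank}: latitude rows [{start}, {stop})")
    for name, nbytes in BYTES_PER_VALUE.items():
        mib = field_bytes(n_lat, n_lon, nbytes) / 2**20
        print(f"{name}: {mib:.2f} MiB per field")
```

Under these assumptions, halving the precision halves the per-field footprint. Whether half precision is numerically acceptable depends on how the data are normalized, which this sketch does not address.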

The objective of this research is to provide the community with guidance and methodology for quickly training AI-based weather models for a variety of research interests.

Project Investigators

Jimmy Xiao
Dr. Shuxia Zhang
 