Tuesday, January 16, 2018

Research Note #11 - Running WRF-CHEM on Oakforest-PACS

After so many compilation issues with the WRF-CHEM installation on FX-10 since the beginning of this year, I shifted my attention to the Oakforest-PACS (O-PACS) supercomputer. Just reading the specs, O-PACS definitely looks much more promising than FX-10: newer processors, (many) more cores and threads, more memory, twice as many nodes as FX-10, and best of all: Intel compilers! While they are not as widely used as GCC, I have had some experience with the Intel compilers, especially ifort, and they are definitely far better than Fujitsu's own compilers in terms of compatibility with UNIX/Linux software development.

Another, unexpected, thing that convinced me to switch to O-PACS was an email I received last Friday. It said (in Japanese) that the FX-10 service will be terminated after March 2018. Well, that's just rubbing salt into the wound. I'm done with FX-10.

And that was that. Early in the morning I started using O-PACS and installing WRF-CHEM, planning to try parallel jobs on the compute nodes if the installation succeeded. Guess what? Not only did I succeed in installing the model, I also ran it successfully with MPI across multiple nodes, and all of that happened in a single day. The installation itself was not perfectly smooth, due to a library-linking issue and run-time stack problems, but the model ran well afterwards. I will post the installation details soon.
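For the record, WRF's run-time stack problems usually show up as segfaults right at startup. I'll save the exact fixes for the installation post, but as a rough sketch, the usual workarounds look something like this (the stack-size values here are only illustrative, not necessarily what I used):

    # Common workarounds for WRF run-time stack crashes (values are illustrative)
    ulimit -s unlimited           # lift the shell's stack-size limit before launching
    export OMP_STACKSIZE=512m     # per-thread stack; only matters for hybrid (sm+dm) builds
    export KMP_STACKSIZE=512m     # the Intel runtime's equivalent knob

These go in the job script (or shell profile) so they take effect on the compute nodes, not just the login node.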

There are several things I would like to highlight about running WRF-CHEM on the O-PACS compute nodes:
  • The number of processes per node affects the model's run time more than the number of nodes used. Using fewer nodes with more processes per node can decrease the elapsed time significantly compared to using more nodes with fewer processes (see the job-script sketch after this list).
  • The number of nodes strongly affects token consumption. Using more nodes with fewer processes per node was far more expensive than using fewer nodes with more processes.
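For reference, the node/process split is just a matter of a few header lines in the PJM job script submitted with pjsub. Here is a minimal sketch of the 16-node, 32-processes-per-node configuration; the resource group, project group, and executable path are placeholders, not necessarily what I actually used:

    #!/bin/bash
    #PJM -L rscgrp=regular-flat   # resource group (placeholder; O-PACS also has e.g. regular-cache)
    #PJM -L node=16               # fewer nodes ...
    #PJM --mpi proc=512           # ... but 512 total MPI ranks = 32 processes per node
    #PJM -L elapse=00:30:00       # wall-clock limit
    #PJM -g gXXXXX                # project group charged for the tokens (placeholder)
    #PJM -N wrfchem               # job name
    #PJM -j                       # merge stderr into stdout

    ulimit -s unlimited           # see the stack notes above

    # Launch WRF-CHEM with Intel MPI; PJM_MPI_PROC is set by the scheduler
    mpiexec.hydra -n ${PJM_MPI_PROC} ./wrf.exe

Switching to 64 nodes with 8 processes per node only means changing node=16 to node=64 while keeping proc=512; the scheduler spreads the ranks accordingly.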

Just take a look at the picture above. Job 1595130 (third from the bottom) was submitted with 64 nodes and 8 processes per node, while job 1595143 (the bottom-most) was submitted with (only) 16 nodes but 32 processes per node. The total number of processes was the same for both jobs, 512 (64x8 and 16x32), and the elapsed times were almost the same as well, around 10 minutes. How about the tokens consumed? Were they similar? Well ... not so much.

It's obvious that job 1595130 (more nodes, fewer processes) consumed 4x more tokens than job 1595143 (fewer nodes, more processes). That ratio actually makes sense if tokens are charged per node-hour: with nearly identical elapsed times, 64 nodes cost exactly four times as much as 16 nodes. Or maybe it's because I used a different resource group? Well, we'll find out soon ...

Nevertheless, starting from today, I can run the model using the full capabilities of one of the fastest supercomputers in the world. Yay !!!

Finally, I can concentrate more on the upcoming exam on Feb 7.

Time to sleep; more work tomorrow ...
