Hi, I am running WRF simulations across multiple nodes and keep hitting a problem where the simulation randomly slows down. The model still runs, but far more slowly. I looked at each node and found that one node is using only 25% of its CPU, while the others are using 100%. Could this be related to MPI? If I resubmit the same run on a different set of nodes, it sometimes runs at normal speed and other times slows down again.
Are there any commands I can use to find out what is causing that node to use only 25% of its CPU? Thanks.