Hello,
I am running checkpointing tests (with BLCR) on an MPI application
compiled against Open MPI 1.6.3, and I am seeing some rather strange
behavior.
First, some details about the tests:
- The only filesystems available on the nodes are 1) a tmpfs and 2) a
shared Lustre filesystem (measured to sustain ~15 GB/s of writes and
~40k IOPS).
- The job ran with 8 or 16 MPI ranks on 8-core nodes (1 or 2 nodes). Each
MPI rank was using approximately 200 MB of memory.
- I was doing checkpoints with ompi-checkpoint and restarting with
ompi-restart.
- I was starting the job with mpirun -am ft-enable-cr (a sketch of the
exact commands follows this list).
- The nodes are monitored with Ganglia, which lets me see the number of
IOPS and the read/write rates on the filesystem.
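Roughly, the commands look like this (the application name, PID and
snapshot handle below are placeholders):

    mpirun -np 16 -am ft-enable-cr ./my_app
    ompi-checkpoint <PID of mpirun>              (run from another shell)
    ompi-restart ompi_global_snapshot_<PID>.ckpt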
I tried a few different MCA settings, but I consistently observed the
following:
- Each checkpoint took ~4-5 minutes.
- During a checkpoint, each node (8 ranks) was doing ~500 IOPS and
writing at ~15 MB/s.
I am worried by the number of IOPS and by the very slow write speed. This
was a very small test: we have jobs running with 128 or 256 MPI ranks,
using 1-2 GB of RAM per rank. With such jobs, the aggregate IOPS (scaling
~500 per node to 16-32 nodes) would reach the tens of thousands and would
completely overload our Lustre filesystem. Moreover, at ~15 MB/s per node,
the checkpointing process would take hours.
How can I improve on that? Is there an MCA setting that I am missing?
Thanks,
--
---------------------------------
Maxime Boissonneault
Computing Analyst - Calcul Québec, Université Laval
Ph.D. in Physics