Hi Josh/all,

I have upgraded Open MPI to v1.4 but still get the same error when I try executing the application on multiple nodes:

*******************
Error: expected_component: PID information unavailable!
Error: expected_component: Component Name information unavailable!
*******************

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp

The file 'hosts' contains two host names: portal10, portal11.

I am triggering the checkpoint using ompi-checkpoint -v 'PID' from portal11.

I configured Open MPI as follows:

#####################
./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug \
    --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace \
    --enable-binaries --enable-trace --enable-static=yes --enable-debug \
    --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr \
    --enable-ft-thread --with-blcr=/usr/local/blcr/ \
    --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes
#########################

Question: what do you think could be wrong? Please advise me on how to resolve this problem.

Thank you,
Jean

--- On Mon, 11/1/10, Josh Hursey <jjhur...@open-mpi.org> wrote:

From: Josh Hursey <jjhur...@open-mpi.org>
Subject: Re: [OMPI users] checkpointing multi node and multi process applications
To: "Open MPI Users" <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Monday, 11 January, 2010, 21:42

On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an mpi application running on multiple nodes.
> However, I get some error messages when I trigger the checkpointing process.
>
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
>
> I am using open mpi 1.3 and blcr 0.8.1

Can you try the v1.4 release and see if the problem persists?

>
> I execute my application as follows:
>
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol.
>
> My question:
>
> Does openmpi with blcr support checkpointing of multi node execution of mpi application?
> If so, can you provide me with some information on how to achieve this.

Open MPI is able to checkpoint a multi-node application (that's what it was designed to do). There are some examples at the link below:
http://www.osl.iu.edu/research/ft/ompi-cr/examples.php

-- Josh

>
> Cheers,
>
> Jean.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users