Hi Josh/all,

I have upgraded Open MPI to v1.4, but I still get the same error when I run
the application on multiple nodes:

*******************
 Error: expected_component: PID information unavailable!
 Error: expected_component: Component Name information unavailable!
*******************

I am running my application from the node 'portal11' as follows:

mpirun -am ft-enable-cr -np 2 --hostfile hosts myapp

The file 'hosts' contains two host names: portal10, portal11.
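
For reference, a hostfile like the one described can be written as below. This is a minimal sketch that assumes the plain one-host-per-line layout; the exact layout of the original file is not shown in this message.

```shell
# Write a two-entry hostfile named 'hosts' in the current directory.
# Assumption: one host name per line, with no slot counts given.
cat > hosts <<'EOF'
portal10
portal11
EOF
```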

I trigger the checkpoint from portal11 using ompi-checkpoint -v 'PID'.
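
The checkpoint step can be sketched as a small helper that only prints the command instead of running it. The helper name checkpoint_cmd is hypothetical, and the sketch assumes that the PID handed to ompi-checkpoint is the PID of the mpirun process itself, which this message does not state.

```shell
# Hypothetical helper: print the checkpoint command for a given PID.
# Assumption: the PID is that of mpirun, running on this node.
checkpoint_cmd() {
    pid="$1"
    case "$pid" in
        # Reject empty or non-numeric input.
        ''|*[!0-9]*) echo "usage: checkpoint_cmd <mpirun PID>" >&2; return 1 ;;
    esac
    echo "ompi-checkpoint -v $pid"
}

# Possible use on portal11 (not run here): look up the mpirun process
# owned by the current user and print the command for it.
#   checkpoint_cmd "$(pgrep -u "$USER" -x mpirun | head -n 1)"
```

Printing the command first makes it easy to confirm which process is being targeted before the checkpoint is actually requested.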


I configured Open MPI as follows:

#####################

./configure --prefix=/home/jean/openmpi/ --enable-picky --enable-debug \
  --enable-mpi-profile --enable-mpi-cxx --enable-pretty-print-stacktrace \
  --enable-binaries --enable-trace --enable-static=yes --enable-debug \
  --with-devel-headers=1 --with-mpi-param-check=always --with-ft=cr \
  --enable-ft-thread --with-blcr=/usr/local/blcr/ \
  --with-blcr-libdir=/usr/local/blcr/lib --enable-mpi-threads=yes
#########################
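
One way to check whether a build configured like this picked up BLCR support is to look for a blcr entry among the crs components reported by ompi_info. The sketch below is hedged: the helper name has_blcr_crs is hypothetical, and the "MCA crs: blcr" line format is an assumption about the ompi_info output.

```shell
# Hypothetical helper: succeed if the text on stdin lists a BLCR crs component.
# Assumption: ompi_info prints lines of the form "MCA crs: blcr (...)".
has_blcr_crs() {
    grep -Eq 'MCA crs:[[:space:]]*blcr'
}

# Possible use on each node named in the hostfile (not run here):
#   ompi_info | has_blcr_crs && echo "BLCR checkpoint support found"
```

Running the check on every node would show whether portal10 and portal11 are using the same checkpoint-enabled installation.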

Question:

What do you think could be wrong? Please advise me on how to resolve this
problem.


Thank you

Jean

--- On Mon, 11/1/10, Josh Hursey <jjhur...@open-mpi.org> wrote:

From: Josh Hursey <jjhur...@open-mpi.org>
Subject: Re: [OMPI users] checkpointing multi node and multi process 
applications
To: "Open MPI Users" <us...@open-mpi.org>
List-Post: users@lists.open-mpi.org
Date: Monday, 11 January, 2010, 21:42


On Dec 19, 2009, at 7:42 AM, Jean Potsam wrote:

> Hi Everyone,
> I am trying to checkpoint an MPI application running on multiple nodes.
> However, I get some error messages when I trigger the checkpointing
> process.
> 
> Error: expected_component: PID information unavailable!
> Error: expected_component: Component Name information unavailable!
> 
> I am using Open MPI 1.3 and BLCR 0.8.1.

Can you try the v1.4 release and see if the problem persists?

> 
> I execute my application as follows:
> 
> mpirun -am ft-enable-cr -np 3 --hostfile hosts gol
> 
> My question:
> 
> Does Open MPI with BLCR support checkpointing an MPI application that runs
> across multiple nodes? If so, could you point me to some information on how
> to achieve this?

Open MPI is able to checkpoint a multi-node application (that's what it was 
designed to do). There are some examples at the link below:
  http://www.osl.iu.edu/research/ft/ompi-cr/examples.php

-- Josh

> 
> Cheers,
> 
> Jean.
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
