Re: [OMPI users] MPI_Comm_spawn_multiple

2011-02-21 Thread Ralph Castain
I very much doubt that either of those mappers has ever been tested against comm_spawn. Just glancing thru them, I don't see an immediate reason why loadbalance wouldn't work, but the error indicates that the system wound up mapping one or more processes to an unknown node. We are revising the

[OMPI users] MPI_Comm_spawn_multiple

2011-02-21 Thread Skouson, Gary B
I'm trying to use MPI_Comm_spawn_multiple and it doesn't seem to always work like I'd expect. The simple test code I have starts a couple of master processes and then tries to spawn a couple of worker processes on each of the nodes running the master processes. I was using 1.5.1, but gave 1.5.2r

Re: [OMPI users] Beginner's question: why multiple sends or receives don't work?

2011-02-21 Thread Jeff Squyres
It's because you're waiting on the receive request to complete before the send request. This likely works locally because the message transfer is through shared memory and is fast, but it's still an inherently unsafe way to block waiting for completion (i.e., the receive might not complete if t

Re: [OMPI users] RoCE (IBoE) & OpenMPI

2011-02-21 Thread Jeff Squyres
Random thought: is there a check to ensure that the SL MCA param is not set in a RoCE environment? If not, we should probably add a show_help warning if the SL MCA param is set when using RoCE (i.e., that its value will be ignored).
On Feb 19, 2011, at 12:22 AM, Shamis, Pavel wrote:
> As far

Re: [OMPI users] --without-tm [SEC=UNCLASSIFIED]

2011-02-21 Thread Joshua Hursey
There is no restriction on using the C/R functionality in Open MPI in a TM environment (that I am aware of) if you use the ompi-checkpoint/ompi-restart commands directly. If you want TM to checkpoint/restart Open MPI processes for you as part of the resource management role, then there is a bit

Re: [OMPI users] --without-tm [SEC=UNCLASSIFIED]

2011-02-21 Thread Jeff Squyres
On Feb 21, 2011, at 12:50 AM, DOHERTY, Greg wrote:
> blcr needs cr_mpirun to start the job without torque support to be able
> to checkpoint the mpi job correctly.
Josh -- Do we have a restriction on BLCR support when used with TM?
-- Jeff Squyres jsquy...@cisco.com For corporate legal informa

Re: [OMPI users] --without-tm [SEC=UNCLASSIFIED]

2011-02-21 Thread Ralph Castain
Simplest soln: add -bynode to your mpirun cmd line
On Feb 20, 2011, at 10:50 PM, DOHERTY, Greg wrote:
> In order to be able to checkpoint openmpi jobs with blcr, we have
> configured openmpi as follows
>
> ./configure --prefix=/data1/packages/openmpi/1.5.1-blcr-without-tm
> --disable-openib-co

[OMPI users] --without-tm [SEC=UNCLASSIFIED]

2011-02-21 Thread DOHERTY, Greg
In order to be able to checkpoint openmpi jobs with blcr, we have configured openmpi as follows
./configure --prefix=/data1/packages/openmpi/1.5.1-blcr-without-tm --disable-openib-connectx-xrc --disable-openib-rdmacm --with-ft=cr --enable-mpi-threads --enable-ft-thread --with-blcr=/usr --with-blc