> -----Original Message-----
> From: Prakash Velayutham [mailto:prakash.velayut...@cchmc.org] 
> Sent: Saturday, April 08, 2006 2:45 PM
> To: Jeff Squyres (jsquyres); us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI and Torque error
> 
> >>> jsquy...@cisco.com 04/08/06 7:10 AM >>>
> I am also curious as to why this would not work -- I was not under the
> impression that tm_init() would fail from a non-mother-superior node...?
> 
> What others say is that it will fail this way inside an Open MPI job as
> Open MPI's RTE is taking the only TM connection available. But the

Note that Open RTE does not hold a TM connection open because of the
one-TM-connection-per-MOM restriction (which was only recently
alleviated with Garrick's patch).  Open RTE's TM support opens a TM
connection, does its thing, and then closes the connection.
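
To make the open/use/close pattern concrete, here is a minimal sketch.
It does not use the real TM library: sim_tm_init(), sim_tm_finalize(),
and the two launcher functions are hypothetical stand-ins that model a
MOM accepting one TM connection at a time, so the behavior can be
tried without a Torque installation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a MOM that accepts one TM connection at a time.
 * None of these names come from the real TM library. */
static bool mom_slot_busy = false;

/* Stand-in for tm_init(): 0 on success, -1 if the slot is already taken. */
static int sim_tm_init(void)
{
    if (mom_slot_busy) {
        return -1;
    }
    mom_slot_busy = true;
    return 0;
}

/* Stand-in for tm_finalize(): releases the single connection slot. */
static int sim_tm_finalize(void)
{
    assert(mom_slot_busy);   /* finalize only makes sense after init */
    mom_slot_busy = false;
    return 0;
}

/* A launcher that opens the connection, does its work, and closes it,
 * leaving the slot free for whoever connects next. */
static int launcher_open_use_close(void)
{
    if (sim_tm_init() != 0) {
        return -1;
    }
    /* ... spawning of remote processes would happen here ... */
    return sim_tm_finalize();
}

/* A launcher that keeps the connection open; any later sim_tm_init()
 * fails until someone calls sim_tm_finalize(). */
static int launcher_hold_open(void)
{
    return sim_tm_init();
}
```

In this model, a sim_tm_init() issued after launcher_open_use_close()
succeeds, whereas one issued after launcher_hold_open() returns -1.
Only the first case corresponds to what Open RTE does.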

> strange thing is that it works from Mother Superior without Garrick's
> patch (actually, regardless of the patch, the behaviour is the same, but
> I have not rigorously tested the patch in itself, so cannot comment
> about that), which I think should have failed according to the above
> contention.

Based on my explanation above, the behavior you have observed makes
sense.

> FWIW: It has been our experience with both Torque and the various
> flavors of PBS that you can repeatedly call tm_init() and tm_finalize()
> within a single process, so I would be surprised if that was the issue.
> Indeed, I'd have to double check, but I'm pretty sure that our MPI
> processes do not call tm_init() (I believe that only mpirun does).


> But I am running my code using mpirun, so is this expected behaviour? I
> am attaching my simple code below:

Yes.  What I am saying is that only Open MPI's mpirun invokes tm_init()
-- the MPI processes do not invoke tm_init().  Hence, there is no
possibility of a TM connection contention from the MPI processes.

Even if you launch an MPI process on the same node as mpirun, there are
synchronization points that guarantee that MPI_INIT will not complete
until mpirun has finished its TM work and closed its connection with
tm_finalize().

This is why I, too, am curious as to why your tm_init() is failing.  You
might have to dive a bit deeper in the TM library to figure it out.  :-\

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
