I am also curious as to why this would not work -- I was not under the
impression that tm_init() would fail from a non-mother-superior node...?

FWIW: It has been our experience with both Torque and the various
flavors of PBS that you can repeatedly call tm_init() and tm_finalize()
within a single process, so I would be surprised if that was the issue.
Indeed, I'd have to double-check, but I'm pretty sure that our MPI
processes do not call tm_init() (I believe that only mpirun does).

Prakash: are you running an unmodified version of Torque 2.0.0p7?


> -----Original Message-----
> From: users-boun...@open-mpi.org 
> [mailto:users-boun...@open-mpi.org] On Behalf Of Prakash Velayutham
> Sent: Friday, April 07, 2006 10:13 AM
> To: Open MPI Users
> Cc: pak....@sun.com
> Subject: Re: [OMPI users] Open MPI and Torque error
> 
> Pak Lui wrote:
> > Prakash,
> >
> > tm_poll: protocol number dis error 11
> > ret is 17002 instead of 0: tm_init failed
> > 3 processes killed (possibly by Open MPI)
> >
> > I encountered a similar problem with OpenPBS before, which also uses
> > the TM interfaces. It returned TM_ENOTCONNECTED (17002) when I tried
> > to call tm_init for the second time (tm_init in turn calls tm_poll,
> > which returned that errno).
> >
> > I think what you did was call tm_init from another node and connect
> > to another MOM, which I do not think is allowed. The TM module in
> > Open MPI has already called tm_init once. I am curious to know why
> > you need to call tm_init again.
> >
> > If you want to look at the PBS implementation, you can download the
> > source from openpbs.org. OpenPBS source:
> > v2.3.16/src/lib/Libifl/tm.c
> I am interested in getting this to work because I am working on
> implementing support for dynamic scheduling in Torque. I want any node
> in an MPI-2 job (specifically, one using the Open MPI implementation)
> to be able to request more nodes from the Torque/PBS server. I am
> doing a small study on that right now. Instead of having the nodes
> talk directly to the server, I want them to talk to the Mother
> Superior (MS), which will in turn talk to the server.
> 
> Could you please explain why this does not work now, and why it works
> when I call tm_init from the MS but not from any other MOM?
> 
> Thanks,
> Prakash
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
