I am also curious as to why this would not work -- I was not under the impression that tm_init() would fail from a non-mother-superior node...?
FWIW: It has been our experience with both Torque and the various flavors of PBS that you can repeatedly call tm_init() and tm_finalize() within a single process, so I would be surprised if that was the issue. Indeed, I'd have to double check, but I'm pretty sure that our MPI processes do not call tm_init() (I believe that only mpirun does).

Prakash: are you running an unmodified version of Torque 2.0.0p7?

> -----Original Message-----
> From: users-boun...@open-mpi.org
> [mailto:users-boun...@open-mpi.org] On Behalf Of Prakash Velayutham
> Sent: Friday, April 07, 2006 10:13 AM
> To: Open MPI Users
> Cc: pak....@sun.com
> Subject: Re: [OMPI users] Open MPI and Torque error
>
> Pak Lui wrote:
> > Prakash,
> >
> > tm_poll: protocol number dis error 11
> > ret is 17002 instead of 0: tm_init failed
> > 3 processes killed (possibly by Open MPI)
> >
> > I encountered similar problem with OpenPBS before, which also uses the
> > TM interfaces. It returns a TM_ENOTCONNECTED (17002) when I tried to
> > call tm_init for the second time (which in turns call tm_poll and
> > returned that errno).
> >
> > I think what you did to start tm_init from another node and connect to
> > another mom which I do not think is allowed. The TM module in OpenMPI
> > already called tm_init once. I am curious to know about the reason that
> > you need to call tm_init again?
> >
> > If you are curious to know about the implementation for PBS, you can
> > download the source from openpbs.org. OpenPBS source:
> > v2.3.16/src/lib/Libifl/tm.c
> I am interested in getting this to work as I am working on implementing
> support for dynamic scheduling in Torque. I want any node in an MPI-2
> job (basically Open MPI implementation) to be able to request the
> Torque/PBS server for more nodes. I am doing a little study in that
> right now. Instead of nodes talking directly to the server, I want them
> to be able to talk to Mother Superior and MS instead will talk to the
> Server.
>
> Could you please explain why this does not work now? And why it works
> when I do the tm_init from MS, and only does not work from any other MOM?
>
> Thanks,
> Prakash
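
For anyone trying to reproduce the "ret is 17002 instead of 0: tm_init failed" message quoted above, here is a minimal sketch of the return-code check. It only uses the two values that appear in this thread (0 for success and 17002 for TM_ENOTCONNECTED). The SKETCH_ constants and both helper functions are hypothetical names made up for illustration; they are not part of the Torque or OpenPBS TM library, and the code does not call tm_init() itself.

```c
#include <stdio.h>
#include <string.h>

/* Values taken from the messages quoted in this thread: tm_init() is
 * expected to return 0, and 17002 was reported as TM_ENOTCONNECTED.
 * The SKETCH_ names are hypothetical stand-ins for the constants in
 * the real TM header. */
#define SKETCH_TM_SUCCESS       0
#define SKETCH_TM_ENOTCONNECTED 17002

/* Hypothetical helper: map a TM return code to a short label.
 * Only the two codes mentioned in the thread are named. */
const char *sketch_tm_label(int rc)
{
    switch (rc) {
    case SKETCH_TM_SUCCESS:       return "success";
    case SKETCH_TM_ENOTCONNECTED: return "TM_ENOTCONNECTED";
    default:                      return "other TM error";
    }
}

/* Hypothetical helper: given the value returned by tm_init(), write a
 * diagnostic into buf in the style of the quoted log.
 * Returns 0 when rc indicates success and -1 otherwise. */
int sketch_check_tm_init(int rc, char *buf, size_t len)
{
    if (rc == SKETCH_TM_SUCCESS) {
        snprintf(buf, len, "tm_init ok");
        return 0;
    }
    snprintf(buf, len, "ret is %d instead of 0: tm_init failed (%s)",
             rc, sketch_tm_label(rc));
    return -1;
}
```

In a real test program the rc argument would be whatever tm_init() returned on the node in question, so comparing the output from the mother superior against another MOM should show whether the failure is really TM_ENOTCONNECTED or something else.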