On 14.09.2011, at 00:11, Ralph Castain wrote:

> I believe this is one of those strange cases that can catch us. The problem
> is that we still try to use the qrsh launcher - we appear to ignore the
> --without-sge configure option (it impacts our ability to read the
> allocation, but not the launcher).
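For concreteness, the two configure variants being discussed would look roughly like this (a sketch only; the install prefix is just a placeholder):

    # build with gridengine support compiled in
    ./configure --prefix=/opt/openmpi-1.4.3 --with-sge

    # build without gridengine support
    ./configure --prefix=/opt/openmpi-1.4.3 --without-sge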
Did this change? I thought you need --with-sge to get SGE support, as it's no
longer the default since 1.3?

-- Reuti


> Try setting the following:
>
> -mca plm_rsh_disable_qrsh 1
>
> on your cmd line. That should force it to avoid qrsh, and use rsh instead.
>
> I'll make a note that we should fix this - if you configure out sge, you
> shouldn't get the sge launcher.
>
>
> On Sep 13, 2011, at 3:54 PM, Blosch, Edwin L wrote:
>
>> This version of OpenMPI I am running was built without any guidance
>> regarding SGE in the configure command, but it was built on a system that
>> did not have SGE, so I would presume support is absent.
>>
>> My hope is that OpenMPI will not attempt to use SGE in any way. But perhaps
>> it is trying to.
>>
>> Yes, I did supply a machinefile on my own. It is formed on the fly within
>> the submitted script by parsing the PE_HOSTFILE, and I leave the resulting
>> file lying around; the result appears to be correct, i.e. it includes those
>> nodes (and only those nodes) allocated to the job.
>>
>>
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
>> Behalf Of Reuti
>> Sent: Tuesday, September 13, 2011 4:27 PM
>> To: Open MPI Users
>> Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE
>>
>> On 13.09.2011, at 23:18, Blosch, Edwin L wrote:
>>
>>> I'm able to run this command below from an interactive shell window:
>>>
>>> <path>/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent
>>> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>>>
>>> but it does not work if I put it into a shell script and 'qsub' that
>>> script to SGE. I get the message shown at the bottom of this post.
>>>
>>> I've tried everything I can think of. I would welcome any hints on how to
>>> proceed.
>>>
>>> For what it's worth, this OpenMPI is 1.4.3 and I built it on another
>>> system. I am setting and exporting OPAL_PREFIX, and as I said, all works
>>> fine interactively, just not in batch. It was built with --disable-shared
>>> and I don't see any shared libs under openmpi/lib, and I've done 'ldd'
>>> from within the script, on both the application executable and on the
>>> orterun command; no unresolved shared libraries. So I don't think the
>>> error message hinting at LD_LIBRARY_PATH issues is pointing me in the
>>> right direction.
>>>
>>> Thanks for any guidance,
>>>
>>> Ed
>>
>> Oh, I missed this:
>>
>>> error: executing task of job 139362 failed: execution daemon on host
>>> "f8312" didn't accept task
>>
>> Did you supply a machinefile on your own? In a proper SGE integration it's
>> running in a parallel environment. You defined and requested one? The
>> error looks like it was started in a PE, but tried to access a node not
>> granted for the actual job.
>>
>> -- Reuti
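As a reference point, a minimal sketch of a submission script along the lines Ed describes - requesting a parallel environment and building the machinefile from the PE_HOSTFILE - might look like the following. The PE name "mpi" and the file name mpihosts.dat are placeholders, and the usual PE_HOSTFILE layout of one "hostname slots queue processor-range" line per granted host is assumed:

    #!/bin/sh
    #$ -N test_setup
    #$ -cwd
    #$ -pe mpi 16    # request a parallel environment and 16 slots (PE name is a placeholder)

    # Build an Open MPI machinefile from the hosts SGE granted to this job.
    # Each PE_HOSTFILE line is: hostname  slots  queue  processor-range
    awk '{ print $1 " slots=" $2 }' "$PE_HOSTFILE" > mpihosts.dat

    <path>/bin/mpirun --machinefile mpihosts.dat -np $NSLOTS \
        -mca plm_rsh_agent /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup

This is only meant to show the moving parts being asked about; with a proper tight integration, Open MPI can also read the allocation itself instead of being handed a machinefile.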
>>> --------------------------------------------------------------------------
>>> A daemon (pid 2818) died unexpectedly with status 1 while attempting
>>> to launch so we are aborting.
>>>
>>> There may be more information reported by the environment (see above).
>>>
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
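And for completeness, Ralph's suggestion applied to the command from the original post would look roughly like this (a sketch only; the <path> placeholder and the remaining options are taken verbatim from the thread):

    <path>/bin/mpirun --machinefile mpihosts.dat -np 16 \
        -mca plm_rsh_disable_qrsh 1 \
        -mca plm_rsh_agent /usr/bin/rsh \
        -x MPI_ENVIRONMENT=1 ./test_setup

Whether a particular 1.4.3 build actually exposes plm_rsh_disable_qrsh should be visible in the output of ompi_info --param plm rsh; if it is not listed there, the setting would likely have no effect.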