On 14.09.2011 at 00:11, Ralph Castain wrote:

> I believe this is one of those strange cases that can catch us. The problem 
> is that we still try to use the qrsh launcher - we appear to ignore the 
> --without-sge configure option (it impacts our ability to read the 
> allocation, but not the launcher).

Did this change? I thought you need --with-sge to get SGE support, as it is 
no longer the default since 1.3?

-- Reuti


> 
> Try setting the following:
> 
> -mca plm_rsh_disable_qrsh 1
> 
> on your cmd line. That should force it to avoid qrsh, and use rsh instead.
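> 
> A minimal sketch of how the full command line might look with that option 
> added, reusing the mpirun invocation quoted further down (the <path> 
> placeholder and the MPI_ENVIRONMENT variable come from that example, not 
> from anything verified here):
> 
>   <path>/bin/mpirun --machinefile mpihosts.dat -np 16 \
>       -mca plm_rsh_agent /usr/bin/rsh \
>       -mca plm_rsh_disable_qrsh 1 \
>       -x MPI_ENVIRONMENT=1 ./test_setup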
> 
> I'll make a note that we should fix this - if you configure out sge, you 
> shouldn't get the sge launcher.
> 
> 
> On Sep 13, 2011, at 3:54 PM, Blosch, Edwin L wrote:
> 
>> This version of OpenMPI I am running was built without any guidance 
>> regarding SGE in the configure command, but it was built on a system that 
>> did not have SGE, so I would presume support is absent.
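>> 
>> One way I suppose I could confirm that (assuming the ompi_info from this 
>> installation is the one found first in the PATH) would be:
>> 
>>   ompi_info | grep gridengine
>> 
>> If no gridengine component shows up, the SGE allocation support was not 
>> compiled in.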
>> 
>> My hope is that OpenMPI will not attempt to use SGE in any way. But perhaps 
>> it is trying to. 
>> 
>> Yes, I did supply a machinefile on my own.  It is generated on the fly 
>> within the submitted script by parsing the PE_HOSTFILE, and I leave the 
>> resulting file lying around; it appears to be correct, i.e. it contains 
>> those nodes (and only those nodes) allocated to the job.
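>> 
>> The conversion is roughly along these lines (a sketch, not my exact script; 
>> the awk fields assume the usual PE_HOSTFILE layout of "host slots queue 
>> range"):
>> 
>>   awk '{ print $1, "slots=" $2 }' $PE_HOSTFILE > mpihosts.dat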
>> 
>> 
>> 
>> -----Original Message-----
>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
>> Behalf Of Reuti
>> Sent: Tuesday, September 13, 2011 4:27 PM
>> To: Open MPI Users
>> Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE
>> 
>> On 13.09.2011 at 23:18, Blosch, Edwin L wrote:
>> 
>>> I'm able to run this command below from an interactive shell window:
>>> 
>>> <path>/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
>>> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>>> 
>>> but it does not work if I put it into a shell script and 'qsub' that script 
>>> to SGE.  I get the message shown at the bottom of this post. 
>>> 
>>> I've tried everything I can think of.  I would welcome any hints on how to 
>>> proceed. 
>>> 
>>> For what it's worth, this Open MPI is 1.4.3 and I built it on another 
>>> system.  I am setting and exporting OPAL_PREFIX, and as I said, all works 
>>> fine interactively, just not in batch.  It was built with --disable-shared 
>>> and I don't see any shared libs under openmpi/lib, and I've run 'ldd' from 
>>> within the script, on both the application executable and on the orterun 
>>> command; no unresolved shared libraries.  So I don't think the error 
>>> message hinting at LD_LIBRARY_PATH issues is pointing me in the right 
>>> direction.
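>>> 
>>> Concretely, the checks in the script amount to something like this (the 
>>> <path> placeholder stands for the real install prefix, as above):
>>> 
>>>   export OPAL_PREFIX=<path>
>>>   ldd ./test_setup
>>>   ldd $OPAL_PREFIX/bin/orterun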
>>> 
>>> Thanks for any guidance,
>>> 
>>> Ed
>>> 
>> 
>> Oh, I missed this:
>> 
>> 
>>> error: executing task of job 139362 failed: execution daemon on host 
>>> "f8312" didn't accept task
>> 
>> Did you supply a machinefile on your own? In a proper SGE integration the 
>> job runs inside a parallel environment. Did you define and request one? The 
>> error looks like the job was started in a PE, but tried to access a node 
>> that was not granted to the actual job.
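>> 
>> For reference, requesting a PE from the job script usually looks like this 
>> (PE name, slot count and script name are placeholders, not taken from your 
>> setup):
>> 
>>   qsub -pe <pe_name> 16 your_script.sh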
>> 
>> -- Reuti
>> 
>> 
>>> --------------------------------------------------------------------------
>>> A daemon (pid 2818) died unexpectedly with status 1 while attempting
>>> to launch so we are aborting.
>>> 
>>> There may be more information reported by the environment (see above).
>>> 
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
>>> 

