On 13.09.2011, at 23:54, Blosch, Edwin L wrote:

> This version of Open MPI I am running was built without any guidance regarding 
> SGE in the configure command, and it was built on a system that did not have 
> SGE, so I would presume support is absent.

Whether SGE is installed on the build machine is not relevant. In contrast to 
Torque (and I think also SLURM), nothing is compiled into Open MPI which needs 
a library from the designated queuing system to support it. In the case of SGE 
it will just check for the existence of some environment variables and call 
`qrsh -inherit ...`. Further startup is then handled by SGE via the defined 
qrsh_daemon/qrsh_command.
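
Just to illustrate which environment a tight integration relies on (the exact 
set of variables Open MPI tests is my assumption, not verified against the 
source), you can print them from inside a job script:

env | egrep '^(SGE_ROOT|JOB_ID|PE_HOSTFILE)='

If PE_HOSTFILE is missing, the job wasn't started in a parallel environment at 
all. Whether the SGE support itself was compiled in is a separate question.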

So, to check it you can issue:

ompi_info | grep grid

Any output?
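
If the gridengine component was compiled in, you should see lines similar to 
(illustrative only, the exact fields depend on the build):

MCA ras: gridengine (MCA v2.0, API v2.0, Component v1.4.3)

No output would mean Open MPI has no SGE support at all and will treat the 
nodes like a plain rsh/ssh cluster.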


> My hope is that Open MPI will not attempt to use SGE in any way. But perhaps 
> it is trying to. 
> 
> Yes, I did supply a machinefile on my own.  It is formed on the fly within 
> the submitted script by parsing the PE_HOSTFILE, and I leave the

Parsing the PE_HOSTFILE and preparing it in a format suitable for the actual 
parallel library is usually done in the PE's start_proc_args, so that it 
happens once for all users and applications using this parallel library. With 
a tight integration start_proc_args/stop_proc_args can be set to NONE, though.
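
As an illustration only (the target file name is a placeholder): each line of 
the PE_HOSTFILE reads "host slots queue processor-range", so a machinefile in 
Open MPI's format could be generated once per job like:

awk '{ print $1" slots="$2 }' "$PE_HOSTFILE" > "$TMPDIR/machines"

With the gridengine support compiled in this step isn't needed at all, as Open 
MPI evaluates the PE_HOSTFILE on its own.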


> resulting file lying around, and the result appears to be correct, i.e. it 
> includes those nodes (and only those nodes) allocated to the job.

Well, even without building --with-sge you could achieve a so-called tight 
integration, and this can confuse the startup. What does your PE look like? 
Depending on whether Open MPI starts a task on the master node of the job by 
a local `qrsh -inherit ...`, job_is_first_task needs to be set to FALSE (this 
allows one `qrsh -inherit ...` call to be made locally). But if all is fine, 
the job script is already the first task and TRUE should work.
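
For reference, a typical tight integration PE for Open MPI might look like 
this (the PE name "orte" and the slot count are placeholders):

$ qconf -sp orte
pe_name            orte
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min

control_slaves TRUE is what permits the `qrsh -inherit ...` calls in the first 
place, and the job would then be submitted with e.g. `qsub -pe orte 16 job.sh`.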

-- Reuti


> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
> Behalf Of Reuti
> Sent: Tuesday, September 13, 2011 4:27 PM
> To: Open MPI Users
> Subject: EXTERNAL: Re: [OMPI users] Problem running under SGE
> 
> On 13.09.2011, at 23:18, Blosch, Edwin L wrote:
> 
>> I'm able to run this command below from an interactive shell window:
>> 
>> <path>/bin/mpirun --machinefile mpihosts.dat -np 16 -mca plm_rsh_agent 
>> /usr/bin/rsh -x MPI_ENVIRONMENT=1 ./test_setup
>> 
>> but it does not work if I put it into a shell script and 'qsub' that script 
>> to SGE.  I get the message shown at the bottom of this post. 
>> 
>> I've tried everything I can think of.  I would welcome any hints on how to 
>> proceed. 
>> 
>> For what it's worth, this Open MPI is 1.4.3 and I built it on another system. 
>>  I am setting and exporting OPAL_PREFIX and, as I said, all works fine 
>> interactively, just not in batch.  It was built with --disable-shared and I 
>> don't see any shared libs under openmpi/lib, and I've done 'ldd' from within 
>> the script, on both the application executable and on the orterun command; 
>> no unresolved shared libraries.  So I don't think the error message hinting 
>> at LD_LIBRARY_PATH issues is pointing me in the right direction.
>> 
>> Thanks for any guidance,
>> 
>> Ed
>> 
> 
> Oh, I missed this:
> 
> 
>> error: executing task of job 139362 failed: execution daemon on host "f8312" 
>> didn't accept task
> 
> Did you supply a machinefile on your own? In a proper SGE integration it's 
> running in a parallel environment. Did you define and request one? The error 
> looks like the job was started in a PE, but tried to access a node not 
> granted to the actual job.
> 
> -- Reuti
> 
> 
>> --------------------------------------------------------------------------
>> A daemon (pid 2818) died unexpectedly with status 1 while attempting
>> to launch so we are aborting.
>> 
>> There may be more information reported by the environment (see above).
>> 
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>> 

