-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

BTW: Have you already checked this Open MPI bug?

https://www.mail-archive.com/users@lists.open-mpi.org/msg30824.html

- -- Reuti


On 08.04.2017 at 20:42, Reuti wrote:

> Hi,
> 
> On 07.04.2017 at 16:04, Yong Wu wrote:
> 
>> Thanks for your reply. 
>> First of all, I can run this job on multiple nodes without a resource 
>> manager such as Torque or SGE, and it also works under Torque. 
>> But this job does not work on multiple nodes with gridengine.
>> I suspect that this is caused by the parallel environment of gridengine. 
>> However, I get the same error with each of the gridengine PEs I tried: 
>> orte, mpi, and mpich.
>> 
>> Here are my answers to your questions above.
>>> Can you please post the output of the $PE_HOSTFILE and the converted 
>>> test.nodes for a run, and the allocation you got: qstat -g t
>> The output of $PE_HOSTFILE:
>> compute-0-34.local 16 bgmnode.q@compute-0-34.local UNDEFINED
>> compute-0-67.local 8 bgmnode.q@compute-0-67.local UNDEFINED
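For reference, a minimal sketch of how such a $PE_HOSTFILE could be expanded into a plain node list. The function name `expand_pe_hostfile` is hypothetical, and the assumption that "test.nodes" wants one hostname per granted slot is mine; the actual format used in the job script is not shown in this thread.

```shell
# Hypothetical helper: print each hostname once per granted slot.
# Assumes the column layout shown above:
#   hostname  slots  queue-instance  processor-range
expand_pe_hostfile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Possible use inside a job script (the output format is an assumption):
#   expand_pe_hostfile "$PE_HOSTFILE" > test.nodes
```

With the two lines above as input, this would print compute-0-34.local 16 times followed by compute-0-67.local 8 times.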
>> 
>> […]     
> 
> Okay.
> 
> What happens, and what error message is generated, when you don't create the 
> "test.nodes" file at all?
> 
> 
>>> The "mpivars.sh" seems not to be in the default Open MPI compilation. Where 
>>> is it coming from, what's inside?
>> I created the "mpivars.sh" myself; its content is:
>> $ cat /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh 
>> # PATH
>> if test -z "`echo $PATH | grep /share/apps/mpi/openmpi2.0.2-ifort/bin`"; then
> 
> Although I like that you check whether the paths are already present in the 
> environment variable, it's safer to prepend them in any case. 
> Otherwise they could sit at the end and be overridden by any path found 
> earlier in the environment variable.
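A minimal sketch of the unconditional variant, assuming a POSIX shell. The helper name `prepend_path` is hypothetical and not part of the original "mpivars.sh"; it does not remove an existing later occurrence of the directory, which is harmless because the first match wins.

```shell
# Hypothetical helper: always put the given directory at the front of PATH,
# whether or not it already appears further back.
prepend_path() {
    PATH="$1${PATH:+:$PATH}"
    export PATH
}

# Directory taken from the mpivars.sh quoted above.
prepend_path /share/apps/mpi/openmpi2.0.2-ifort/bin
```

After sourcing this, the Open MPI binaries in that directory are found before any other installation that might be listed in PATH.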
> 
> 
>> […]
>> $ source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>> $ ompi_info | grep gridengine
>>                MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
> 
> Ok, this is compiled in then.
> 
> 
>>> Side note:
>> I created the same directory on each node and also use the NFS-shared 
>> directory as the scratch directory. I use the following environment:
>> source /usr/share/Modules/init/sh
>> module load intel/compiler/2011.7.256
>> source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>> export RSH_COMMAND="ssh"
>> 
>> With this environment, I can run this orca job normally on multiple nodes 
>> without gridengine by typing the command: "/share/apps/orca4.0.0/orca test.inp 
>> &>test.log &"
> 
> Please don't use "&" in the job script to put the job in the background. The 
> job script might end, and SGE discovers this and kills all orphaned processes. 
> Also with Torque this shouldn't be necessary.
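A sketch of the foreground variant for the job script, assuming a POSIX shell. The function name `run_job` is hypothetical; the point is only that the command line has no trailing "&", so the script does not end before the program does.

```shell
# Hypothetical wrapper: run_job LOGFILE COMMAND [ARGS...]
# Runs the command in the foreground and sends stdout and stderr to LOGFILE.
run_job() {
    log=$1
    shift
    "$@" > "$log" 2>&1      # no trailing "&": the script blocks here
    rc=$?
    echo "command finished with exit status $rc" >> "$log"
    return "$rc"
}

# In the job script, with the paths from this thread:
#   run_job test.log /share/apps/orca4.0.0/orca test.inp
```

Because the script waits for the command, its processes stay children of the job script for the whole run and the exit status is passed on to the queuing system.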
> 
> - -- Reuti

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iEYEARECAAYFAljpQ9oACgkQo/GbGkBRnRq9+ACgtLeZ+4/uFUYlrLACamBYk68a
3VwAnjLWNpK4KAoKsx0f/l783ra107lm
=/dgn
-----END PGP SIGNATURE-----
