Hi,

On 07.04.2017 at 16:04, Yong Wu wrote:

> Thanks for your reply. 
> First of all, I can run this job on multiple nodes without a resource manager
> (Torque/SGE), and it also works under Torque.
> But the job does not work on multiple nodes with gridengine.
> I suspect that this is caused by the parallel environment of gridengine.
> However, I got the same error with each of the gridengine PEs orte, mpi and mpich.
> 
> To answer the questions you asked above:
> >Can you please post the output of the $PE_HOSTFILE and the converted 
> >test.nodes for a run, and the allocation you got: qstat -g t
> The output of $PE_HOSTFILE:
> compute-0-34.local 16 bgmnode.q@compute-0-34.local UNDEFINED
> compute-0-67.local 8 bgmnode.q@compute-0-67.local UNDEFINED
> 
> […]     

Okay.
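
For a quick check inside the job, the hostfile can be summarized with a few 
lines of shell. This is only a sketch: the function name 
summarize_pe_hostfile is made up for illustration, and it assumes the 
four-column layout shown in your output (host, slots, queue instance, 
processor range).

```shell
# Hypothetical helper: list "host slots" pairs and the slot total from an
# SGE PE hostfile. Reads the file given as $1, or $PE_HOSTFILE if omitted.
summarize_pe_hostfile() {
    awk '{ printf "%s %d\n", $1, $2; total += $2 }
         END { printf "total %d\n", total }' "${1:-$PE_HOSTFILE}"
}
```

With the two lines you posted, this lists both hosts and a total of 24 slots, 
which can be compared with the allocation shown by qstat -g t.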

What happens, and what error message is generated, when you don't create the 
"test.nodes" file at all?


> > The "mpivars.sh" seems not to be in the default Open MPI compilation. Where 
> > is it coming from, what's inside?
> The "mpivars.sh" was created by me, and this is its content:
> $ cat /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh 
> # PATH
> if test -z "`echo $PATH | grep /share/apps/mpi/openmpi2.0.2-ifort/bin`"; then

Although I like that you check whether the paths are already present in the 
environment variable, it's safer to prepend them in any case. Otherwise they 
could sit at the end of the list and be overridden by a matching entry found 
earlier in the environment variable.
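
A minimal sketch of what I mean, assuming the installation prefix from your 
mail. The OMPI_PREFIX variable name and the lib directory are my assumptions; 
only the bin path appears in the part of the file you posted.

```shell
# Hypothetical replacement for the tests in mpivars.sh: always put the
# Open MPI directories in front, whether or not they are already listed.
OMPI_PREFIX=/share/apps/mpi/openmpi2.0.2-ifort

# ${VAR:+:$VAR} appends the old value only if it is set and non-empty,
# so no stray ":" (an empty entry) ends up in the list.
PATH="$OMPI_PREFIX/bin${PATH:+:$PATH}"
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
```

A duplicate entry further back in the list does no harm, since the first 
match wins.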


> […]
> $ source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
> $ ompi_info | grep gridengine
>                  MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)

Ok, this is compiled in then.


> >Side note:
> I created the same directory on each node and also use the NFS-shared 
> directory as the scratch directory. I use the following environment:
> source /usr/share/Modules/init/sh
> module load intel/compiler/2011.7.256
> source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
> export RSH_COMMAND="ssh"
> 
> With this environment, I can run the orca job normally on multiple nodes 
> without gridengine by typing the command: "/share/apps/orca4.0.0/orca test.inp 
> &>test.log &"

Please don't use "&" in the job script to put the job in the background. The 
job script might end, and SGE will discover this and kill all orphaned 
processes. With Torque this shouldn't be necessary either.
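
As a sketch of the foreground variant: the run_orca function and the ORCA_CMD 
variable below are names I made up for illustration, while the orca path and 
the file names are the ones from your command.

```shell
# Hypothetical last step of a job script. ORCA_CMD can be overridden;
# the default is the path used in this thread.
ORCA_CMD=${ORCA_CMD:-/share/apps/orca4.0.0/orca}

run_orca() {
    # $1 = input file, $2 = log file.
    # No trailing "&": the script blocks here until the command has exited,
    # so the job script is still alive while the calculation is running.
    "$ORCA_CMD" "$1" > "$2" 2>&1
}
```

The job script would then end with "run_orca test.inp test.log". If something 
really has to be started in the background, a "wait" before the end of the 
script keeps the script alive until it has finished.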

-- Reuti

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
