-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

BTW: This is the Open MPI bug you already checked:
https://www.mail-archive.com/users@lists.open-mpi.org/msg30824.html

- -- Reuti


On 08.04.2017 at 20:42, Reuti wrote:

> Hi,
>
> On 07.04.2017 at 16:04, Yong Wu wrote:
>
>> Thanks for your reply.
>> First of all, I can run this job on multiple nodes without the Torque/SGE
>> resource manager, and it also works with Torque.
>> But this job does not work on multiple nodes with gridengine.
>> I doubt that this is caused by the parallel environment of gridengine.
>> However, with orte, mpi and mpich, I got the same error for these PEs of
>> gridengine.
>>
>> I answer your above mentioned question.
>>> Can you please post the output of the $PE_HOSTFILE and the converted
>>> test.nodes for a run, and the allocation you got: qstat -g t
>> The output of $PE_HOSTFILE:
>> compute-0-34.local 16 bgmnode.q@compute-0-34.local UNDEFINED
>> compute-0-67.local 8 bgmnode.q@compute-0-67.local UNDEFINED
>>
>> […]
>
> Okay.
>
> What happens, and what error message is generated, when you don't create
> the "test.nodes" file at all?
>
>
>>> The "mpivars.sh" seems not to be in the default Open MPI compilation. Where
>>> is it coming from, what's inside?
>> The "mpivars.sh" was created by me, and its content is:
>> $ cat /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>> # PATH
>> if test -z "`echo $PATH | grep /share/apps/mpi/openmpi2.0.2-ifort/bin`"; then
>
> Although I like that you scan for the existence of the paths in the
> environment variable, it is safer to always add them at the front.
> Otherwise they could end up at the end and be overridden by any path found
> earlier in the environment variable.
>
>
>> […]
>> $ source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>> $ ompi_info | grep gridengine
>> MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
>
> Ok, this is compiled in then.
>
>
>>> Side note:
>> I create the same directory on each node and also use the NFS shared
>> directory as scratch directory. And I use the following environment:
>> source /usr/share/Modules/init/sh
>> module load intel/compiler/2011.7.256
>> source /share/apps/mpi/openmpi2.0.2-ifort/bin/mpivars.sh
>> export RSH_COMMAND="ssh"
>>
>> With this environment, I can run this orca job normally on multiple nodes
>> without gridengine by typing the command: "/share/apps/orca4.0.0/orca test.inp
>> &>test.log &"
>
> Please don't use "&" in the job script to put the job in the background. The
> job script might end, and SGE discovers this and kills all orphaned processes.
> Also with Torque this shouldn't be necessary.
>
> - -- Reuti

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iEYEARECAAYFAljpQ9oACgkQo/GbGkBRnRq9+ACgtLeZ+4/uFUYlrLACamBYk68a
3VwAnjLWNpK4KAoKsx0f/l783ra107lm
=/dgn
-----END PGP SIGNATURE-----
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
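
Regarding the remark about "mpivars.sh" in the thread above: a minimal sketch of a variant that always puts the Open MPI directories at the front could look like the following. The prefix is the one quoted in the thread; the "lib" subdirectory and the handling of LD_LIBRARY_PATH are assumptions, since the rest of the original file was not posted.

```shell
# Sketch of an mpivars.sh that prepends unconditionally.
# Prefix as quoted in the thread.
OMPI_PREFIX=/share/apps/mpi/openmpi2.0.2-ifort

# Always put the Open MPI bin directory first, so that another MPI
# installation found earlier in PATH cannot take precedence.
# ${VAR:+:$VAR} avoids a trailing ":" when VAR is empty or unset.
PATH="$OMPI_PREFIX/bin${PATH:+:$PATH}"

# Assumption: the libraries live in $OMPI_PREFIX/lib (not shown in the thread).
LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

export PATH LD_LIBRARY_PATH
```

Sourcing this more than once leaves duplicate entries in both variables, which is harmless for the lookup order. After sourcing it, "which mpirun" is a quick way to see which installation is picked up.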