On 09.04.2017 at 15:47, Yong Wu wrote:

> Reuti,
> Thanks for your reply again!
>
>> I can assure you, that for me and others it's working.
>
> But it's not working for me.
>
>> Aha, I only set the $OMP_ROOT/etc/openmpi-mca-params.conf to have an entry
>> plm_rsh_agent=foo to have it set for all users automatically.
>> I didn't play with a source modification though.
>> Nevertheless:
>> Can you try with the original Open MPI 2.0.2 and call ORCA with:
>> https://orcaforum.cec.mpg.de/viewtopic.php?f=9&t=2656
>
> I added an entry plm_rsh_agent=foo to openmpi-mca-params.conf
> (/share/apps/mpi/openmpi2.0.2-ifort/etc/openmpi-mca-params.conf) and
> resubmitted the job, but got the error: "[file orca_main/mainchk.cpp,
> line 130]: Error (ORCA_MAIN): ... aborting the run."
>
> I also used the line
>   time /share/apps/orca4.0.0/orca test.inp "-mca plm_rsh_agent foo --bind-to none" > ${SGE_O_WORKDIR}/test.log
> instead of
>   time /share/apps/orca4.0.0/orca test.inp > ${SGE_O_WORKDIR}/test.log
> and got the same error: "[file orca_main/mainchk.cpp, line 130]: Error
> (ORCA_MAIN): ... aborting the run."
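Reuti's suggestion above can be sketched as a small shell step. This is a minimal sketch, assuming a writable local `./etc` directory as a stand-in for the real installation prefix (on the cluster in this thread that would be /share/apps/mpi/openmpi2.0.2-ifort/etc); the guard against duplicate entries is an addition for illustration:

```shell
# Sketch: add the plm_rsh_agent=foo entry to openmpi-mca-params.conf so it
# applies to all users automatically, as Reuti describes.
# OMPI_ETC is a hypothetical stand-in for the real Open MPI etc directory.
OMPI_ETC=./etc
mkdir -p "$OMPI_ETC"
CONF="$OMPI_ETC/openmpi-mca-params.conf"

# Append the entry only if no plm_rsh_agent line is present yet.
grep -q '^plm_rsh_agent=' "$CONF" 2>/dev/null || echo 'plm_rsh_agent=foo' >> "$CONF"

grep '^plm_rsh_agent' "$CONF"   # prints: plm_rsh_agent=foo
```

Entries in this file are read by every mpirun under that installation, which is why it sets the agent for all users without touching job scripts.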
But this is now different from the original error message, that one machine in the hostfile is not in the allocation. Is that error gone?

-- Reuti

>> I'm not sure whether Open MPI will resolve the hostnames to their TCP/IP
>> addresses, or just does a literal comparison - which fails.
>
> When using the mpich PE, I modified startmpi.sh on all compute nodes.
> I changed the line in your PeHostfile2MachineFile() subroutine from
>   host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
> to
>   host=`echo $line|cut -f1 -d" "`
> and resubmitted the job, but still got the error: "[file orca_main/mainchk.cpp,
> line 130]: Error (ORCA_MAIN): ... aborting the run."
>
> Best regards,
> Yong Wu
>
> On 2017-04-09 at 18:27 GMT+08:00, Reuti <re...@staff.uni-marburg.de> wrote:
>>
>> Hi,
>>
>> On 09.04.2017 at 11:14, Yong Wu wrote:
>>
>>> Dear Reuti,
>>> Thank you very much!
>>> The jobname.nodes file is not necessary for parallel ORCA. And my
>>> "mpivars.sh" is also not a problem.
>>> The ORCA 3.0.3 program is compiled with openmpi-1.6.5 and runs normally
>>> on multiple nodes in Grid Engine.
>>> ORCA 4.0.0, however, is compiled with openmpi-2.0.2 and cannot run on
>>> multiple nodes in Grid Engine.
>>> Maybe it is a bug of openmpi-2.0.x for ORCA running on multiple nodes
>>> in Grid Engine.
>>
>> I can assure you, that for me and others it's working.
>>
>>> I downloaded the latest stable version of Open MPI, but the error also
>>> appears in openmpi-2.1.0. Maybe the bug is not fixed in the latest
>>> stable version.
>>>
>>>> The Open MPI bug you checked already:
>>>> https://www.mail-archive.com/users@lists.open-mpi.org/msg30824.html
>>>
>>> Thanks for your information. I read it, but I could not solve this
>>> problem. I modified the source file orte/mca/plm/rsh/plm_rsh_component.c
>>> following this address:
>>> https://github.com/open-mpi/ompi/commit/dee2d8646d2e2055e2c86db9c207403366a2453d#diff-f556f53efc98e71d3bd13ee9945949fe
>>> and recompiled Open MPI, but it had no effect.
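The difference between the two `cut` pipelines from startmpi.sh can be checked in isolation. The hostfile line below is a hypothetical example of a Grid Engine $pe_hostfile entry (hostname, slot count, queue instance, processor range); the variable names `short` and `full` are illustration only:

```shell
# A hypothetical $pe_hostfile line (format: host slots queue processor-range).
line='compute-0-0.local 4 all.q@compute-0-0.local UNDEFINED'

# Original startmpi.sh behaviour: take the first field, then strip the domain.
short=$(echo "$line" | cut -f1 -d" " | cut -f1 -d".")

# Modified behaviour: keep the name exactly as listed in the hostfile.
full=$(echo "$line" | cut -f1 -d" ")

echo "$short"   # prints: compute-0-0
echo "$full"    # prints: compute-0-0.local
```

So the modification only changes which of the two spellings of the node name ends up in the machine file; if Open MPI compares that name literally against the allocation, only one of the two forms can match.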
>> Aha, I only set the $OMP_ROOT/etc/openmpi-mca-params.conf to have an entry
>> plm_rsh_agent=foo to have it set for all users automatically.
>> I didn't play with a source modification though.
>> Nevertheless:
>> Can you try with the original Open MPI 2.0.2 and call ORCA with:
>> https://orcaforum.cec.mpg.de/viewtopic.php?f=9&t=2656
>>
>>>> Please change the line in your PeHostfile2MachineFile() subroutine:
>>>>   host=`echo $line|cut -f1 -d" "|cut -f1 -d"."`
>>>> to:
>>>>   host=`echo $line|cut -f1 -d" "`
>>>> This should leave the ".local" domain,
>>>
>>> This is also not a problem, because of my /etc/hosts:
>>>   10.1.1.1      cluster.local       cluster
>>>   10.1.255.254  compute-0-0.local   compute-0-0
>>>   10.1.255.253  compute-0-1.local   compute-0-1
>>>   10.1.255.244  compute-0-10.local  compute-0-10
>>>   10.1.255.243  compute-0-11.local  compute-0-11
>>
>> I'm not sure whether Open MPI will resolve the hostnames to their TCP/IP
>> addresses, or just does a literal comparison - which fails.
>>
>> -- Reuti

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
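[Editor's note] Reuti's open question, literal string comparison versus name resolution, can be illustrated with a minimal sketch. The comparison below is an illustration of the suspected behaviour, not Open MPI's actual code; the host names are taken from the /etc/hosts excerpt in the thread:

```shell
# Hypothetical illustration: if names are compared as plain strings, the
# shortened hostfile name can never match the FQDN from the allocation.
hostfile_name='compute-0-0'          # after startmpi.sh strips ".local"
allocation_name='compute-0-0.local'  # as Grid Engine reports the node

if [ "$hostfile_name" = "$allocation_name" ]; then
    echo "literal match"
else
    echo "literal mismatch"          # this branch is taken
fi
```

Per the /etc/hosts excerpt, both names map to the same address (10.1.255.254), so a resolution-based comparison would succeed where the literal one fails; on a node this could be checked with, e.g., `getent hosts compute-0-0 compute-0-0.local`.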