I will try building a newer ompi version in my home directory, but that will take me some time.
qconf is not available to me on any machine. It gives the same error wherever I am able to try it:

    denied: host "dblade65.cs.brown.edu" is neither submit nor admin host

Here is what it produces when I have a sysadmin run it:

$ qconf -sconf | egrep "(command|daemon)"
qlogin_command               /sysvol/sge.test/bin/qlogin-wrapper
qlogin_daemon                /sysvol/sge.test/bin/grid-sshd -i
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

Does that suggest anything?

Thanks!

-David Laidlaw

On Thu, Jul 25, 2019 at 5:21 PM Reuti <re...@staff.uni-marburg.de> wrote:
>
> On 25.07.2019 at 23:00, David Laidlaw wrote:
>
> > Here is most of the command output when run on a grid machine:
> >
> > dblade65.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 2.0.2
>
> This is somewhat old. I would suggest installing a fresh one. You can
> even compile one in your home directory and install it e.g. in
> $HOME/local/openmpi-3.1.4_gcc-7.4.0_shared (by --prefix=…intended path…)
> and then access this for all your jobs (adjust for your version of gcc).
> In your ~/.bash_profile and the job script:
>
> DEFAULT_MANPATH="$(manpath -q)"
> MY_OMPI="$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared"
> export PATH="$MY_OMPI/bin:$PATH"
> export LD_LIBRARY_PATH="$MY_OMPI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
> export MANPATH="$MY_OMPI/share/man${DEFAULT_MANPATH:+:$DEFAULT_MANPATH}"
> unset MY_OMPI
> unset DEFAULT_MANPATH
>
> Essentially there is no conflict with the already installed version.
>
> > dblade65.dhl(102) ompi_info | grep grid
> >          MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v2.0.2)
> > dblade65.dhl(103) c
> > denied: host "dblade65.cs.brown.edu" is neither submit nor admin host
> > dblade65.dhl(104)
>
> On a node it's ok this way.
>
> > Does that suggest anything?
> >
> > qconf is restricted to sysadmins, which I am not.
>
> What error is output if you try it anyway? Usually viewing is always
> possible.
>
> > I would note that we are running debian stretch on the cluster
> > machines. On some of our other (non-grid) machines, running debian
> > buster, the output is:
> >
> > cslab3d.dhl(101) mpiexec --version
> > mpiexec (OpenRTE) 3.1.3
> > Report bugs to http://www.open-mpi.org/community/help/
> > cslab3d.dhl(102) ompi_info | grep grid
> >          MCA ras: gridengine (MCA v2.1.0, API v2.0.0, Component v3.1.3)
>
> If you compile on such a machine and intend to run it in the cluster, it
> won't work, as the versions don't match. Hence the suggestion above: use a
> personal version in your $HOME for compiling and running the applications.
>
> Side note: Open MPI binds the processes to cores by default. In case more
> than one MPI job is running on a node, one will have to use `mpiexec
> --bind-to none …`, as otherwise all jobs on that node will use cores from
> core 0 upwards.
>
> -- Reuti
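A minimal sketch of the build Reuti describes, assuming the 3.1.4 tarball and the prefix used in the ~/.bash_profile block above (the download URL, the -j value, and the compiler names are illustrative; --with-sge enables the gridengine components, and --enable-orterun-prefix-by-default, mentioned in the error text further down, avoids path surprises on the remote nodes):

$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.4.tar.bz2
$ tar xjf openmpi-3.1.4.tar.bz2
$ cd openmpi-3.1.4
# prefix matches the MY_OMPI directory used above
$ ./configure --prefix=$HOME/local/openmpi-3.1.4_gcc-7.4.0_shared \
              --with-sge --enable-orterun-prefix-by-default CC=gcc CXX=g++
$ make -j4 all
$ make install

After installing, recompiling hello with this mpicc and resubmitting would ensure the same Open MPI version is used on the submit and execution sides.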
>
> > Thanks!
> >
> > -David Laidlaw
> >
> > On Thu, Jul 25, 2019 at 2:13 PM Reuti <re...@staff.uni-marburg.de> wrote:
> >
> > On 25.07.2019 at 18:59, David Laidlaw via users wrote:
> >
> > > I have been trying to run some MPI jobs under SGE for almost a year
> > > without success. What seems like a very simple test program fails; the
> > > ingredients of it are below. Any suggestions on any piece of the test,
> > > reasons for failure, requests for additional info, configuration
> > > thoughts, etc. would be much appreciated. I suspect the linkage between
> > > SGE and MPI, but can't identify the problem. We do have SGE support
> > > built into MPI. We also have the SGE parallel environment (PE) set up
> > > as described in several places on the web.
> > >
> > > Many thanks for any input!
> >
> > Did you compile Open MPI on your own or was it delivered with the Linux
> > distribution? That it tries to use `ssh` is quite strange, as nowadays
> > Open MPI and others have built-in support to detect that they are running
> > under the control of a queuing system. It should use `qrsh` in your case.
> >
> > What does:
> >
> > mpiexec --version
> > ompi_info | grep grid
> >
> > reveal? What does:
> >
> > qconf -sconf | egrep "(command|daemon)"
> >
> > show?
> >
> > -- Reuti
> >
> > > Cheers,
> > >
> > > -David Laidlaw
> > >
> > >
> > > Here is how I submit the job:
> > >
> > > /usr/bin/qsub /gpfs/main/home/dhl/liggghtsTest/hello2/runme
> > >
> > > Here is what is in runme:
> > >
> > > #!/bin/bash
> > > #$ -cwd
> > > #$ -pe orte_fill 1
> > > env PATH="$PATH" /usr/bin/mpirun --mca plm_base_verbose 1 -display-allocation ./hello
> > >
> > > Here is hello.c:
> > >
> > > #include <mpi.h>
> > > #include <stdio.h>
> > > #include <unistd.h>
> > > #include <stdlib.h>
> > >
> > > int main(int argc, char** argv) {
> > >     // Initialize the MPI environment
> > >     MPI_Init(NULL, NULL);
> > >
> > >     // Get the number of processes
> > >     int world_size;
> > >     MPI_Comm_size(MPI_COMM_WORLD, &world_size);
> > >
> > >     // Get the rank of the process
> > >     int world_rank;
> > >     MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
> > >
> > >     // Get the name of the processor
> > >     char processor_name[MPI_MAX_PROCESSOR_NAME];
> > >     int name_len;
> > >     MPI_Get_processor_name(processor_name, &name_len);
> > >
> > >     // Print off a hello world message
> > >     printf("Hello world from processor %s, rank %d out of %d processors\n",
> > >            processor_name, world_rank, world_size);
> > >     // system("printenv");
> > >
> > >     sleep(15); // sleep for 15 seconds
> > >
> > >     // Finalize the MPI environment.
> > >     MPI_Finalize();
> > > }
> > >
> > > This command will build it:
> > >
> > > mpicc hello.c -o hello
> > >
> > > Running produces the following:
> > >
> > > /var/spool/gridengine/execd/dblade01/active_jobs/1895308.1/pe_hostfile
> > > dblade01.cs.brown.edu 1 shor...@dblade01.cs.brown.edu UNDEFINED
> > > --------------------------------------------------------------------------
> > > ORTE was unable to reliably start one or more daemons.
> > > This usually is caused by:
> > >
> > > * not finding the required libraries and/or binaries on
> > >   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> > >   settings, or configure OMPI with --enable-orterun-prefix-by-default
> > >
> > > * lack of authority to execute on one or more specified nodes.
> > >   Please verify your allocation and authorities.
> > >
> > > * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> > >   Please check with your sys admin to determine the correct location to use.
> > >
> > > * compilation of the orted with dynamic libraries when static are required
> > >   (e.g., on Cray). Please check your configure cmd line and consider using
> > >   one of the contrib/platform definitions for your system type.
> > >
> > > * an inability to create a connection back to mpirun due to a
> > >   lack of common network interfaces and/or no route found between
> > >   them. Please check network connectivity (including firewalls
> > >   and network routing requirements).
> > > --------------------------------------------------------------------------
> > >
> > > and:
> > >
> > > [dblade01:10902] [[37323,0],0] plm:rsh: final template argv:
> > >     /usr/bin/ssh <template> set path = ( /usr/bin $path ) ;
> > >     if ( $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ;
> > >     if ( $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH /usr/lib ;
> > >     if ( $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH /usr/lib:$LD_LIBRARY_PATH ;
> > >     if ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp ;
> > >     if ( $?DYLD_LIBRARY_PATH == 0 ) setenv DYLD_LIBRARY_PATH /usr/lib ;
> > >     if ( $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH /usr/lib:$DYLD_LIBRARY_PATH ;
> > >     /usr/bin/orted --hnp-topo-sig 0N:2S:0L3:4L2:4L1:4C:4H:x86_64 -mca ess "env"
> > >     -mca ess_base_jobid "2446000128" -mca ess_base_vpid "<template>"
> > >     -mca ess_base_num_procs "2"
> > >     -mca orte_hnp_uri "2446000128.0;usock;tcp://10.116.85.90:44791"
> > >     --mca plm_base_verbose "1" -mca plm "rsh" -mca orte_display_alloc "1"
> > >     -mca pmix "^s1,s2,cray"
> > > ssh_exchange_identification: read: Connection reset by peer
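The verbose output above shows the rsh launcher building an /usr/bin/ssh command line, and the nodes then refuse the ssh connection. Tight SGE integration instead starts the orted daemons through `qrsh -inherit`, which requires the parallel environment requested in the job script to allow it. A typical tight-integration PE looks roughly like the sketch below; the name orte_fill is taken from the runme script, the exact values at our site are unknown to me (qconf is restricted, per above), and control_slaves TRUE is the setting that permits `qrsh -inherit`:

$ qconf -sp orte_fill
pe_name            orte_fill
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

If control_slaves is FALSE, SGE rejects `qrsh -inherit` and the daemons cannot be launched under SGE's control, so having a sysadmin run the qconf command above seems worth doing alongside the Open MPI rebuild.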
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users