Hi Gilles,

> Could you please apply the attached patch and try again with and
> without --prefix ...

Everything works fine with your patch. Thank you very much for your
help. Even the Java problem, which I reported last Friday in a separate
e-mail, is solved with your patch. I assume that it originated from the
faulty environment as well.

tyr small_prog 110 mpiexec --prefix /usr/local/openmpi-1.8.5_64_cc \
  -np 5 --host sunpc1,linpc1,tyr,rs0 init_finalize
Hello!
Hello!
Hello!
Hello!
Hello!
tyr small_prog 111 mpiexec -np 5 --host sunpc1,linpc1,tyr,rs0 init_finalize
Hello!
Hello!
Hello!
Hello!
Hello!
tyr small_prog 112


Kind regards and once more thank you very much

Siegmar


> it seems there was a mistake when the following commit was back ported to
> the v1.8 branch
> commit 10ff75e91c3f5dad18ea854fd0ee831b2ea066d7
> Author: Ralph Castain <r...@open-mpi.org>
> Date:   Fri Apr 17 19:35:34 2015 -0700
>
>     Per request from Andy Rieb, add ability to pass PATH and
>     LD_LIBRARY_PATH elements to ssh command
>     Per request from David Bigagli, add ability to pass ssh args
>
>     Taken from open-mpi/ompi@12bfb27161fb2710d9b4327072776ff3333f0afc
>
> Cheers,
>
> Gilles
>
> FWIW, here are the details :
>
> the bug is in orte/mca/plm/rsh/plm_rsh_module.c:
>
> static int setup_launch(...)
> {
> ...
>     char *lib_base=NULL, *bin_base=NULL;
> ...
>     lib_base = opal_basename(opal_install_dirs.libdir);
>     bin_base = opal_basename(opal_install_dirs.bindir);
> ...
>     if (NULL != prefix_dir) {
> ...
>         asprintf(&bin_base, "%s/%s", prefix_dir, value);
> ...
>     }
>     if (NULL != lib_base || NULL != bin_base) {
> ...
>     } else if (ORTE_PLM_RSH_SHELL_TCSH == remote_shell ||
>                ORTE_PLM_RSH_SHELL_CSH == remote_shell) {
> ...
>         (void)asprintf (&final_cmd,
>                         "%s%s%s set path = ( %s $path ) ; "
> ...
>                         "setenv LD_LIBRARY_PATH %s ; "
> ...
>                         (NULL != bin_base ? bin_base : " "),
>                         (NULL != lib_base ? lib_base : " "),
> ...
>
> in your case, prefix_dir is NULL, so bin_base is "bin" and lib_base is
> "lib64"
>
>
>
>
> On Mon, May 18, 2015 at 2:01 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi Gilles,
> >
> > > I am having some hard time reading the logs on my tablet...
> > > bottom line, did using --prefix /usr/local/openmpi-1.8.5_64_cc fix all
> > > your issues ?
> >
> > Yes, it did and the environment is also correct when I use --prefix.
> >
> > tyr small_prog 109 mpiexec --prefix /usr/local/openmpi-1.8.5_64_cc -np 5
> > --host sunpc1,linpc1,tyr,rs0 init_finalize
> > Hello!
> > Hello!
> > Hello!
> > Hello!
> > Hello!
> > tyr small_prog 110 mpiexec -np 5 --host sunpc1,linpc1,tyr,rs0
> > init_finalize
> > ld.so.1: ssh: fatal: relocation error: file /usr/bin/ssh: symbol
> > SUNWcry_installed: referenced symbol not found
> > ...
> >
> >
> >
> > Without --prefix the following part goes wrong as far as I can see
> > and I get the wrong environment with "bin" and "lib64".
> >
> > ...
> >
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:base:setup_vm
> > assigning new daemon [[43332,0],2] to node linpc1
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:base:setup_vm add
> > new daemon [[43332,0],3]
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:base:setup_vm
> > assigning new daemon [[43332,0],3] to node rs0
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: launching vm
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: local shell: 2
> > (tcsh)
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: assuming same
> > remote shell as local shell
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: remote shell: 2
> > (tcsh)
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: final template
> > argv:
> > /usr/local/bin/ssh <template> set path = ( bin $path ) ; if (
> > $?LD_LIBRARY_PATH == 1 ) set OMPI_have_llp ; if (
> > $?LD_LIBRARY_PATH == 0 ) setenv LD_LIBRARY_PATH lib64 ; if
> > ( $?OMPI_have_llp == 1 ) setenv LD_LIBRARY_PATH lib64:$LD_LIBRARY_PATH ;
> > if ( $?DYLD_LIBRARY_PATH == 1 ) set OMPI_have_dllp
> > ; if ( $?DYLD_LIBRARY_PATH == 0 ) setenv DYLD_LIBRARY_PATH
> > lib64 ; if ( $?OMPI_have_dllp == 1 ) setenv DYLD_LIBRARY_PATH
> > lib64:$DYLD_LIBRARY_PATH ; orted --hnp-topo-sig
> > 2N:2S:0L3:0L2:0L1:2C:2H:sun4u -mca ess "env" -mca orte_ess_jobid
> > "2839805952" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "4"
> > -mca orte_hnp_uri
> > "2839805952.0;tcp://193.174.24.39:34971" --tree-spawn --mca
> > plm_base_verbose "100" -mca plm
> > "rsh"
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh:launch daemon 0
> > not a child of mine
> > [tyr.informatik.hs-fulda.de:03938] [[43332,0],0] plm:rsh: adding node
> > sunpc1 to launch list
> > ...
> >
> >
> > Do you know which file is responsible for the above part?
> >
> >
> > > if not, can you try to add the --hetero-nodes option to mpiexec ?
> > >
> > > just to be sure, can you please confirm your login shell is csh/tcsh on
> > > all your boxes ?
> >
> > It is and must be tcsh, because otherwise our environment wouldn't work.
> >
> >
> > Kind regards
> >
> > Siegmar
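
For anyone reading this thread later, here is a minimal C sketch of the
failure mode Gilles describes above. It is not the Open MPI code: the
names sketch_basename, sketch_remote_dir and sketch_print_tcsh are
hypothetical stand-ins, and the tcsh fragment is shortened to the two
parts that matter here. It only assumes what the quoted excerpt shows,
namely that the directory names start out as basenames of the install
directories and become absolute paths only when a prefix is given.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for opal_basename(): returns a pointer to the
 * last path component of dir (no allocation, no trailing-slash handling). */
static const char *sketch_basename(const char *dir)
{
    assert(dir != NULL);
    const char *slash = strrchr(dir, '/');
    return (slash != NULL) ? slash + 1 : dir;
}

/* Hypothetical helper mirroring the quoted logic: start from the basename
 * of the install directory and only build an absolute path when a prefix
 * is given. The caller frees the result. */
static char *sketch_remote_dir(const char *install_dir, const char *prefix_dir)
{
    const char *base = sketch_basename(install_dir);
    size_t len;
    char *out;

    if (prefix_dir == NULL) {
        /* The problematic case: only "bin" or "lib64" is left over. */
        len = strlen(base) + 1;
        out = malloc(len);
        if (out != NULL) {
            memcpy(out, base, len);
        }
        return out;
    }

    len = strlen(prefix_dir) + 1 + strlen(base) + 1;
    out = malloc(len);
    if (out != NULL) {
        snprintf(out, len, "%s/%s", prefix_dir, base);
    }
    return out;
}

/* Prints a shortened version of the tcsh fragment that is placed in
 * front of the orted command line. */
static void sketch_print_tcsh(const char *bindir, const char *libdir,
                              const char *prefix_dir)
{
    char *bin = sketch_remote_dir(bindir, prefix_dir);
    char *lib = sketch_remote_dir(libdir, prefix_dir);

    if (bin != NULL && lib != NULL) {
        printf("set path = ( %s $path ) ; setenv LD_LIBRARY_PATH %s ;\n",
               bin, lib);
    }
    free(bin);
    free(lib);
}
```

Calling sketch_print_tcsh("/usr/local/openmpi-1.8.5_64_cc/bin",
"/usr/local/openmpi-1.8.5_64_cc/lib64", NULL) prints the relative
"bin" and "lib64" entries seen in the log above, while passing
"/usr/local/openmpi-1.8.5_64_cc" as the last argument yields absolute
paths, which matches the observation that --prefix works around the
problem.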