Hi,

> > I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > Why doesn't "mpiexec" start a process on my local machine (it
> > is not a matter of Java, because I have the same behaviour when
> > I use "hostname")?
> >
> > tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > Process 0 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 1 of 3 running on sunpc4.informatik.hs-fulda.de
> > Process 2 of 3 running on sunpc1
> > ...
> >
> > tyr small_prog 142 mpiexec -np 3 -host tyr,sunpc4,sunpc1 hostname
> > sunpc1
> > sunpc4.informatik.hs-fulda.de
> > sunpc4.informatik.hs-fulda.de
>
> No idea - it works fine for me. Do you have an environmental
> variable, or something in your default MCA param file, that
> indicates "no_use_local"?
I have only built and installed Open MPI. I have no param file and no
MCA environment variable.

tyr hello_1 136 grep local \
  /usr/local/openmpi-1.9_64_cc/etc/openmpi-mca-params.conf
# $sysconf is a directory on a local disk, it is likely that changes
#   component_path = /usr/local/lib/openmpi:~/my_openmpi_components
tyr hello_1 143 env | grep -i mca
tyr hello_1 144

> > The command breaks if I add a Linux machine.
>
> Check to ensure that the path and ld_library_path on your linux box
> is being correctly set to point to the corresponding Linux OMPI libs.
> It looks like that isn't the case. Remember, the Java bindings are
> just that - they are bindings that wrap on top of the regular C
> code. Thus, the underlying OMPI system remains system-dependent,
> and you must have the appropriate native libraries installed on
> each machine.

I implemented a small program which shows these values, and they are
wrong for MPI, but I have no idea why. The two entries at the beginning
of PATH and LD_LIBRARY_PATH are not from our normal environment, because
I add these values at the end of the environment variables PATH,
LD_LIBRARY_PATH_32, and LD_LIBRARY_PATH_64. Afterwards I set
LD_LIBRARY_PATH to LD_LIBRARY_PATH_64 on a 64-bit Solaris machine, to
LD_LIBRARY_PATH_32 followed by LD_LIBRARY_PATH_64 on a 64-bit Linux
machine, and to LD_LIBRARY_PATH_32 on every 32-bit machine.


Now 1 slave tasks are sending their environment.

Environment from task 1:
  message type:       3
  msg length:         4622 characters
  message:
  hostname:           tyr.informatik.hs-fulda.de
  operating system:   SunOS
  release:            5.10
  processor:          sun4u
  PATH
    /usr/local/openmpi-1.9_64_cc/bin        (!!!)
    /usr/local/openmpi-1.9_64_cc/bin        (!!!)
    /usr/local/eclipse-3.6.1
    ...
    /usr/local/openmpi-1.9_64_cc/bin        (<- from our environment)
  LD_LIBRARY_PATH_32
    /usr/lib
    /usr/local/jdk1.7.0_07/jre/lib/sparc
    ...
    /usr/local/openmpi-1.9_64_cc/lib        (<- from our environment)
  LD_LIBRARY_PATH_64
    /usr/lib/sparcv9
    /usr/local/jdk1.7.0_07/jre/lib/sparcv9
    ...
    /usr/local/openmpi-1.9_64_cc/lib64      (<- from our environment)
  LD_LIBRARY_PATH
    /usr/local/openmpi-1.9_64_cc/lib        (!!!)
    /usr/local/openmpi-1.9_64_cc/lib64      (!!!)
    /usr/lib/sparcv9
    /usr/local/jdk1.7.0_07/jre/lib/sparcv9
    ...
    /usr/local/openmpi-1.9_64_cc/lib64      (<- from our environment)
  CLASSPATH
    /usr/local/junit4.10
    /usr/local/junit4.10/junit-4.10.jar
    //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dcore.jar
    //usr/local/jdk1.7.0_07/j3d/lib/ext/j3dutils.jar
    //usr/local/jdk1.7.0_07/j3d/lib/ext/vecmath.jar
    /usr/local/javacc-5.0/javacc.jar
    .

Without MPI the program uses our environment.

tyr hello_1 147 diff env_with*
1,7c1
<
<
< Now 1 slave tasks are sending their environment.
<
< Environment from task 1:
<   message type:       3
<   msg length:         4622 characters
---
> Environment:
14,15d7
<     /usr/local/openmpi-1.9_64_cc/bin
<     /usr/local/openmpi-1.9_64_cc/bin
81,82d72
<     /usr/local/openmpi-1.9_64_cc/lib
<     /usr/local/openmpi-1.9_64_cc/lib64
tyr hello_1 148

I have attached the programs so that you can check yourself and
hopefully get the same results. Do you modify PATH and LD_LIBRARY_PATH?

Kind regards

Siegmar

> > tyr java 110 mpiexec -np 3 -host tyr,sunpc4,linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.
> > This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   mca_base_open failed
> >   --> Returned value -2 instead of OPAL_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   opal_init failed
> >   --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> > --------------------------------------------------------------------------
> > --------------------------------------------------------------------------
> > It looks like MPI_INIT failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during MPI_INIT; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   ompi_mpi_init: orte_init failed
> >   --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***    and potentially your MPI job)
> > [linpc4:27369] Local abort before MPI_INIT completed successfully;
> > not able to aggregate error messages, and not able to guarantee
> > that all other processes were killed!
> > -------------------------------------------------------
> > Primary job terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status,
> > thus causing the job to be terminated. The first process to do so was:
> >
> >   Process name: [[21095,1],2]
> >   Exit code:    1
> > --------------------------------------------------------------------------
> >
> >
> > tyr java 111 which mpijavac
> > /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > tyr java 112 more /usr/local/openmpi-1.9_32_cc/bin/mpijavac
> > #!/usr/bin/env perl
> >
> > # WARNING: DO NOT EDIT THE mpijava.pl FILE AS IT IS GENERATED!
> > # MAKE ALL CHANGES IN mpijava.pl.in
> >
> > # Copyright (c) 2011 Cisco Systems, Inc.  All rights reserved.
> > # Copyright (c) 2012 Oracle and/or its affiliates.  All rights reserved.
> >
> > use strict;
> >
> > # The main purpose of this wrapper compiler is to check for
> > # and adjust the Java class path to include the OMPI classes
> > # in mpi.jar. The user may have specified a class path on
> > # our cmd line, or it may be in the environment, so we have
> > # to check for both. We also need to be careful not to
> > # just override the class path as it probably includes classes
> > # they need for their application! It also may already include
> > # the path to mpi.jar, and while it doesn't hurt anything, we
> > # don't want to include our class path more than once to avoid
> > # user astonishment
> >
> > # Let the build system provide us with some critical values
> > my $my_compiler = "/usr/local/jdk1.7.0_07/bin/javac";
> > my $ompi_classpath = "/usr/local/openmpi-1.9_32_cc/lib/mpi.jar";
> >
> > # globals
> > my $showme_arg = 0;
> > my $verbose = 0;
> > my $my_arg;
> > ...
> >
> >
> > All libraries are available.
> >
> > tyr java 113 ldd /usr/local/jdk1.7.0_07/bin/javac
> >         libthread.so.1 => /usr/lib/libthread.so.1
> >         libjli.so =>
> >           /export2/prog/SunOS_sparc/jdk1.7.0_07/bin/../jre/lib/sparc/jli/libjli.so
> >         libdl.so.1 => /usr/lib/libdl.so.1
> >         libc.so.1 => /usr/lib/libc.so.1
> >         libm.so.2 => /usr/lib/libm.so.2
> >         /platform/SUNW,A70/lib/libc_psr.so.1
> > tyr java 114 ssh sunpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >         libthread.so.1 => /usr/lib/libthread.so.1
> >         libjli.so =>
> >           /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so
> >         libdl.so.1 => /usr/lib/libdl.so.1
> >         libc.so.1 => /usr/lib/libc.so.1
> >         libm.so.2 => /usr/lib/libm.so.2
> > tyr java 115 ssh linpc4 ldd /usr/local/jdk1.7.0_07/bin/javac
> >         linux-gate.so.1 => (0xffffe000)
> >         libpthread.so.0 => /lib/libpthread.so.0 (0xf77b2000)
> >         libjli.so =>
> >           /usr/local/jdk1.7.0_07/bin/../jre/lib/i386/jli/libjli.so (0xf779d000)
> >         libdl.so.2 => /lib/libdl.so.2 (0xf7798000)
> >         libc.so.6 => /lib/libc.so.6 (0xf762b000)
> >         /lib/ld-linux.so.2 (0xf77ce000)
> >
> >
> > I don't have any errors in the log files except the error for nfs.
> >
> > tyr openmpi-1.9-Linux.x86_64.32_cc 136 ls log.*
> > log.configure.Linux.x86_64.32_cc    log.make-install.Linux.x86_64.32_cc
> > log.make-check.Linux.x86_64.32_cc   log.make.Linux.x86_64.32_cc
> >
> > tyr openmpi-1.9-Linux.x86_64.32_cc 137 grep "Error 1" log.*
> > log.make-check.Linux.x86_64.32_cc:make[3]: *** [check-TESTS] Error 1
> > log.make-check.Linux.x86_64.32_cc:make[1]: *** [check-recursive] Error 1
> > log.make-check.Linux.x86_64.32_cc:make: *** [check-recursive] Error 1
> >
> > ...
> > SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 32 failed)
> > FAIL: opal_path_nfs
> > ========================================================
> > 1 of 2 tests failed
> > Please report to http://www.open-mpi.org/community/help/
> > ========================================================
> > make[3]: *** [check-TESTS] Error 1
> > ...
> >
> >
> > It doesn't help to build the class files on Linux (which should be
> > independent of the architecture anyway).
> >
> > tyr java 131 ssh linpc4
> > linpc4 fd1026 98 cd .../prog/mpi/java
> > linpc4 java 99 make clean
> > rm -f /home/fd1026/mpi_classfiles/HelloMainWithBarrier.class \
> >   /home/fd1026/mpi_classfiles/HelloMainWithoutBarrier.class
> > linpc4 java 100 make
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithBarrier.java
> > mpijavac -d /home/fd1026/mpi_classfiles HelloMainWithoutBarrier.java
> >
> > linpc4 java 101 mpiexec -np 3 -host linpc4 \
> >   java -cp $HOME/mpi_classfiles HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   mca_base_open failed
> >   --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >
> > Does anybody else have this problem as well? Do you know a solution?
> > Thank you very much for any help in advance.
> >
> >
> > Kind regards
> >
> > Siegmar
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
/* A small MPI program, which delivers some information about its
 * machine, operating system, and some environment variables.
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     mpicc -o <program name> <source code file name>
 *
 *   Store executable(s) into predefined directories.
 *     make
 *
 *   Make program(s) automatically on all specified hosts. You must
 *   edit the file "make_compile" and specify your host names before
 *   you execute it.
 *     make_compile
 *
 * Running:
 *   LAM-MPI:
 *     mpiexec -boot -np <number of processes> <program name>
 *     or
 *     mpiexec -boot \
 *       -host <hostname> -np <number of processes> <program name> : \
 *       -host <hostname> -np <number of processes> <program name>
 *     or
 *     mpiexec -boot [-v] -configfile <application file>
 *     or
 *     lamboot [-v] [<host file>]
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec [-v] -configfile <application file>
 *     lamhalt
 *
 *   OpenMPI:
 *     "host1", "host2", and so on can all have the same name,
 *     if you want to start a virtual computer with some virtual
 *     cpu's on the local host. The name "localhost" is allowed
 *     as well.
 *
 *     mpiexec -np <number of processes> <program name>
 *     or
 *     mpiexec --host <host1,host2,...> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -hostfile <hostfile name> \
 *       -np <number of processes> <program name>
 *     or
 *     mpiexec -app <application file>
 *
 * Cleaning:
 *   local computer:
 *     rm <program name>
 *     or
 *     make clean_all
 *   on all specified computers (you must edit the file
 *   "make_clean_all" and specify your host names before you
 *   execute it):
 *     make_clean_all
 *
 *
 * File: environ_mpi.c                  Author: S. Gross
 * Date: 25.09.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/utsname.h>
#include "mpi.h"

#define BUF_SIZE  8192                  /* message buffer size          */
#define MAX_TASKS 12                    /* max. number of tasks         */
#define SENDTAG   1                     /* send message command         */
#define EXITTAG   2                     /* termination command          */
#define MSGTAG    3                     /* normal message token         */

#define ENTASKS   -1                    /* error: too many tasks        */

static void master (void);
static void slave (void);

int main (int argc, char *argv[])
{
  int mytid,                            /* my task id                   */
      ntasks;                           /* number of parallel tasks     */

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  if (mytid == 0)
  {
    master ();
  }
  else
  {
    slave ();
  }
  MPI_Finalize ();
  return EXIT_SUCCESS;
}


/* Function for the "master task". The master sends a request to all
 * slaves asking for a message. After receiving and printing the
 * messages he sends all slaves a termination command.
 *
 * input parameters:    not necessary
 * output parameters:   not available
 * return value:        nothing
 * side effects:        no side effects
 *
 */
void master (void)
{
  int  ntasks,                          /* number of parallel tasks     */
       mytid,                           /* my task id                   */
       num,                             /* number of entries            */
       i;                               /* loop variable                */
  char buf[BUF_SIZE + 1];               /* message buffer (+1 for '\0') */
  MPI_Status stat;                      /* message details              */

  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
  if (ntasks > MAX_TASKS)
  {
    fprintf (stderr, "Error: Too many tasks. Try again with at most "
             "%d tasks.\n", MAX_TASKS);
    /* terminate all slave tasks */
    for (i = 1; i < ntasks; ++i)
    {
      MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
    }
    MPI_Finalize ();
    exit (ENTASKS);
  }
  printf ("\n\nNow %d slave tasks are sending their environment.\n\n",
          ntasks - 1);
  /* request messages from slave tasks */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, SENDTAG, MPI_COMM_WORLD);
  }
  /* wait for messages and print greetings */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, MPI_ANY_SOURCE,
              MPI_ANY_TAG, MPI_COMM_WORLD, &stat);
    MPI_Get_count (&stat, MPI_CHAR, &num);
    buf[num] = '\0';                    /* add missing end-of-string    */
    printf ("Environment from task %d:\n"
            "  message type:       %d\n"
            "  msg length:         %d characters\n"
            "  message:            %s\n\n",
            stat.MPI_SOURCE, stat.MPI_TAG, num, buf);
  }
  /* terminate all slave tasks */
  for (i = 1; i < ntasks; ++i)
  {
    MPI_Send ((char *) NULL, 0, MPI_CHAR, i, EXITTAG, MPI_COMM_WORLD);
  }
}


/* Function for "slave tasks". The slave task sends its hostname,
 * operating system name and release, and processor architecture
 * as a message to the master.
 *
 * input parameters:    not necessary
 * output parameters:   not available
 * return value:        nothing
 * side effects:        no side effects
 *
 */
void slave (void)
{
  struct utsname sys_info;              /* system information           */
  int  mytid,                           /* my task id                   */
       num_env_vars,                    /* # of environment variables   */
       i,                               /* loop variable                */
       more_to_do;
  char buf[BUF_SIZE],                   /* message buffer               */
       *env_vars[] = {"PATH", "LD_LIBRARY_PATH_32",
                      "LD_LIBRARY_PATH_64", "LD_LIBRARY_PATH",
                      "CLASSPATH"};
  MPI_Status stat;                      /* message details              */

  MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
  num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
  more_to_do = 1;
  while (more_to_do == 1)
  {
    /* wait for a message from the master task */
    MPI_Recv (buf, BUF_SIZE, MPI_CHAR, 0, MPI_ANY_TAG,
              MPI_COMM_WORLD, &stat);
    if (stat.MPI_TAG != EXITTAG)
    {
      uname (&sys_info);
      strcpy (buf, "\n  hostname:           ");
      strncpy (buf + strlen (buf), sys_info.nodename,
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), "\n  operating system:   ",
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), sys_info.sysname,
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), "\n  release:            ",
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), sys_info.release,
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), "\n  processor:          ",
               BUF_SIZE - strlen (buf));
      strncpy (buf + strlen (buf), sys_info.machine,
               BUF_SIZE - strlen (buf));
      for (i = 0; i < num_env_vars; ++i)
      {
        char *env_val,                  /* pointer to environment value */
             *delimiter = ":",          /* field delimiter for "strtok" */
             *next_tok;                 /* next token                   */

        env_val = getenv (env_vars[i]);
        if (env_val != NULL)
        {
          if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
          {
            strncpy (buf + strlen (buf), "\n  ",
                     BUF_SIZE - strlen (buf));
            strncpy (buf + strlen (buf), env_vars[i],
                     BUF_SIZE - strlen (buf));
          }
          else
          {
            fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
                     "\n\n", env_vars[i]);
          }
          /* Get first token in "env_val". "strtok" skips all
           * characters that are contained in the current delimiter
           * string. If it finds a character which is not contained
           * in the delimiter string, it is the start of the first
           * token. Now it searches for the next character which is
           * part of the delimiter string. If it finds one it will
           * overwrite it by a '\0' to terminate the first token.
           * Otherwise the token extends to the end of the string.
           * Subsequent calls of "strtok" use a NULL pointer as first
           * argument and start searching from the saved position
           * after the last token. "strtok" returns NULL if it
           * couldn't find a token.
           */
          next_tok = strtok (env_val, delimiter);
          while (next_tok != NULL)
          {
            if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
            {
              strncpy (buf + strlen (buf), "\n    ",
                       BUF_SIZE - strlen (buf));
              strncpy (buf + strlen (buf), next_tok,
                       BUF_SIZE - strlen (buf));
            }
            else
            {
              fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
                       "to %s.\n\n", next_tok, env_vars[i]);
            }
            /* get next token */
            next_tok = strtok (NULL, delimiter);
          }
        }
      }
      MPI_Send (buf, strlen (buf), MPI_CHAR, stat.MPI_SOURCE,
                MSGTAG, MPI_COMM_WORLD);
    }
    else
    {
      more_to_do = 0;                   /* terminate                    */
    }
  }
}
/* A small program, which delivers some information about its
 * machine, operating system, and some environment variables.
 *
 *
 * Compiling:
 *   Store executable(s) into local directory.
 *     (g)cc -o environ_without_mpi environ_without_mpi.c
 *
 * Running:
 *   environ_without_mpi
 *
 *
 * File: environ_without_mpi.c          Author: S. Gross
 * Date: 25.09.2012
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/utsname.h>

#define BUF_SIZE 8192                   /* message buffer size          */

int main (int argc, char *argv[])
{
  struct utsname sys_info;              /* system information           */
  int  num_env_vars,                    /* # of environment variables   */
       i;                               /* loop variable                */
  char buf[BUF_SIZE],                   /* message buffer               */
       *env_vars[] = {"PATH", "LD_LIBRARY_PATH_32",
                      "LD_LIBRARY_PATH_64", "LD_LIBRARY_PATH",
                      "CLASSPATH"};

  num_env_vars = sizeof (env_vars) / sizeof (env_vars[0]);
  uname (&sys_info);
  strcpy (buf, "\n  hostname:           ");
  strncpy (buf + strlen (buf), sys_info.nodename,
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), "\n  operating system:   ",
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), sys_info.sysname,
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), "\n  release:            ",
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), sys_info.release,
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), "\n  processor:          ",
           BUF_SIZE - strlen (buf));
  strncpy (buf + strlen (buf), sys_info.machine,
           BUF_SIZE - strlen (buf));
  for (i = 0; i < num_env_vars; ++i)
  {
    char *env_val,                      /* pointer to environment value */
         *delimiter = ":",              /* field delimiter for "strtok" */
         *next_tok;                     /* next token                   */

    env_val = getenv (env_vars[i]);
    if (env_val != NULL)
    {
      if ((strlen (buf) + strlen (env_vars[i]) + 6) < BUF_SIZE)
      {
        strncpy (buf + strlen (buf), "\n  ",
                 BUF_SIZE - strlen (buf));
        strncpy (buf + strlen (buf), env_vars[i],
                 BUF_SIZE - strlen (buf));
      }
      else
      {
        fprintf (stderr, "Buffer too small. Couldn't add \"%s\"."
                 "\n\n", env_vars[i]);
      }
      /* Get first token in "env_val". "strtok" skips all
       * characters that are contained in the current delimiter
       * string. If it finds a character which is not contained
       * in the delimiter string, it is the start of the first
       * token. Now it searches for the next character which is
       * part of the delimiter string. If it finds one it will
       * overwrite it by a '\0' to terminate the first token.
       * Otherwise the token extends to the end of the string.
       * Subsequent calls of "strtok" use a NULL pointer as first
       * argument and start searching from the saved position
       * after the last token. "strtok" returns NULL if it
       * couldn't find a token.
       */
      next_tok = strtok (env_val, delimiter);
      while (next_tok != NULL)
      {
        if ((strlen (buf) + strlen (next_tok) + 25) < BUF_SIZE)
        {
          strncpy (buf + strlen (buf), "\n    ",
                   BUF_SIZE - strlen (buf));
          strncpy (buf + strlen (buf), next_tok,
                   BUF_SIZE - strlen (buf));
        }
        else
        {
          fprintf (stderr, "Buffer too small. Couldn't add \"%s\" "
                   "to %s.\n\n", next_tok, env_vars[i]);
        }
        /* get next token */
        next_tok = strtok (NULL, delimiter);
      }
    }
  }
  printf ("Environment:\n"
          "  message:            %s\n\n", buf);
  return EXIT_SUCCESS;
}