I can't speak to the other issues, but for these it looks like something isn't right on the system itself. It could be an incompatibility with openSUSE 12.1.
What the errors are saying is that malloc is failing when used at a very
early stage in starting the process. Can you run even a C-based MPI "hello"
program? (See the sketch after the quoted message below.)

On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>
> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> ...
>   ompi_mpi_init: orte_init failed
>   --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> aggregate error messages, and not able to guarantee that all other processes
> were killed!
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[16706,1],1]
>   Exit code:    1
> --------------------------------------------------------------------------
>
> I use a valid environment on all machines. The problem occurs as well
> when I compile and run the program directly on the Linux system.
>
> linpc1 java 101 mpijavac BcastIntMain.java
> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
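
In case it helps, here's a minimal sketch of the kind of C "hello" program I
have in mind (the file name hello.c and the output format are just
placeholders; the hostnames in the launch line are the ones from your
commands above):

/* hello.c - minimal MPI sanity check: initialize, report rank/size/host,
 * finalize. Build and run with Open MPI's wrapper compiler, e.g.:
 *   mpicc hello.c -o hello
 *   mpiexec -np 2 -host linpc0,linpc1 ./hello
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);               /* initialization path that runs orte_init/opal_init */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
    MPI_Get_processor_name(name, &len);   /* host this process is running on */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}

If that fails with the same opal_init/malloc errors, the problem is likely in
the Open MPI installation or environment on those machines rather than
anything specific to the Java bindings.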