Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue appears to be in the Java side of things. For whatever reason, your Java VM is refusing to allow a malloc to succeed. I suspect it has something to do with its setup, but I'm not enough of a Java person to point you to the problem.
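One way to narrow it down: take your own code out of the picture entirely and run a test that does nothing but initialize and finalize. Something along these lines should do it - a minimal sketch, assuming the mpiJava-style API the trunk Java bindings expose (the class name InitOnly is just a placeholder, and the method names may differ in other versions, e.g. getRank()):

  import mpi.MPI;

  public class InitOnly {
      public static void main(String[] args) throws Exception {
          // MPI.Init is where opal_init/orte_init run underneath, i.e. the
          // point where your jobs are currently aborting
          MPI.Init(args);
          System.out.println("init ok on rank " + MPI.COMM_WORLD.Rank());
          MPI.Finalize();
      }
  }

Compile it with mpijavac and launch it exactly like you launch HelloMainWithBarrier. If even that dies in MPI.Init, the problem is entirely in how the JVM starts up on linpc0/linpc1 - in that case I'd also double-check that "java" resolves to the same JVM you built the bindings against on both nodes (java -version on each).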
Is it possible that the program was compiled against a different (perhaps incompatible) version of Java? Just shooting in the dark here - I suspect you'll have to ask someone more knowledgeable on JVMs.

On Dec 21, 2012, at 7:32 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi
>
>> I can't speak to the other issues, but for these - it looks like
>> something isn't right in the system. Could be an incompatibility
>> with Suse 12.1.
>>
>> What the errors are saying is that malloc is failing when used at
>> a very early stage in starting the process. Can you run even a
>> C-based MPI "hello" program?
>
> Yes. I have implemented more or less the same program in C and Java.
>
> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> Process 0 of 2 running on linpc0
> Process 1 of 2 running on linpc1
>
> Now 1 slave tasks are sending greetings.
>
> Greetings from task 1:
>   message type:       3
>   msg length:         132 characters
>   message:
>     hostname:         linpc1
>     operating system: Linux
>     release:          3.1.10-1.16-desktop
>     processor:        x86_64
>
>
> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> ...
>
>
> Thank you very much for any help in advance.
>
> Kind regards
>
> Siegmar
>
>
>
>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>
>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>
>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   mca_base_open failed
>>>   --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>>   ompi_mpi_init: orte_init failed
>>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>> aggregate error messages, and not able to guarantee that all other processes
>>> were killed!
>>> -------------------------------------------------------
>>> Primary job terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>>
>>>   Process name: [[16706,1],1]
>>>   Exit code:    1
>>> --------------------------------------------------------------------------
>>>
>>>
>>>
>>> I use a valid environment on all machines. The problem occurs as well
>>> when I compile and run the program directly on the Linux system.
>>>
>>> linpc1 java 101 mpijavac BcastIntMain.java
>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   mca_base_open failed
>>>   --> Returned value -2 instead of OPAL_SUCCESS
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users