Interesting. My best guess is that the OMPI libraries aren't being found, though I'm a little surprised because the error message indicates an inability to malloc - but it's possible the message isn't accurate.
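
If you want to rule that out quickly, a throwaway check along these lines might help - this is an untested sketch on my side: the class name is made up, and the JNI library name "mpi_java" is an assumption about your build (adjust it to whatever the Open MPI Java library is actually called in your install). It just tells you whether the JVM can see the Open MPI native library at all:

// LibCheckMain.java - standalone, no MPI imports, so plain javac is fine here.
public class LibCheckMain {
    public static void main(String[] args) {
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("LD_LIBRARY_PATH   = " + System.getenv("LD_LIBRARY_PATH"));
        try {
            // Only succeeds if the JVM can locate the Open MPI JNI library.
            System.loadLibrary("mpi_java");
            System.out.println("mpi_java loaded OK");
        } catch (UnsatisfiedLinkError e) {
            System.out.println("mpi_java NOT found: " + e.getMessage());
        }
    }
}

Run it once with plain "java LibCheckMain" and once under "mpiexec -host linpc1 java LibCheckMain"; if the library loads in one case but not the other, the environment mpiexec forwards to the remote JVM is the place to look.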
One thing stands out - I see you compiled your program with "javac". I suspect
that is the source of the trouble - you really need to use the Java wrapper
compiler "mpijavac" so that the MPI classes and libraries get picked up and
linked correctly. (I put a minimal sketch of what I mean at the very bottom of
this mail.)

On Dec 21, 2012, at 9:46 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi
> 
>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>> appears to be in the Java side of things. For whatever reason, your
>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>> something to do with its setup, but I'm not enough of a Java person
>> to point you to the problem.
>> 
>> Is it possible that the program was compiled against a different
>> (perhaps incompatible) version of Java?
> 
> No, I don't think so. A small Java program without MPI methods works.
> 
> linpc1 bin 122 which mpicc
> /usr/local/openmpi-1.9_64_cc/bin/mpicc
> linpc1 bin 123 pwd
> /usr/local/openmpi-1.9_64_cc/bin
> linpc1 bin 124 grep jdk *
> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> linpc1 bin 125 which java
> /usr/local/jdk1.7.0_07-64/bin/java
> linpc1 bin 126
> 
> 
> linpc1 prog 110 javac MiniProgMain.java
> linpc1 prog 111 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 112 mpiexec java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> 
> 
> A small program which allocates buffer for a new string.
> ...
> stringBUFLEN = new String (string.substring (0, len));
> ...
> 
> linpc1 prog 115 javac MemAllocMain.java
> linpc1 prog 116 java MemAllocMain
> Type something ("quit" terminates program): ffghhfhh
> Received input: ffghhfhh
> Converted to upper case: FFGHHFHH
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
> 
> linpc1 prog 117 mpiexec java MemAllocMain
> Type something ("quit" terminates program): fbhshnhjs
> Received input: fbhshnhjs
> Converted to upper case: FBHSHNHJS
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
> linpc1 prog 118
> 
> I'm not sure if this is of any help, but the problem starts with
> MPI methods. The following program calls just the Init() and
> Finalize() method.
> 
> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
> mca_base_open failed
> --> Returned value -2 instead of OPAL_SUCCESS
> ...
> 
> 
> Hopefully somebody will have an idea what goes wrong on my Linux
> system. Thank you very much for any help in advance.
> 
> Kind regards
> 
> Siegmar
> 
> 
>> Just shooting in the dark here - I suspect you'll have to ask someone
>> more knowledgeable on JVMs.
>> 
>> 
>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi
>>> 
>>>> I can't speak to the other issues, but for these - it looks like
>>>> something isn't right in the system. Could be an incompatibility
>>>> with Suse 12.1.
>>>> 
>>>> What the errors are saying is that malloc is failing when used at
>>>> a very early stage in starting the process. Can you run even a
>>>> C-based MPI "hello" program?
>>> 
>>> Yes. I have implemented more or less the same program in C and Java.
>>> 
>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>> Process 0 of 2 running on linpc0
>>> Process 1 of 2 running on linpc1
>>> 
>>> Now 1 slave tasks are sending greetings.
>>> 
>>> Greetings from task 1:
>>>   message type:     3
>>>   msg length:       132 characters
>>>   message:
>>>     hostname:         linpc1
>>>     operating system: Linux
>>>     release:          3.1.10-1.16-desktop
>>>     processor:        x86_64
>>> 
>>> 
>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> 
>>> 
>>> Thank you very much for any help in advance.
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> 
>>> 
>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>> 
>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>> ompi_mpi_init: orte_init failed
>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>> were killed!
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>> 
>>>>> Process name: [[16706,1],1]
>>>>> Exit code:    1
>>>>> --------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> I use a valid environment on all machines. The problem occurs as well
>>>>> when I compile and run the program directly on the Linux system.
>>>>> 
>>>>> linpc1 java 101 mpijavac BcastIntMain.java
>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>> 
>>> 
>> 
> 
> <InitFinalizeMain.java>
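
P.S. Since I suggested mpijavac above, here is roughly the kind of minimal Init/Finalize test I have in mind - just a sketch of mine, not your attached InitFinalizeMain.java, and the class name is made up. It assumes the mpi.jar Java bindings that come with your Open MPI build, which is exactly what mpijavac is there to put on the classpath:

// InitFinalizeTest.java - minimal sketch; compile with mpijavac, not javac.
import mpi.MPI;

public class InitFinalizeTest {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);       // calls the native MPI_Init through JNI
        System.out.println("MPI initialized, about to finalize");
        MPI.Finalize();
    }
}

Compiled and launched the way I would expect things to work:

mpijavac InitFinalizeTest.java
mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` InitFinalizeTest

If even that still dies in opal_init / mca_base_open on the Linux boxes, then the Java side is probably not the culprit and we need to look below it.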