Hi

> Interesting. My best guess is that the OMPI libraries aren't being
> found, though I'm a little surprised because the error message
> indicates an inability to malloc - but it's possible the message
> isn't accurate.
>
> One thing stands out - I see you compiled your program with "javac".
> I suspect that is the source of the trouble - you really need to use
> the Java wrapper compiler "mpijavac" so all the libs get absorbed
> and/or linked correctly.
No, I only compiled the first two programs (which don't use any MPI
methods) with javac. The MPI program "InitFinalizeMain.java" was
compiled with "mpijavac" (I use a script file and a GNUmakefile).

linpc1 java 102 make_classfiles
...
=========== linpc1 ===========
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
...

The other programs also work if I compile them with "mpijavac".

linpc1 prog 107 mpijavac MemAllocMain.java
linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
Type something ("quit" terminates program): dgdas
Received input: dgdas
Converted to upper case: DGDAS
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT
linpc1 prog 109

My environment should be valid as well. LD_LIBRARY_PATH contains the
directories for 32-bit libraries first and then the directories for
64-bit libraries. I have split the long lines of the PATH variables so
that they are easier to read.

linpc1 java 111 mpiexec java EnvironVarMain
Operating system: Linux
Processor architecture: x86_64
CLASSPATH:
  /usr/local/junit4.10:
  /usr/local/junit4.10/junit-4.10.jar:
  //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
  //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
  //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
  /usr/local/javacc-5.0/javacc.jar:
  .:
  /home/fd1026/Linux/x86_64/mpi_classfiles
LD_LIBRARY_PATH:
  /usr/lib:
  ...
  /usr/lib64:
  /usr/local/jdk1.7.0_07-64/jre/lib/amd64:
  /usr/local/gcc-4.7.1/lib64:
  /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
  /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
  /usr/local/lib64:
  /usr/local/ssl/lib64:
  /usr/lib64:
  /usr/X11R6/lib64:
  /usr/local/openmpi-1.9_64_cc/lib64:
  /home/fd1026/Linux/x86_64/lib64
linpc1 java 112
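For reference, the failing program really is minimal: apart from the class
scaffolding it only calls the two methods. The following is a sketch of
what InitFinalizeMain.java contains (the attached file may differ in
details such as error handling):

import mpi.*;

public class InitFinalizeMain
{
  public static void main (String args[]) throws MPIException
  {
    // even this first call fails on the Linux machines with the
    // opal_init / mca_base_open error quoted further down
    MPI.Init (args);
    MPI.Finalize ();
  }
}

Even this program, which performs no communication at all, triggers the
opal_init error shown in the quoted messages below.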
Can I provide any other information to solve this problem?

Kind regards

Siegmar


> On Dec 21, 2012, at 9:46 AM, Siegmar Gross
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi
> >
> >> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
> >> appears to be in the Java side of things. For whatever reason, your
> >> Java VM is refusing to allow a malloc to succeed. I suspect it has
> >> something to do with its setup, but I'm not enough of a Java person
> >> to point you to the problem.
> >>
> >> Is it possible that the program was compiled against a different
> >> (perhaps incompatible) version of Java?
> >
> > No, I don't think so. A small Java program without MPI methods works.
> >
> > linpc1 bin 122 which mpicc
> > /usr/local/openmpi-1.9_64_cc/bin/mpicc
> > linpc1 bin 123 pwd
> > /usr/local/openmpi-1.9_64_cc/bin
> > linpc1 bin 124 grep jdk *
> > mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > linpc1 bin 125 which java
> > /usr/local/jdk1.7.0_07-64/bin/java
> > linpc1 bin 126
> >
> >
> > linpc1 prog 110 javac MiniProgMain.java
> > linpc1 prog 111 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 112 mpiexec java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> >
> >
> > A small program which allocates buffer for a new string.
> >
> > ...
> > stringBUFLEN = new String (string.substring (0, len));
> > ...
> >
> > linpc1 prog 115 javac MemAllocMain.java
> > linpc1 prog 116 java MemAllocMain
> > Type something ("quit" terminates program): ffghhfhh
> > Received input: ffghhfhh
> > Converted to upper case: FFGHHFHH
> > Type something ("quit" terminates program): quit
> > Received input: quit
> > Converted to upper case: QUIT
> >
> > linpc1 prog 117 mpiexec java MemAllocMain
> > Type something ("quit" terminates program): fbhshnhjs
> > Received input: fbhshnhjs
> > Converted to upper case: FBHSHNHJS
> > Type something ("quit" terminates program): quit
> > Received input: quit
> > Converted to upper case: QUIT
> > linpc1 prog 118
> >
> > I'm not sure if this is of any help, but the problem starts with
> > MPI methods. The following program calls just the Init() and
> > Finalize() method.
> >
> > tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   mca_base_open failed
> >   --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >
> >
> > Hopefully somebody will have an idea what goes wrong on my Linux
> > system. Thank you very much for any help in advance.
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >> Just shooting in the dark here - I suspect you'll have to ask someone
> >> more knowledgeable on JVMs.
> >>
> >>
> >> On Dec 21, 2012, at 7:32 AM, Siegmar Gross
> >> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> Hi
> >>>
> >>>> I can't speak to the other issues, but for these - it looks like
> >>>> something isn't right in the system. Could be an incompatibility
> >>>> with Suse 12.1.
> >>>>
> >>>> What the errors are saying is that malloc is failing when used at
> >>>> a very early stage in starting the process. Can you run even a
> >>>> C-based MPI "hello" program?
> >>>
> >>> Yes. I have implemented more or less the same program in C and Java.
> >>>
> >>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> >>> Process 0 of 2 running on linpc0
> >>> Process 1 of 2 running on linpc1
> >>>
> >>> Now 1 slave tasks are sending greetings.
> >>>
> >>> Greetings from task 1:
> >>>   message type:       3
> >>>   msg length:         132 characters
> >>>   message:
> >>>     hostname:         linpc1
> >>>     operating system: Linux
> >>>     release:          3.1.10-1.16-desktop
> >>>     processor:        x86_64
> >>>
> >>>
> >>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>>   mca_base_open failed
> >>>   --> Returned value -2 instead of OPAL_SUCCESS
> >>> ...
> >>>
> >>>
> >>> Thank you very much for any help in advance.
> >>>
> >>> Kind regards
> >>>
> >>> Siegmar
> >>>
> >>>
> >>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross
> >>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>>>
> >>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >>>>>
> >>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort. There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems. This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>>
> >>>>>   mca_base_open failed
> >>>>>   --> Returned value -2 instead of OPAL_SUCCESS
> >>>>> ...
> >>>>>   ompi_mpi_init: orte_init failed
> >>>>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
> >>>>> --------------------------------------------------------------------------
> >>>>> *** An error occurred in MPI_Init
> >>>>> *** on a NULL communicator
> >>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >>>>> *** and potentially your MPI job)
> >>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> >>>>> aggregate error messages, and not able to guarantee that all other processes
> >>>>> were killed!
> >>>>> -------------------------------------------------------
> >>>>> Primary job terminated normally, but 1 process returned
> >>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>>>> -------------------------------------------------------
> >>>>> --------------------------------------------------------------------------
> >>>>> mpiexec detected that one or more processes exited with non-zero status,
> >>>>> thus causing the job to be terminated. The first process to do so was:
> >>>>>
> >>>>>   Process name: [[16706,1],1]
> >>>>>   Exit code:    1
> >>>>> --------------------------------------------------------------------------
> >>>>>
> >>>>>
> >>>>> I use a valid environment on all machines. The problem occurs as well
> >>>>> when I compile and run the program directly on the Linux system.
> >>>>>
> >>>>> linpc1 java 101 mpijavac BcastIntMain.java
> >>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort. There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems. This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>>
> >>>>>   mca_base_open failed
> >>>>>   --> Returned value -2 instead of OPAL_SUCCESS
> >>>>
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> us...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> > <InitFinalizeMain.java>_______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>