I can confirm that the first program fails (bcast a single int). I'm trying to understand how the implementation works, but this may take some time (due to the holidays, etc.).
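For reference, the failing case boils down to something like the sketch below. This is my own minimal reproducer, not necessarily Siegmar's exact BcastIntMain.java, and it assumes the mpiJava-style API that the current trunk Java bindings expose (compile with mpijavac, run with "mpiexec -np 2 java BcastIntMain"):

import mpi.*;

public class BcastIntMain {
    public static void main(String[] args) throws MPIException {
        MPI.Init(args);

        int rank = MPI.COMM_WORLD.Rank();

        // Broadcast a single int from rank 0 to every rank; the
        // mpiJava-style signature is Bcast(buf, offset, count, type, root).
        int[] value = new int[1];
        if (rank == 0) {
            value[0] = 42;   // arbitrary payload chosen for this sketch
        }
        MPI.COMM_WORLD.Bcast(value, 0, 1, MPI.INT, 0);

        System.out.println("rank " + rank + " received " + value[0]);

        MPI.Finalize();
    }
}

With the failure Siegmar reports, the abort happens in opal_init / MPI_Init, so the broadcast is never reached (see the output quoted below).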
On Dec 22, 2012, at 2:53 AM, Siegmar Gross wrote:

> Hi
> 
>> Interesting. My best guess is that the OMPI libraries aren't being
>> found, though I'm a little surprised because the error message
>> indicates an inability to malloc - but it's possible the message
>> isn't accurate.
>> 
>> One thing stands out - I see you compiled your program with "javac".
>> I suspect that is the source of the trouble - you really need to use
>> the Java wrapper compiler "mpijavac" so all the libs get absorbed
>> and/or linked correctly.
> 
> No, I only compiled the first two programs (which don't use any MPI
> methods) with javac. The MPI program "InitFinalizeMain.java" was
> compiled with "mpijavac" (I use a script file and GNUmakefile).
> 
> linpc1 java 102 make_classfiles
> ...
> =========== linpc1 ===========
> Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Warning: No xauth data; using fake authentication data for X11 forwarding.
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
> ...
> 
> The other programs work also if I compile them with "mpijavac"
> 
> linpc1 prog 107 mpijavac MemAllocMain.java
> linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
> Type something ("quit" terminates program): dgdas
> Received input: dgdas
> Converted to upper case: DGDAS
> Type something ("quit" terminates program): quit
> Received input: quit
> Converted to upper case: QUIT
> linpc1 prog 109
> 
> My environment should be valid as well. LD_LIBRARY_PATH contains
> first the directories for 32 bit libraries and then the directories
> for 64 bit libraries. I have split the long lines for the PATH
> variables so that they are easier to read.
> 
> linpc1 java 111 mpiexec java EnvironVarMain
> 
> Operating system: Linux Processor architecture: x86_64
> 
> CLASSPATH:
> /usr/local/junit4.10:
> /usr/local/junit4.10/junit-4.10.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
> /usr/local/javacc-5.0/javacc.jar:
> .:
> /home/fd1026/Linux/x86_64/mpi_classfiles
> 
> LD_LIBRARY_PATH:
> /usr/lib:
> ...
> /usr/lib64:
> /usr/local/jdk1.7.0_07-64/jre/lib/amd64:
> /usr/local/gcc-4.7.1/lib64:
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/lib64:
> /usr/local/ssl/lib64:
> /usr/lib64:
> /usr/X11R6/lib64:
> /usr/local/openmpi-1.9_64_cc/lib64:
> /home/fd1026/Linux/x86_64/lib64
> linpc1 java 112
> 
> Can I provide any other information to solve this problem?
> 
> Kind regards
> 
> Siegmar
> 
>> On Dec 21, 2012, at 9:46 AM, Siegmar Gross
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi
>>> 
>>>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>>>> appears to be in the Java side of things. For whatever reason, your
>>>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>>>> something to do with its setup, but I'm not enough of a Java person
>>>> to point you to the problem.
>>>> 
>>>> Is it possible that the program was compiled against a different
>>>> (perhaps incompatible) version of Java?
>>> 
>>> No, I don't think so. A small Java program without MPI methods works.
>>> 
>>> linpc1 bin 122 which mpicc
>>> /usr/local/openmpi-1.9_64_cc/bin/mpicc
>>> linpc1 bin 123 pwd
>>> /usr/local/openmpi-1.9_64_cc/bin
>>> linpc1 bin 124 grep jdk *
>>> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> linpc1 bin 125 which java
>>> /usr/local/jdk1.7.0_07-64/bin/java
>>> linpc1 bin 126
>>> 
>>> linpc1 prog 110 javac MiniProgMain.java
>>> linpc1 prog 111 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 112 mpiexec java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> 
>>> A small program which allocates buffer for a new string.
>>> ...
>>> stringBUFLEN = new String (string.substring (0, len));
>>> ...
>>> 
>>> linpc1 prog 115 javac MemAllocMain.java
>>> linpc1 prog 116 java MemAllocMain
>>> Type something ("quit" terminates program): ffghhfhh
>>> Received input: ffghhfhh
>>> Converted to upper case: FFGHHFHH
>>> Type something ("quit" terminates program): quit
>>> Received input: quit
>>> Converted to upper case: QUIT
>>> 
>>> linpc1 prog 117 mpiexec java MemAllocMain
>>> Type something ("quit" terminates program): fbhshnhjs
>>> Received input: fbhshnhjs
>>> Converted to upper case: FBHSHNHJS
>>> Type something ("quit" terminates program): quit
>>> Received input: quit
>>> Converted to upper case: QUIT
>>> linpc1 prog 118
>>> 
>>> I'm not sure if this is of any help, but the problem starts with
>>> MPI methods. The following program calls just the Init() and
>>> Finalize() method.
>>> 
>>> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> 
>>> Hopefully somebody will have an idea what goes wrong on my Linux
>>> system. Thank you very much for any help in advance.
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>>> Just shooting in the dark here - I suspect you'll have to ask someone
>>>> more knowledgeable on JVMs.
>>>> 
>>>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>>> I can't speak to the other issues, but for these - it looks like
>>>>>> something isn't right in the system. Could be an incompatibility
>>>>>> with Suse 12.1.
>>>>>> 
>>>>>> What the errors are saying is that malloc is failing when used at
>>>>>> a very early stage in starting the process. Can you run even a
>>>>>> C-based MPI "hello" program?
>>>>> 
>>>>> Yes. I have implemented more or less the same program in C and Java.
>>>>> 
>>>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>>>> Process 0 of 2 running on linpc0
>>>>> Process 1 of 2 running on linpc1
>>>>> 
>>>>> Now 1 slave tasks are sending greetings.
>>>>> 
>>>>> Greetings from task 1:
>>>>> message type: 3
>>>>> msg length: 132 characters
>>>>> message:
>>>>> hostname: linpc1
>>>>> operating system: Linux
>>>>> release: 3.1.10-1.16-desktop
>>>>> processor: x86_64
>>>>> 
>>>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems. This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>> 
>>>>> Thank you very much for any help in advance.
>>>>> 
>>>>> Kind regards
>>>>> 
>>>>> Siegmar
>>>>> 
>>>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross
>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>> 
>>>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>>>> 
>>>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> 
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>>>> ...
>>>>>>> ompi_mpi_init: orte_init failed
>>>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>>>> --------------------------------------------------------------------------
>>>>>>> *** An error occurred in MPI_Init
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>>>> were killed!
>>>>>>> -------------------------------------------------------
>>>>>>> Primary job terminated normally, but 1 process returned
>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>>>> 
>>>>>>> Process name: [[16706,1],1]
>>>>>>> Exit code: 1
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> I use a valid environment on all machines. The problem occurs as well
>>>>>>> when I compile and run the program directly on the Linux system.
>>>>>>> 
>>>>>>> linpc1 java 101 mpijavac BcastIntMain.java
>>>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> 
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> 
>>> <InitFinalizeMain.java>
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/