Hi

> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
> appears to be in the Java side of things. For whatever reason, your
> Java VM is refusing to allow a malloc to succeed. I suspect it has
> something to do with its setup, but I'm not enough of a Java person
> to point you to the problem.
>
> Is it possible that the program was compiled against a different
> (perhaps incompatible) version of Java?
No, I don't think so. A small Java program without MPI methods works.

linpc1 bin 122 which mpicc
/usr/local/openmpi-1.9_64_cc/bin/mpicc
linpc1 bin 123 pwd
/usr/local/openmpi-1.9_64_cc/bin
linpc1 bin 124 grep jdk *
mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
linpc1 bin 125 which java
/usr/local/jdk1.7.0_07-64/bin/java
linpc1 bin 126

linpc1 prog 110 javac MiniProgMain.java
linpc1 prog 111 java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
linpc1 prog 112 mpiexec java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
linpc1 prog 113 mpiexec -np 2 java MiniProgMain
Message 0
Message 1
Message 2
Message 3
Message 4
Message 0
Message 1
Message 2
Message 3
Message 4

A small program which allocates a buffer for a new string works as well
(a sketch of such a program follows the quoted thread below):

...
string = new String (string.substring (0, len));
...

linpc1 prog 115 javac MemAllocMain.java
linpc1 prog 116 java MemAllocMain
Type something ("quit" terminates program): ffghhfhh
Received input: ffghhfhh
Converted to upper case: FFGHHFHH
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT
linpc1 prog 117 mpiexec java MemAllocMain
Type something ("quit" terminates program): fbhshnhjs
Received input: fbhshnhjs
Converted to upper case: FBHSHNHJS
Type something ("quit" terminates program): quit
Received input: quit
Converted to upper case: QUIT
linpc1 prog 118

I'm not sure if this is of any help, but the problem starts with MPI
methods. The following program (InitFinalizeMain.java, attached) calls
just the Init() and Finalize() methods.

tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...

Hopefully somebody will have an idea what goes wrong on my Linux system.
Thank you very much for any help in advance.


Kind regards

Siegmar


> Just shooting in the dark here - I suspect you'll have to ask someone
> more knowledgeable on JVMs.
>
>
> On Dec 21, 2012, at 7:32 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>
> > Hi
> >
> >> I can't speak to the other issues, but for these - it looks like
> >> something isn't right in the system. Could be an incompatibility
> >> with Suse 12.1.
> >>
> >> What the errors are saying is that malloc is failing when used at
> >> a very early stage in starting the process. Can you run even a
> >> C-based MPI "hello" program?
> >
> > Yes. I have implemented more or less the same program in C and Java.
> >
> > tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> > Process 0 of 2 running on linpc0
> > Process 1 of 2 running on linpc1
> >
> > Now 1 slave tasks are sending greetings.
> >
> > Greetings from task 1:
> >   message type:       3
> >   msg length:         132 characters
> >   message:
> >     hostname:         linpc1
> >     operating system: Linux
> >     release:          3.1.10-1.16-desktop
> >     processor:        x86_64
> >
> >
> > tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> >   mca_base_open failed
> >   --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >
> >
> > Thank you very much for any help in advance.
> >
> > Kind regards
> >
> > Siegmar
> >
> >
> >
> >> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>
> >>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >>>
> >>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort. There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>>   mca_base_open failed
> >>>   --> Returned value -2 instead of OPAL_SUCCESS
> >>> ...
> >>> ompi_mpi_init: orte_init failed
> >>>   --> Returned "Out of resource" (-2) instead of "Success" (0)
> >>> --------------------------------------------------------------------------
> >>> *** An error occurred in MPI_Init
> >>> *** on a NULL communicator
> >>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >>> ***   and potentially your MPI job)
> >>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> >>> aggregate error messages, and not able to guarantee that all other processes
> >>> were killed!
> >>> -------------------------------------------------------
> >>> Primary job terminated normally, but 1 process returned
> >>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>> -------------------------------------------------------
> >>> --------------------------------------------------------------------------
> >>> mpiexec detected that one or more processes exited with non-zero status,
> >>> thus causing the job to be terminated. The first process to do so was:
> >>>
> >>>   Process name: [[16706,1],1]
> >>>   Exit code:    1
> >>> --------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> I use a valid environment on all machines. The problem occurs as well
> >>> when I compile and run the program directly on the Linux system.
> >>>
> >>> linpc1 java 101 mpijavac BcastIntMain.java
> >>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort.
> >>> There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems. This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>>
> >>>   mca_base_open failed
> >>>   --> Returned value -2 instead of OPAL_SUCCESS
> >>
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
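
As a point of reference for the non-MPI tests above, here is a sketch of
the kind of program MemAllocMain.java appears to be, reconstructed from
the transcript. The class layout, the BUFLEN constant, and the exact loop
structure are assumptions, not the original source:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class MemAllocMain {
    // assumed maximum length for the copied string
    static final int BUFLEN = 255;

    public static void main(String[] args) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(System.in));
        while (true) {
            System.out.print("Type something (\"quit\" terminates program): ");
            String line = reader.readLine();
            if (line == null) {
                break;
            }
            int len = Math.min(line.length(), BUFLEN);
            // allocate a buffer for a new string, as in the fragment quoted above
            String string = new String(line.substring(0, len));
            System.out.println("Received input: " + string);
            System.out.println("Converted to upper case: " + string.toUpperCase());
            if (string.equals("quit")) {
                break;
            }
        }
    }
}

The transcript shows that this kind of plain JVM string allocation works
both with and without mpiexec, which is why the failure seems specific to
the MPI initialization path.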
Attachment: InitFinalizeMain.java
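
The attachment itself is not reproduced above. A minimal sketch of what an
Init()/Finalize()-only test such as InitFinalizeMain.java presumably looks
like with the Open MPI Java bindings (package mpi) is shown below; the
actual attached file may differ:

import mpi.MPI;
import mpi.MPIException;

public class InitFinalizeMain {
    public static void main(String[] args) throws MPIException {
        // the opal_init/mca_base_open failure above occurs already during Init()
        MPI.Init(args);
        MPI.Finalize();
    }
}

Run as in the transcript above (mpiexec -host linpc1 java InitFinalizeMain),
this is already enough to trigger the mca_base_open failure on the Linux
machines.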