Interesting. My best guess is that the OMPI libraries aren't being found, 
though I'm a little surprised because the error message indicates an inability 
to malloc - but it's possible the message isn't accurate.
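
One quick way to check that theory (just a suggestion): run something like

  mpiexec -host linpc1 printenv LD_LIBRARY_PATH

and verify that the lib directory of your Open MPI installation (something
under /usr/local/openmpi-1.9_64_cc, judging from your paths below) shows up
in the output on the remote node.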

One thing stands out - I see you compiled your program with "javac". I suspect 
that is the source of the trouble - you really need to use the Java wrapper 
compiler "mpijavac" so that all the required libraries get picked up and 
linked correctly.


On Dec 21, 2012, at 9:46 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi
> 
>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>> appears to be in the Java side of things. For whatever reason, your
>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>> something to do with its setup, but I'm not enough of a Java person
>> to point you to the problem.
>> 
>> Is it possible that the program was compiled against a different
>> (perhaps incompatible) version of Java?
> 
> No, I don't think so. A small Java program without MPI methods works.
> 
> linpc1 bin 122 which mpicc
> /usr/local/openmpi-1.9_64_cc/bin/mpicc
> linpc1 bin 123 pwd
> /usr/local/openmpi-1.9_64_cc/bin
> linpc1 bin 124 grep jdk *
> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> linpc1 bin 125 which java
> /usr/local/jdk1.7.0_07-64/bin/java
> linpc1 bin 126 
> 
> 
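> (For reference, MiniProgMain is essentially just a loop that prints a few
> messages - roughly:)
> 
>   public class MiniProgMain
>   {
>     public static void main (String args[])
>     {
>       for (int i = 0; i < 5; ++i)
>       {
>         System.out.println ("Message " + i);
>       }
>     }
>   }
> 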
> linpc1 prog 110 javac MiniProgMain.java
> linpc1 prog 111 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 112 mpiexec java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> Message 0
> Message 1
> Message 2
> Message 3
> Message 4
> 
> 
> A small program which allocates a buffer for a new string.
> ...
> stringBUFLEN = new String (string.substring (0, len));
> ...
> 
> linpc1 prog 115 javac MemAllocMain.java 
> linpc1 prog 116 java MemAllocMain
> Type something ("quit" terminates program): ffghhfhh
> Received input:          ffghhfhh
> Converted to upper case: FFGHHFHH
> Type something ("quit" terminates program): quit
> Received input:          quit
> Converted to upper case: QUIT
> 
> linpc1 prog 117 mpiexec java MemAllocMain
> Type something ("quit" terminates program): fbhshnhjs
> Received input:          fbhshnhjs
> Converted to upper case: FBHSHNHJS
> Type something ("quit" terminates program): quit
> Received input:          quit
> Converted to upper case: QUIT
> linpc1 prog 118 
> 
> I'm not sure if this is of any help, but the problem starts as soon as
> MPI methods are involved. The following program calls just the Init() and
> Finalize() methods.
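> 
> (A minimal sketch of what that amounts to, written against the mpiJava-style
> API that the Open MPI Java bindings follow - the actual file is attached:)
> 
>   import mpi.*;
> 
>   public class InitFinalizeMain
>   {
>     public static void main (String args[]) throws MPIException
>     {
>       MPI.Init (args);      // already fails here on the Linux machines
>       MPI.Finalize ();
>     }
>   }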
> 
> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  mca_base_open failed
>  --> Returned value -2 instead of OPAL_SUCCESS
> ...
> 
> 
> Hopefully somebody will have an idea of what is going wrong on my Linux
> system. Thank you very much for any help in advance.
> 
> Kind regards
> 
> Siegmar
> 
> 
>> Just shooting in the dark here - I suspect you'll have to ask someone
>> more knowledgeable on JVMs.
>> 
>> 
>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi
>>> 
>>>> I can't speak to the other issues, but for these - it looks like
>>>> something isn't right in the system. Could be an incompatibility
>>>> with Suse 12.1.
>>>> 
>>>> What the errors are saying is that malloc is failing when used at
>>>> a very early stage in starting the process. Can you run even a
>>>> C-based MPI "hello" program?
>>> 
>>> Yes. I have implemented more or less the same program in C and Java.
>>> 
>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>> Process 0 of 2 running on linpc0
>>> Process 1 of 2 running on linpc1
>>> 
>>> Now 1 slave tasks are sending greetings.
>>> 
>>> Greetings from task 1:
>>> message type:        3
>>> msg length:          132 characters
>>> message:             
>>>   hostname:          linpc1
>>>   operating system:  Linux
>>>   release:           3.1.10-1.16-desktop
>>>   processor:         x86_64
>>> 
>>> 
>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> 
>>> 
>>> Thank you very much for any help in advance.
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> 
>>> 
>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross 
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>> 
>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems.  This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>> ompi_mpi_init: orte_init failed
>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>> *** An error occurred in MPI_Init
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> ***    and potentially your MPI job)
>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to
>>>>> aggregate error messages, and not able to guarantee that all other processes
>>>>> were killed!
>>>>> -------------------------------------------------------
>>>>> Primary job  terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> --------------------------------------------------------------------------
>>>>> mpiexec detected that one or more processes exited with non-zero status,
>>>>> thus causing the job to be terminated. The first process to do so was:
>>>>> 
>>>>> Process name: [[16706,1],1]
>>>>> Exit code:    1
>>>>> --------------------------------------------------------------------------
>>>>> 
>>>>> 
>>>>> 
>>>>> I use a valid environment on all machines. The problem occurs as well
>>>>> when I compile and run the program directly on the Linux system.
>>>>> 
>>>>> linpc1 java 101 mpijavac BcastIntMain.java 
>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems.  This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>> 
>>> 
>> 
>> 
> <InitFinalizeMain.java>

