Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue appears to 
be on the Java side of things. For whatever reason, a malloc is failing inside 
your Java VM very early in startup. I suspect it has something to do with the 
JVM's setup, but I'm not enough of a Java person to point you to the problem.

Is it possible that the program was compiled against a different (perhaps 
incompatible) version of Java?

Just shooting in the dark here - I suspect you'll have to ask someone more 
knowledgeable about JVMs.
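
If it helps narrow things down, something like the minimal sketch below might be 
worth launching the same way you run your real program (the class name InitCheck 
is just an example, and I'm assuming the mpi.MPI bindings that ship with your 
OMPI build are on the classpath). It prints which JVM is actually launched on 
each node before touching MPI, so a version or installation mismatch would show 
up even if MPI.Init aborts:

// InitCheck.java -- minimal sketch: does MPI.Init itself succeed under this JVM,
// and which Java installation does mpiexec actually start on each node?
import mpi.MPI;

public class InitCheck {
    public static void main(String[] args) throws Exception {
        // Print JVM details before touching MPI, so a mismatch between the
        // nodes is visible even if Init aborts the process.
        System.out.println("java.version = " + System.getProperty("java.version")
                + ", java.home = " + System.getProperty("java.home")
                + ", os.arch = " + System.getProperty("os.arch"));

        MPI.Init(args);      // this is where opal_init/malloc would fail
        System.out.println("MPI.Init succeeded on "
                + java.net.InetAddress.getLocalHost().getHostName());
        MPI.Finalize();
    }
}

Compiling it with mpijavac and running "mpiexec -np 2 -host linpc0,linpc1 java 
InitCheck" should tell you whether the failure happens inside MPI.Init at all, 
and whether linpc0 and linpc1 are really using the same Java installation.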


On Dec 21, 2012, at 7:32 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi
> 
>> I can't speak to the other issues, but for these - it looks like
>> something isn't right in the system. Could be an incompatibility
>> with Suse 12.1.
>> 
>> What the errors are saying is that malloc is failing when used at
>> a very early stage in starting the process. Can you run even a
>> C-based MPI "hello" program?
> 
> Yes. I have implemented more or less the same program in C and Java.
> 
> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> Process 0 of 2 running on linpc0
> Process 1 of 2 running on linpc1
> 
> Now 1 slave tasks are sending greetings.
> 
> Greetings from task 1:
>  message type:        3
>  msg length:          132 characters
>  message:             
>    hostname:          linpc1
>    operating system:  Linux
>    release:           3.1.10-1.16-desktop
>    processor:         x86_64
> 
> 
> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  mca_base_open failed
>  --> Returned value -2 instead of OPAL_SUCCESS
> ...
> 
> 
> Thank you very much for any help in advance.
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>> 
>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> ompi_mpi_init: orte_init failed
>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>> *** An error occurred in MPI_Init
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***    and potentially your MPI job)
>>> [(null):10586] Local abort before MPI_INIT completed successfully; not able to 
>>> aggregate error messages, and not able to guarantee that all other processes 
>>> were killed!
>>> -------------------------------------------------------
>>> Primary job  terminated normally, but 1 process returned
>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>> -------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpiexec detected that one or more processes exited with non-zero status, thus causing
>>> the job to be terminated. The first process to do so was:
>>> 
>>> Process name: [[16706,1],1]
>>> Exit code:    1
>>> --------------------------------------------------------------------------
>>> 
>>> 
>>> 
>>> I use a valid environment on all machines. The problem occurs as well
>>> when I compile and run the program directly on the Linux system.
>>> 
>>> linpc1 java 101 mpijavac BcastIntMain.java 
>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>> 
> 