I can't speak to the other issues, but for these it looks like something 
isn't right on the system. It could be an incompatibility with openSUSE 12.1.

What the errors are saying is that malloc is failing at a very early stage 
of starting the process. Can you run even a C-based MPI "hello" program?
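
Something along these lines would do as a sanity check (the file name 
hello.c is just an example, not anything from your setup):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;

        /* In Open MPI, this is where opal_init/orte_init run, i.e. where
         * the failure above is being reported */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total processes in job */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Compile it with "mpicc hello.c -o hello" and run it the same way as your Java 
program, e.g. "mpiexec -np 2 -host linpc0,linpc1 ./hello". If that also dies 
in MPI_Init, the problem is in the base Open MPI installation on those 
machines rather than in the Java bindings.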


On Dec 21, 2012, at 1:41 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> 
> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  mca_base_open failed
>  --> Returned value -2 instead of OPAL_SUCCESS
> ...
>  ompi_mpi_init: orte_init failed
>  --> Returned "Out of resource" (-2) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***    and potentially your MPI job)
> [(null):10586] Local abort before MPI_INIT completed successfully; not able to 
> aggregate error messages, and not able to guarantee that all other processes 
> were killed!
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec detected that one or more processes exited with non-zero status, thus 
> causing the job to be terminated. The first process to do so was:
> 
>  Process name: [[16706,1],1]
>  Exit code:    1
> --------------------------------------------------------------------------
> 
> 
> 
> I use a valid environment on all machines. The problem also occurs
> when I compile and run the program directly on the Linux system.
> 
> linpc1 java 101 mpijavac BcastIntMain.java 
> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>  mca_base_open failed
>  --> Returned value -2 instead of OPAL_SUCCESS
