Hi

> I can't speak to the other issues, but for these - it looks like
> something isn't right in the system. Could be an incompatibility
> with Suse 12.1.
> 
> What the errors are saying is that malloc is failing when used at
> a very early stage in starting the process. Can you run even a
> C-based MPI "hello" program?

Yes. I have implemented more or less the same program in both C and Java; the C
version runs fine, while the Java version fails as shown below.

tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
Process 0 of 2 running on linpc0
Process 1 of 2 running on linpc1

Now 1 slave tasks are sending greetings.

Greetings from task 1:
  message type:        3
  msg length:          132 characters
  message:             
    hostname:          linpc1
    operating system:  Linux
    release:           3.1.10-1.16-desktop
    processor:         x86_64


tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java HelloMainWithBarrier
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
...

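For what it is worth, HelloMainWithBarrier is essentially a direct port of the
C program above. The following is only a minimal sketch, not the actual source;
it assumes the newer method names of the Open MPI Java bindings (getRank(),
getSize(), barrier() - the older mpiJava-style bindings use Rank(), Size(),
Barrier() instead). The point is that the failure happens inside MPI.Init,
before any of this user code runs.

import mpi.*;

public class HelloMainWithBarrier {

    public static void main(String[] args) throws MPIException {
        MPI.Init(args);                       // the run above already fails here (opal_init/mca_base_open)

        int rank = MPI.COMM_WORLD.getRank();  // rank of this process
        int size = MPI.COMM_WORLD.getSize();  // total number of processes
        System.out.println("Process " + rank + " of " + size);

        MPI.COMM_WORLD.barrier();             // synchronize all processes before shutting down
        MPI.Finalize();
    }
}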

Thank you very much in advance for any help.

Kind regards

Siegmar



> On Dec 21, 2012, at 1:41 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> > The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> > 
> > linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> >  ompi_mpi_init: orte_init failed
> >  --> Returned "Out of resource" (-2) instead of "Success" (0)
> > --------------------------------------------------------------------------
> > *** An error occurred in MPI_Init
> > *** on a NULL communicator
> > *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> > ***    and potentially your MPI job)
> > [(null):10586] Local abort before MPI_INIT completed successfully; not able to
> > aggregate error messages, and not able to guarantee that all other processes
> > were killed!
> > -------------------------------------------------------
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code.. Per user-direction, the job has been aborted.
> > -------------------------------------------------------
> > --------------------------------------------------------------------------
> > mpiexec detected that one or more processes exited with non-zero status, thus causing
> > the job to be terminated. The first process to do so was:
> > 
> >  Process name: [[16706,1],1]
> >  Exit code:    1
> > --------------------------------------------------------------------------
> > 
> > 
> > 
> > I use a valid environment on all machines. The problem also occurs
> > when I compile and run the program directly on the Linux system.
> > 
> > linpc1 java 101 mpijavac BcastIntMain.java 
> > linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> 
