Hi

> Interesting. My best guess is that the OMPI libraries aren't being
> found, though I'm a little surprised because the error message
> indicates an inability to malloc - but it's possible the message
> isn't accurate.
> 
> One thing stands out - I see you compiled your program with "javac".
> I suspect that is the source of the trouble - you really need to use
> the Java wrapper compiler "mpijavac" so all the libs get absorbed
> and/or linked correctly.

No, I only compiled the first two programs (which don't use any MPI
methods) with "javac". The MPI program "InitFinalizeMain.java" was
compiled with "mpijavac" (I use a script file and a GNUmakefile).
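
For reference, "InitFinalizeMain.java" is the minimal test case that only
calls Init() and Finalize(). The exact source isn't reproduced in this mail,
so the following is only a rough sketch of it (class and method names as in
the Open MPI Java bindings):

import mpi.*;

public class InitFinalizeMain
{
  public static void main (String args[]) throws MPIException
  {
    /* sketch only -- initialize the Java bindings and shut them down
     * again; nothing else happens in between
     */
    MPI.Init (args);
    MPI.Finalize ();
  }
}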

linpc1 java 102 make_classfiles
...
=========== linpc1 ===========
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
...


The other programs also work if I compile them with "mpijavac".

linpc1 prog 107 mpijavac MemAllocMain.java 
linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
Type something ("quit" terminates program): dgdas
Received input:          dgdas
Converted to upper case: DGDAS
Type something ("quit" terminates program): quit
Received input:          quit
Converted to upper case: QUIT
linpc1 prog 109 
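
"MemAllocMain.java" doesn't call any MPI methods; it only reads lines from
stdin, copies each one into a newly allocated string and echoes it in upper
case. The exact source isn't shown in this mail, so the following is only a
rough sketch reconstructed from the output above:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class MemAllocMain
{
  public static void main (String args[]) throws Exception
  {
    /* sketch only -- reconstructed from the output above,
     * not the original source
     */
    BufferedReader stdin =
      new BufferedReader (new InputStreamReader (System.in));
    String line;
    do
    {
      System.out.print ("Type something (\"quit\" terminates program): ");
      line = stdin.readLine ();
      /* copy the input into a newly allocated string */
      String buffer = new String (line.substring (0, line.length ()));
      System.out.println ("Received input:          " + buffer);
      System.out.println ("Converted to upper case: " + buffer.toUpperCase ());
    } while (!line.equals ("quit"));
  }
}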


My environment should be valid as well. LD_LIBRARY_PATH contains
the directories for 32-bit libraries first and then the directories
for 64-bit libraries. I have split the long lines of the PATH
variables so that they are easier to read.

linpc1 java 111 mpiexec java EnvironVarMain

Operating system: Linux    Processor architecture: x86_64

  CLASSPATH:
/usr/local/junit4.10:
/usr/local/junit4.10/junit-4.10.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
//usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
/usr/local/javacc-5.0/javacc.jar:
.:
/home/fd1026/Linux/x86_64/mpi_classfiles

  LD_LIBRARY_PATH:
/usr/lib:
...
/usr/lib64:
/usr/local/jdk1.7.0_07-64/jre/lib/amd64:
/usr/local/gcc-4.7.1/lib64:
/usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
/usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
/usr/local/lib64:
/usr/local/ssl/lib64:
/usr/lib64:
/usr/X11R6/lib64:
/usr/local/openmpi-1.9_64_cc/lib64:
/home/fd1026/Linux/x86_64/lib64
linpc1 java 112 
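
For completeness, "EnvironVarMain.java" is only a small helper that prints
the operating system, the processor architecture and the two path variables
(as mentioned above, I split the long path lines by hand; the real program
may format its output a little differently, so this is only a sketch):

public class EnvironVarMain
{
  public static void main (String args[])
  {
    /* sketch only -- reconstructed from the output above */
    System.out.println ();
    System.out.println ("Operating system: " + System.getProperty ("os.name")
      + "    Processor architecture: " + System.getProperty ("os.arch"));
    System.out.println ();
    System.out.println ("  CLASSPATH:");
    System.out.println (System.getenv ("CLASSPATH"));
    System.out.println ();
    System.out.println ("  LD_LIBRARY_PATH:");
    System.out.println (System.getenv ("LD_LIBRARY_PATH"));
  }
}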

Can I provide any other information to help solve this problem?


Kind regards

Siegmar


> On Dec 21, 2012, at 9:46 AM, Siegmar Gross 
> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> 
> > Hi
> > 
> >> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
> >> appears to be in the Java side of things. For whatever reason, your
> >> Java VM is refusing to allow a malloc to succeed. I suspect it has
> >> something to do with its setup, but I'm not enough of a Java person
> >> to point you to the problem.
> >> 
> >> Is it possible that the program was compiled against a different
> >> (perhaps incompatible) version of Java?
> > 
> > No, I don't think so. A small Java program without MPI methods works.
> > 
> > linpc1 bin 122 which mpicc
> > /usr/local/openmpi-1.9_64_cc/bin/mpicc
> > linpc1 bin 123 pwd
> > /usr/local/openmpi-1.9_64_cc/bin
> > linpc1 bin 124 grep jdk *
> > mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
> > linpc1 bin 125 which java
> > /usr/local/jdk1.7.0_07-64/bin/java
> > linpc1 bin 126 
> > 
> > 
> > linpc1 prog 110 javac MiniProgMain.java
> > linpc1 prog 111 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 112 mpiexec java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > linpc1 prog 113 mpiexec -np 2 java MiniProgMain
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > Message 0
> > Message 1
> > Message 2
> > Message 3
> > Message 4
> > 
> > 
> > A small program which allocates a buffer for a new string.
> > ...
> > stringBUFLEN = new String (string.substring (0, len));
> > ...
> > 
> > linpc1 prog 115 javac MemAllocMain.java 
> > linpc1 prog 116 java MemAllocMain
> > Type something ("quit" terminates program): ffghhfhh
> > Received input:          ffghhfhh
> > Converted to upper case: FFGHHFHH
> > Type something ("quit" terminates program): quit
> > Received input:          quit
> > Converted to upper case: QUIT
> > 
> > linpc1 prog 117 mpiexec java MemAllocMain
> > Type something ("quit" terminates program): fbhshnhjs
> > Received input:          fbhshnhjs
> > Converted to upper case: FBHSHNHJS
> > Type something ("quit" terminates program): quit
> > Received input:          quit
> > Converted to upper case: QUIT
> > linpc1 prog 118 
> > 
> > I'm not sure if this is of any help, but the problem starts with
> > MPI methods. The following program calls just the Init() and
> > Finalize() methods.
> > 
> > tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  mca_base_open failed
> >  --> Returned value -2 instead of OPAL_SUCCESS
> > ...
> > 
> > 
> > Hopefully somebody will have an idea what goes wrong on my Linux
> > system. Thank you very much for any help in advance.
> > 
> > Kind regards
> > 
> > Siegmar
> > 
> > 
> >> Just shooting in the dark here - I suspect you'll have to ask someone
> >> more knowledgeable on JVMs.
> >> 
> >> 
> >> On Dec 21, 2012, at 7:32 AM, Siegmar Gross 
> >> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >> 
> >>> Hi
> >>> 
> >>>> I can't speak to the other issues, but for these - it looks like
> >>>> something isn't right in the system. Could be an incompatibility
> >>>> with Suse 12.1.
> >>>> 
> >>>> What the errors are saying is that malloc is failing when used at
> >>>> a very early stage in starting the process. Can you run even a
> >>>> C-based MPI "hello" program?
> >>> 
> >>> Yes. I have implemented more or less the same program in C and Java.
> >>> 
> >>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
> >>> Process 0 of 2 running on linpc0
> >>> Process 1 of 2 running on linpc1
> >>> 
> >>> Now 1 slave tasks are sending greetings.
> >>> 
> >>> Greetings from task 1:
> >>> message type:        3
> >>> msg length:          132 characters
> >>> message:             
> >>>   hostname:          linpc1
> >>>   operating system:  Linux
> >>>   release:           3.1.10-1.16-desktop
> >>>   processor:         x86_64
> >>> 
> >>> 
> >>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java 
> >>> HelloMainWithBarrier
> >>> --------------------------------------------------------------------------
> >>> It looks like opal_init failed for some reason; your parallel process is
> >>> likely to abort.  There are many reasons that a parallel process can
> >>> fail during opal_init; some of which are due to configuration or
> >>> environment problems.  This failure appears to be an internal failure;
> >>> here's some additional information (which may only be relevant to an
> >>> Open MPI developer):
> >>> 
> >>> mca_base_open failed
> >>> --> Returned value -2 instead of OPAL_SUCCESS
> >>> ...
> >>> 
> >>> 
> >>> Thank you very much for any help in advance.
> >>> 
> >>> Kind regards
> >>> 
> >>> Siegmar
> >>> 
> >>> 
> >>> 
> >>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross 
> >>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
> >>>> 
> >>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
> >>>>> 
> >>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort.  There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems.  This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>> 
> >>>>> mca_base_open failed
> >>>>> --> Returned value -2 instead of OPAL_SUCCESS
> >>>>> ...
> >>>>> ompi_mpi_init: orte_init failed
> >>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
> >>>>> --------------------------------------------------------------------------
> >>>>> *** An error occurred in MPI_Init
> >>>>> *** on a NULL communicator
> >>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> >>>>> ***    and potentially your MPI job)
> >>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not
> >>>>> able to aggregate error messages, and not able to guarantee that all
> >>>>> other processes were killed!
> >>>>> -------------------------------------------------------
> >>>>> Primary job  terminated normally, but 1 process returned
> >>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
> >>>>> -------------------------------------------------------
> >>>>> --------------------------------------------------------------------------
> >>>>> mpiexec detected that one or more processes exited with non-zero
> >>>>> status, thus causing the job to be terminated. The first process to
> >>>>> do so was:
> >>>>> 
> >>>>> Process name: [[16706,1],1]
> >>>>> Exit code:    1
> >>>>> --------------------------------------------------------------------------
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> I use a valid environment on all machines. The problem occurs as well
> >>>>> when I compile and run the program directly on the Linux system.
> >>>>> 
> >>>>> linpc1 java 101 mpijavac BcastIntMain.java 
> >>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` BcastIntMain
> >>>>> --------------------------------------------------------------------------
> >>>>> It looks like opal_init failed for some reason; your parallel process is
> >>>>> likely to abort.  There are many reasons that a parallel process can
> >>>>> fail during opal_init; some of which are due to configuration or
> >>>>> environment problems.  This failure appears to be an internal failure;
> >>>>> here's some additional information (which may only be relevant to an
> >>>>> Open MPI developer):
> >>>>> 
> >>>>> mca_base_open failed
> >>>>> --> Returned value -2 instead of OPAL_SUCCESS
> >>>> 
> >>> 
> >> 
> >> 
> > <InitFinalizeMain.java>
> 
> 
