I can confirm that the first program fails (broadcasting a single int).

I'm trying to understand how the implementation works, but this may take some 
time (due to the holidays, etc.).
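
For reference, the failing case is a single-int broadcast, so the test
program looks roughly like the sketch below (method names follow the
mpiJava-style API that the trunk Java bindings are based on; the actual
source may differ):

import mpi.*;

public class BcastIntMain
{
  public static void main (String args[]) throws MPIException
  {
    int intValue[] = new int[1];        /* buffer for a single int value */

    MPI.Init (args);
    int myRank = MPI.COMM_WORLD.Rank ();
    if (myRank == 0)
      intValue[0] = 42;                 /* root fills the buffer         */
    /* broadcast one int from process 0 to all other processes */
    MPI.COMM_WORLD.Bcast (intValue, 0, 1, MPI.INT, 0);
    System.out.println ("Process " + myRank + ": " + intValue[0]);
    MPI.Finalize ();
  }
}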


On Dec 22, 2012, at 2:53 AM, Siegmar Gross wrote:

> Hi
> 
>> Interesting. My best guess is that the OMPI libraries aren't being
>> found, though I'm a little surprised because the error message
>> indicates an inability to malloc - but it's possible the message
>> isn't accurate.
>> 
>> One thing stands out - I see you compiled your program with "javac".
>> I suspect that is the source of the trouble - you really need to use
>> the Java wrapper compiler "mpijavac" so all the libs get absorbed
>> and/or linked correctly.
> 
> No, I only compiled the first two programs (which don't use any MPI
> methods) with javac. The MPI program "InitFinalizeMain.java" was
> compiled with "mpijavac" (I use a script file and GNUmakefile).
> 
> linpc1 java 102 make_classfiles
> ...
> =========== linpc1 ===========
> Warning: untrusted X11 forwarding setup failed: xauth key data not generated
> Warning: No xauth data; using fake authentication data for X11 forwarding.
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles InitFinalizeMain.java
> ...
> 
> 
> The other programs also work if I compile them with "mpijavac":
> 
> linpc1 prog 107 mpijavac MemAllocMain.java 
> linpc1 prog 108 mpiexec java -cp `pwd` MemAllocMain
> Type something ("quit" terminates program): dgdas
> Received input:          dgdas
> Converted to upper case: DGDAS
> Type something ("quit" terminates program): quit
> Received input:          quit
> Converted to upper case: QUIT
> linpc1 prog 109 
> 
> 
> My environment should be valid as well. LD_LIBRARY_PATH contains the
> directories for 32-bit libraries first, followed by the directories
> for 64-bit libraries. I have split the long lines of the PATH
> variables so that they are easier to read.
> 
> linpc1 java 111 mpiexec java EnvironVarMain
> 
> Operating system: Linux    Processor architecture: x86_64
> 
>  CLASSPATH:
> /usr/local/junit4.10:
> /usr/local/junit4.10/junit-4.10.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dcore.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/j3dutils.jar:
> //usr/local/jdk1.7.0_07-64/j3d/lib/ext/vecmath.jar:
> /usr/local/javacc-5.0/javacc.jar:
> .:
> /home/fd1026/Linux/x86_64/mpi_classfiles
> 
>  LD_LIBRARY_PATH:
> /usr/lib:
> ...
> /usr/lib64:
> /usr/local/jdk1.7.0_07-64/jre/lib/amd64:
> /usr/local/gcc-4.7.1/lib64:
> /usr/local/gcc-4.7.1/libexec/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/gcc-4.7.1/lib/gcc/x86_64-unknown-linux-gnu/4.7.1:
> /usr/local/lib64:
> /usr/local/ssl/lib64:
> /usr/lib64:
> /usr/X11R6/lib64:
> /usr/local/openmpi-1.9_64_cc/lib64:
> /home/fd1026/Linux/x86_64/lib64
> linpc1 java 112 
> 
> Can I provide any other information to help solve this problem?
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
>> On Dec 21, 2012, at 9:46 AM, Siegmar Gross 
>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>> 
>>> Hi
>>> 
>>>> Hmmm...weird. Well, it looks like OMPI itself is okay, so the issue
>>>> appears to be in the Java side of things. For whatever reason, your
>>>> Java VM is refusing to allow a malloc to succeed. I suspect it has
>>>> something to do with its setup, but I'm not enough of a Java person
>>>> to point you to the problem.
>>>> 
>>>> Is it possible that the program was compiled against a different
>>>> (perhaps incompatible) version of Java?
>>> 
>>> No, I don't think so. A small Java program without MPI methods works.
>>> 
>>> linpc1 bin 122 which mpicc
>>> /usr/local/openmpi-1.9_64_cc/bin/mpicc
>>> linpc1 bin 123 pwd
>>> /usr/local/openmpi-1.9_64_cc/bin
>>> linpc1 bin 124 grep jdk *
>>> mpijavac:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> mpijavac.pl:my $my_compiler = "/usr/local/jdk1.7.0_07-64/bin/javac";
>>> linpc1 bin 125 which java
>>> /usr/local/jdk1.7.0_07-64/bin/java
>>> linpc1 bin 126 
>>> 
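>>> MiniProgMain just prints a few messages in a loop, roughly like this:
>>> 
>>> public class MiniProgMain
>>> {
>>>   public static void main (String args[])
>>>   {
>>>     for (int i = 0; i < 5; ++i)
>>>     {
>>>       System.out.println ("Message " + i);
>>>     }
>>>   }
>>> }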
>>> 
>>> linpc1 prog 110 javac MiniProgMain.java
>>> linpc1 prog 111 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 112 mpiexec java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> linpc1 prog 113 mpiexec -np 2 java MiniProgMain
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> Message 0
>>> Message 1
>>> Message 2
>>> Message 3
>>> Message 4
>>> 
>>> 
>>> A small program that allocates a buffer for a new string:
>>> ...
>>> stringBUFLEN = new String (string.substring (0, len));
>>> ...
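>>> 
>>> Roughly, it reads a line, copies it into a newly allocated String, and
>>> echoes it back in upper case until "quit" is typed; a sketch with
>>> illustrative names:
>>> 
>>> import java.io.*;
>>> 
>>> public class MemAllocMain
>>> {
>>>   public static void main (String args[]) throws IOException
>>>   {
>>>     BufferedReader stdin =
>>>       new BufferedReader (new InputStreamReader (System.in));
>>>     String line;
>>>     do
>>>     {
>>>       System.out.print ("Type something (\"quit\" terminates program): ");
>>>       line = stdin.readLine ();
>>>       if (line == null)
>>>         break;                              /* end of input          */
>>>       String copy = new String (line);      /* allocate a new buffer */
>>>       System.out.println ("Received input:          " + copy);
>>>       System.out.println ("Converted to upper case: " + copy.toUpperCase ());
>>>     } while (!line.equals ("quit"));
>>>   }
>>> }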
>>> 
>>> linpc1 prog 115 javac MemAllocMain.java 
>>> linpc1 prog 116 java MemAllocMain
>>> Type something ("quit" terminates program): ffghhfhh
>>> Received input:          ffghhfhh
>>> Converted to upper case: FFGHHFHH
>>> Type something ("quit" terminates program): quit
>>> Received input:          quit
>>> Converted to upper case: QUIT
>>> 
>>> linpc1 prog 117 mpiexec java MemAllocMain
>>> Type something ("quit" terminates program): fbhshnhjs
>>> Received input:          fbhshnhjs
>>> Converted to upper case: FBHSHNHJS
>>> Type something ("quit" terminates program): quit
>>> Received input:          quit
>>> Converted to upper case: QUIT
>>> linpc1 prog 118 
>>> 
>>> I'm not sure if this is of any help, but the problem only appears
>>> once MPI methods are involved. The following program calls just the
>>> Init() and Finalize() methods.
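>>> It is essentially nothing more than this (the exact source is in the
>>> attached InitFinalizeMain.java):
>>> 
>>> import mpi.*;
>>> 
>>> public class InitFinalizeMain
>>> {
>>>   public static void main (String args[]) throws MPIException
>>>   {
>>>     MPI.Init (args);     /* no communication, just start up ...     */
>>>     MPI.Finalize ();     /* ... and shut down the MPI library       */
>>>   }
>>> }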
>>> 
>>> tyr java 203 mpiexec -host linpc1 java InitFinalizeMain
>>> --------------------------------------------------------------------------
>>> It looks like opal_init failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during opal_init; some of which are due to configuration or
>>> environment problems.  This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>> 
>>> mca_base_open failed
>>> --> Returned value -2 instead of OPAL_SUCCESS
>>> ...
>>> 
>>> 
>>> Hopefully somebody will have an idea of what is going wrong on my
>>> Linux system. Thank you very much for any help in advance.
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> 
>>>> Just shooting in the dark here - I suspect you'll have to ask someone
>>>> more knowledgeable on JVMs.
>>>> 
>>>> 
>>>> On Dec 21, 2012, at 7:32 AM, Siegmar Gross 
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>> Hi
>>>>> 
>>>>>> I can't speak to the other issues, but for these - it looks like
>>>>>> something isn't right in the system. Could be an incompatibility
>>>>>> with Suse 12.1.
>>>>>> 
>>>>>> What the errors are saying is that malloc is failing when used at
>>>>>> a very early stage in starting the process. Can you run even a
>>>>>> C-based MPI "hello" program?
>>>>> 
>>>>> Yes. I have implemented more or less the same program in C and Java.
>>>>> 
>>>>> tyr hello_1 131 mpiexec -np 2 -host linpc0,linpc1 hello_1_mpi
>>>>> Process 0 of 2 running on linpc0
>>>>> Process 1 of 2 running on linpc1
>>>>> 
>>>>> Now 1 slave tasks are sending greetings.
>>>>> 
>>>>> Greetings from task 1:
>>>>> message type:        3
>>>>> msg length:          132 characters
>>>>> message:             
>>>>>  hostname:          linpc1
>>>>>  operating system:  Linux
>>>>>  release:           3.1.10-1.16-desktop
>>>>>  processor:         x86_64
>>>>> 
>>>>> 
>>>>> tyr hello_1 132 mpiexec -np 2 -host linpc0,linpc1 java 
>>>>> HelloMainWithBarrier
>>>>> --------------------------------------------------------------------------
>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during opal_init; some of which are due to configuration or
>>>>> environment problems.  This failure appears to be an internal failure;
>>>>> here's some additional information (which may only be relevant to an
>>>>> Open MPI developer):
>>>>> 
>>>>> mca_base_open failed
>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>> ...
>>>>> 
>>>>> 
>>>>> Thank you very much for any help in advance.
>>>>> 
>>>>> Kind regards
>>>>> 
>>>>> Siegmar
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Dec 21, 2012, at 1:41 AM, Siegmar Gross 
>>>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>>>> 
>>>>>>> The program breaks if I use two Linux.x86_64 machines (Open Suse 12.1).
>>>>>>> 
>>>>>>> linpc1 etc 101 mpiexec -np 2 -host linpc0,linpc1 java BcastIntArrayMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems.  This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> 
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>>>> ...
>>>>>>> ompi_mpi_init: orte_init failed
>>>>>>> --> Returned "Out of resource" (-2) instead of "Success" (0)
>>>>>>> --------------------------------------------------------------------------
>>>>>>> *** An error occurred in MPI_Init
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> ***    and potentially your MPI job)
>>>>>>> [(null):10586] Local abort before MPI_INIT completed successfully; not
>>>>>>> able to aggregate error messages, and not able to guarantee that all
>>>>>>> other processes were killed!
>>>>>>> -------------------------------------------------------
>>>>>>> Primary job  terminated normally, but 1 process returned
>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>> -------------------------------------------------------
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpiexec detected that one or more processes exited with non-zero
>>>>>>> status, thus causing the job to be terminated. The first process to
>>>>>>> do so was:
>>>>>>> 
>>>>>>> Process name: [[16706,1],1]
>>>>>>> Exit code:    1
>>>>>>> --------------------------------------------------------------------------
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I use a valid environment on all machines. The problem also occurs
>>>>>>> when I compile and run the program directly on the Linux system.
>>>>>>> 
>>>>>>> linpc1 java 101 mpijavac BcastIntMain.java 
>>>>>>> linpc1 java 102 mpiexec -np 2 -host linpc0,linpc1 java -cp `pwd` 
>>>>>>> BcastIntMain
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like opal_init failed for some reason; your parallel process is
>>>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>>>> fail during opal_init; some of which are due to configuration or
>>>>>>> environment problems.  This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>> 
>>>>>>> mca_base_open failed
>>>>>>> --> Returned value -2 instead of OPAL_SUCCESS
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> <InitFinalizeMain.java>
>> 
>> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

