[OMPI users] segfault on java binding from MPI.init()

2015-08-03 Thread Nate Chambers
We've been struggling with this error for a while, so hoping someone more
knowledgeable can help!

Our Java MPI code exits with a segfault during its normal operation, *but
the segfault occurs before our code ever uses MPI functionality like
sending/receiving.* We've removed all message calls and any use of
MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args)
in our code, and does not if we comment that line out. Further vexing us,
the crash doesn't happen at the point of the MPI.init call, but later on in
the program. I don't have an easy-to-run example here because our non-MPI
code is so large and complicated. We have run simpler test programs with
MPI and the segfault does not occur.
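To give a sense of the structure, here is an illustrative sketch only (a
stand-in, not our actual class; MPI.Init/MPI.Finalize are the bindings'
capitalized spellings of the calls discussed here):

    import mpi.*;

    // Illustrative sketch only -- stand-in for our real FeaturizeDay class.
    // MPI is initialized up front and finalized at the end; nothing else in
    // the program touches MPI (no sends/receives, no MPI.COMM_WORLD).
    public class FeaturizeDaySketch {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);     // segfault appears only when this call is present

            // ... a large amount of ordinary, non-MPI Java code runs here
            // (reading data files, feature extraction, etc.); the crash fires
            // somewhere in this section, well after Init has returned ...

            MPI.Finalize();
        }
    }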

We have isolated the line where the segfault occurs. However, if we comment
that line out, the program runs longer but then segfaults at a seemingly
random (though reproducible) point later in the code. Does anyone have tips
on how to debug this? We have tried several flags with mpirun, but found no
good clues.

We have also tried several Open MPI versions, including stable 1.8.7 and the
most recent 1.8.8rc1.


ATTACHED
- config.log from installation
- output from `ompi_info -all`


OUTPUT FROM RUNNING

> mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
...
some normal output from our code
...
--
mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on
signal 11 (Segmentation fault).
--


config.log.bz2
Description: BZip2 compressed data


ompi_info.txt.bz2
Description: BZip2 compressed data


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-03 Thread Nate Chambers
Gilles,
Yes, I saw that GitHub issue, but wasn't certain it was the same problem.
It's very possible that it is. Oddly enough, that GitHub code doesn't crash
for us.

Adding a sleep call doesn't help. It's actually now crashing on the
MPI.init(args) call itself, and the JVM is reporting the error. Earlier it
would get past this point. I'm not certain why this has changed all of a
sudden; we did change a bit in our unrelated Java code...

Below is the output. It does match more closely to that previous report.


Nate

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x2b00ad2807cf, pid=28537, tid=47281916847872
#
# JRE version: 7.0_21-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x57c7cf]  jni_GetStringUTFChars+0x9f
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /gpfs/home/nchamber/hs_err_pid28537.log
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x2b198c15b7cf, pid=28538, tid=47388736182016
#
# JRE version: 7.0_21-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x57c7cf]  jni_GetStringUTFChars+0x9f
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /gpfs/home/nchamber/hs_err_pid28538.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
--
mpirun noticed that process rank 0 with PID 28537 on node r3n70 exited on
signal 6 (Aborted).
--


On Mon, Aug 3, 2015 at 2:47 PM, Gilles Gouaillardet 
wrote:

> Nate,
>
> a similar issue has already been reported at
> https://github.com/open-mpi/ompi/issues/369, but we have
> not yet been able to figure out what is going wrong.
>
> right after MPI_Init(), can you add
> Thread.sleep(5000);
> and see if it helps ?
>
> Cheers,
>
> Gilles
>
>
> On 8/4/2015 8:36 AM, Nate Chambers wrote:
>
> We've been struggling with this error for a while, so hoping someone more
> knowledgeable can help!
>
> Our java MPI code exits with a segfault during its normal operation, *but
> the segfault occurs before our code ever uses MPI functionality like
> sending/receiving. *We've removed all message calls and any use of
> MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args)
> in our code, and does not if we comment that line out. Further vexing us,
> the crash doesn't happen at the point of the MPI.init call, but later on in
> the program. I don't have an easy-to-run example here because our non-MPI
> code is so large and complicated. We have run simpler test programs with
> MPI and the segfault does not occur.
>
> We have isolated the line where the segfault occurs. However, if we
> comment that out, the program will run longer, but then randomly (but
> deterministically) segfault later on in the code. Does anyone have tips on
> how to debug this? We have tried several flags with mpirun, but no good
> clues.
>
> We have also tried several MPI versions, including stable 1.8.7 and the
> most recent 1.8.8rc1
>
>
> ATTACHED
> - config.log from installation
> - output from `ompi_info -all`
>
>
> OUTPUT FROM RUNNING
>
> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
> ...
> some normal output from our code
> ...
> --
> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on
> signal 11 (Segmentation fault).
> --
>
>
>
>
>
>


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-04 Thread Nate Chambers
Sure, I re-ran configure with CC=gcc and then make install. I think that's
the proper way to do it. Attached is my config.log. The behavior when
running our code appears to be the same: the output is the same error I
pasted in my email above, and it occurs when calling MPI.init().

I'm not great at debugging this sort of stuff, but happy to try things out
if you need me to.

Nate


On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
wrote:

> Hello Nate,
>
> As a first step to addressing this, could you please try using gcc rather
> than the Intel compilers to build Open MPI?
>
> We've been doing a lot of work recently on the java bindings, etc. but
> have never tried using any compilers other
> than gcc when working with the java bindings.
>
> Thanks,
>
> Howard
>
>
> 2015-08-03 17:36 GMT-06:00 Nate Chambers :
>
>> We've been struggling with this error for a while, so hoping someone more
>> knowledgeable can help!
>>
>> Our java MPI code exits with a segfault during its normal operation, *but
>> the segfault occurs before our code ever uses MPI functionality like
>> sending/receiving. *We've removed all message calls and any use of
>> MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args)
>> in our code, and does not if we comment that line out. Further vexing us,
>> the crash doesn't happen at the point of the MPI.init call, but later on in
>> the program. I don't have an easy-to-run example here because our non-MPI
>> code is so large and complicated. We have run simpler test programs with
>> MPI and the segfault does not occur.
>>
>> We have isolated the line where the segfault occurs. However, if we
>> comment that out, the program will run longer, but then randomly (but
>> deterministically) segfault later on in the code. Does anyone have tips on
>> how to debug this? We have tried several flags with mpirun, but no good
>> clues.
>>
>> We have also tried several MPI versions, including stable 1.8.7 and the
>> most recent 1.8.8rc1
>>
>>
>> ATTACHED
>> - config.log from installation
>> - output from `ompi_info -all`
>>
>>
>> OUTPUT FROM RUNNING
>>
>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>> ...
>> some normal output from our code
>> ...
>> --
>> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on
>> signal 11 (Segmentation fault).
>> --
>>
>>
>>
>>
>


config.log.bz2
Description: BZip2 compressed data


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-04 Thread Nate Chambers
Sanity checks pass. Both Hello.java and Ring.java compile and run correctly,
with the expected output.
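For reference, the check amounted to roughly the following, run from the
examples/ directory of the Open MPI source tree (exact paths depend on the
install prefix; shown from memory):

    mpijavac Hello.java
    mpijavac Ring.java
    mpirun -np 2 java Hello
    mpirun -np 2 java Ring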

Does MPI.init(args) expect anything from those command-line args?


Nate


On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard 
wrote:

> Hello Nate,
>
> As a sanity check of your installation, could you try to compile the
> examples/*.java codes using the mpijavac you've installed and see that
> those run correctly?
> I'd be just interested in the Hello.java and Ring.java?
>
> Howard
>
>
>
>
>
>
>
> 2015-08-04 14:34 GMT-06:00 Nate Chambers :
>
>> Sure, I reran the configure with CC=gcc and then make install. I think
>> that's the proper way to do it. Attached is my config log. The behavior
>> when running our code appears to be the same. The output is the same error
>> I pasted in my email above. It occurs when calling MPI.init().
>>
>> I'm not great at debugging this sort of stuff, but happy to try things
>> out if you need me to.
>>
>> Nate
>>
>>
>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
>> wrote:
>>
>>> Hello Nate,
>>>
>>> As a first step to addressing this, could you please try using gcc
>>> rather than the Intel compilers to build Open MPI?
>>>
>>> We've been doing a lot of work recently on the java bindings, etc. but
>>> have never tried using any compilers other
>>> than gcc when working with the java bindings.
>>>
>>> Thanks,
>>>
>>> Howard
>>>
>>>
>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers :
>>>
>>>> We've been struggling with this error for a while, so hoping someone
>>>> more knowledgeable can help!
>>>>
>>>> Our java MPI code exits with a segfault during its normal operation, *but
>>>> the segfault occurs before our code ever uses MPI functionality like
>>>> sending/receiving. *We've removed all message calls and any use of
>>>> MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args)
>>>> in our code, and does not if we comment that line out. Further vexing us,
>>>> the crash doesn't happen at the point of the MPI.init call, but later on in
>>>> the program. I don't have an easy-to-run example here because our non-MPI
>>>> code is so large and complicated. We have run simpler test programs with
>>>> MPI and the segfault does not occur.
>>>>
>>>> We have isolated the line where the segfault occurs. However, if we
>>>> comment that out, the program will run longer, but then randomly (but
>>>> deterministically) segfault later on in the code. Does anyone have tips on
>>>> how to debug this? We have tried several flags with mpirun, but no good
>>>> clues.
>>>>
>>>> We have also tried several MPI versions, including stable 1.8.7 and the
>>>> most recent 1.8.8rc1
>>>>
>>>>
>>>> ATTACHED
>>>> - config.log from installation
>>>> - output from `ompi_info -all`
>>>>
>>>>
>>>> OUTPUT FROM RUNNING
>>>>
>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>> ...
>>>> some normal output from our code
>>>> ...
>>>>
>>>> --
>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited
>>>> on signal 11 (Segmentation fault).
>>>>
>>>> --
>>>>
>>>>
>>>>
>>>>
>


Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-05 Thread Nate Chambers
Howard,

Thanks for looking at all this. Adding System.gc() did not cause it to
segfault. The segfault still comes much later in the processing.
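To be concrete, the placement we tried looks like the small sketch below
(illustrative only, not our real class; MPI.Init is the bindings'
capitalized spelling of the init call):

    import mpi.*;

    public class InitThenGc {   // hypothetical name, just to show the placement
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            System.gc();        // forcing a collection right after init: no segfault
            // ... rest of the program; the crash still comes much later ...
            MPI.Finalize();
        }
    }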

I was able to reduce my code to a single test file without other
dependencies. It is attached. The code simply opens a text file and reads
its lines one by one. Once finished, it closes and reopens the same file and
reads the lines again. On my system, it does this about 4 times before the
segfault fires. Obviously this code makes no sense on its own, but it's
based on our actual code, which reads millions of lines of data and does
various processing on it.
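In outline, the attached test does roughly the following (a paraphrase from
memory, not the exact attached file; the loop count and file handling are
simplified):

    import java.io.*;
    import mpi.*;

    // Rough paraphrase of the attached MPITestBroke, simplified.
    // It just re-reads the same text file over and over; on our system the
    // JVM segfaults around the 4th or 5th pass, but only when MPI.Init was called.
    public class MPITestBroke {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);                              // no other MPI calls anywhere
            File dir = new File(args[0]);                // e.g. "tweets/"
            File input = new File(dir, dir.list()[0]);   // the single text file inside

            for (int pass = 0; pass < 10; pass++) {
                BufferedReader reader = new BufferedReader(new FileReader(input));
                long lines = 0;
                while (reader.readLine() != null)
                    lines++;                             // read every line, do nothing with it
                reader.close();
                System.out.println("pass " + pass + ": " + lines + " lines");
            }

            MPI.Finalize();
        }
    }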

Attached is a tweets.tgz file that you can uncompress to have an input
directory. The text file is just the same line over and over again. Run it
as:

*java MPITestBroke tweets/*


Nate





On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard 
wrote:

> Hi Nate,
>
> Sorry for the delay in getting back.  Thanks for the sanity check.  You
> may have a point about the args string to MPI.init -
> there's nothing the Open MPI is needing from this but that is a difference
> with your use case - your app has an argument.
>
> Would you mind adding a
>
> System.gc()
>
> call immediately after MPI.init call and see if the gc blows up with a
> segfault?
>
> Also, may be interesting to add the -verbose:jni to your command line.
>
> We'll do some experiments here with the init string arg.
>
> Is your app open source where we could download it and try to reproduce
> the problem locally?
>
> thanks,
>
> Howard
>
>
> 2015-08-04 18:52 GMT-06:00 Nate Chambers :
>
>> Sanity checks pass. Both Hello and Ring.java run correctly with the
>> expected program's output.
>>
>> Does MPI.init(args) expect anything from those command-line args?
>>
>>
>> Nate
>>
>>
>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard 
>> wrote:
>>
>>> Hello Nate,
>>>
>>> As a sanity check of your installation, could you try to compile the
>>> examples/*.java codes using the mpijavac you've installed and see that
>>> those run correctly?
>>> I'd be just interested in the Hello.java and Ring.java?
>>>
>>> Howard
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers :
>>>
>>>> Sure, I reran the configure with CC=gcc and then make install. I think
>>>> that's the proper way to do it. Attached is my config log. The behavior
>>>> when running our code appears to be the same. The output is the same error
>>>> I pasted in my email above. It occurs when calling MPI.init().
>>>>
>>>> I'm not great at debugging this sort of stuff, but happy to try things
>>>> out if you need me to.
>>>>
>>>> Nate
>>>>
>>>>
>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard 
>>>> wrote:
>>>>
>>>>> Hello Nate,
>>>>>
>>>>> As a first step to addressing this, could you please try using gcc
>>>>> rather than the Intel compilers to build Open MPI?
>>>>>
>>>>> We've been doing a lot of work recently on the java bindings, etc. but
>>>>> have never tried using any compilers other
>>>>> than gcc when working with the java bindings.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Howard
>>>>>
>>>>>
>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers :
>>>>>
>>>>>> We've been struggling with this error for a while, so hoping someone
>>>>>> more knowledgeable can help!
>>>>>>
>>>>>> Our java MPI code exits with a segfault during its normal operation, *but
>>>>>> the segfault occurs before our code ever uses MPI functionality like
>>>>>> sending/receiving. *We've removed all message calls and any use of
>>>>>> MPI.COMM_WORLD from the code. The segfault occurs if we call 
>>>>>> MPI.init(args)
>>>>>> in our code, and does not if we comment that line out. Further vexing us,
>>>>>> the crash doesn't happen at the point of the MPI.init call, but later on 
>>>>>> in
>>>>>> the program. I don't have an easy-to-run example here because our non-MPI
>>>>>> code is so large and complicated. We have r

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-06 Thread Nate Chambers
Howard,

I tried the nightly build openmpi-dev-2223-g731cfe3 and it still segfaults
as before. I must admit I am new to MPI, so is it possible I'm just
configuring or running incorrectly? Let me list my steps for you, and maybe
something will jump out? Also attached is my config.log.


CONFIGURE
./configure --prefix= --enable-mpi-java CC=gcc

MAKE
make all install

RUN
/mpirun -np 1 java MPITestBroke twitter/


DEFAULT JAVA AND GCC

$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)





On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard 
wrote:

> HI Nate,
>
> We're trying this out on a mac running mavericks and a cray xc system.
> the mac has java 8
> while the cray xc has java 7.
>
> We could not get the code to run just using the java launch command,
> although we noticed if you add
>
> catch(NoClassDefFoundError e) {
>
>   System.out.println("Not using MPI its out to lunch for now");
>
> }
>
> as one of the catches after the try for firing up MPI, you can get further.
>
> Instead we tried on the two systems using
>
> mpirun -np 1 java MPITestBroke tweets repeat.txt
>
> and, you guessed it, we can't reproduce the error, at least using master.
>
> Would you mind trying to get a copy of nightly master build off of
>
> http://www.open-mpi.org/nightly/master/
>
> and install that version and give it a try.
>
> If that works, then I'd suggest using master (or v2.0) for now.
>
> Howard
>
>
>
>
> 2015-08-05 14:41 GMT-06:00 Nate Chambers :
>
>> Howard,
>>
>> Thanks for looking at all this. Adding System.gc() did not cause it to
>> segfault. The segfault still comes much later in the processing.
>>
>> I was able to reduce my code to a single test file without other
>> dependencies. It is attached. This code simply opens a text file and reads
>> its lines, one by one. Once finished, it closes and opens the same file and
>> reads the lines again. On my system, it does this about 4 times until the
>> segfault fires. Obviously this code makes no sense, but it's based on our
>> actual code that reads millions of lines of data and does various
>> processing to it.
>>
>> Attached is a tweets.tgz file that you can uncompress to have an input
>> directory. The text file is just the same line over and over again. Run it
>> as:
>>
>> *java MPITestBroke tweets/*
>>
>>
>> Nate
>>
>>
>>
>>
>>
>> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard 
>> wrote:
>>
>>> Hi Nate,
>>>
>>> Sorry for the delay in getting back.  Thanks for the sanity check.  You
>>> may have a point about the args string to MPI.init -
>>> there's nothing the Open MPI is needing from this but that is a
>>> difference with your use case - your app has an argument.
>>>
>>> Would you mind adding a
>>>
>>> System.gc()
>>>
>>> call immediately after MPI.init call and see if the gc blows up with a
>>> segfault?
>>>
>>> Also, may be interesting to add the -verbose:jni to your command line.
>>>
>>> We'll do some experiments here with the init string arg.
>>>
>>> Is your app open source where we could download it and try to reproduce
>>> the problem locally?
>>>
>>> thanks,
>>>
>>> Howard
>>>
>>>
>>> 2015-08-04 18:52 GMT-06:00 Nate Chambers :
>>>
>>>> Sanity checks pass. Both Hello and Ring.java run correctly with the
>>>> expected program's output.
>>>>
>>>> Does MPI.init(args) expect anything from those command-line args?
>>>>
>>>>
>>>> Nate
>>>>

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-12 Thread Nate Chambers
*I appreciate you trying to help! I put the Java and its compiled .class
file on Dropbox. The directory contains the .java and .class files, as well
as a data/ directory:*

http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0

*You can run it with and without MPI:*

>  java MPITestBroke data/
>  mpirun -np 1 java MPITestBroke data/

*Attached is a text file of what I see when I run it with mpirun and your
debug flag. Lots of debug lines.*


Nate





On Wed, Aug 12, 2015 at 11:09 AM, Howard Pritchard 
wrote:

> Hi Nate,
>
> Sorry for the delay in getting back to you.
>
> We're somewhat stuck on how to help you, but here are two suggestions.
>
> Could you add the following to your launch command line
>
> --mca odls_base_verbose 100
>
> so we can see exactly what arguments are being fed to java when launching
> your app.
>
> Also, if you could put your MPITestBroke.class file somewhere (like google
> drive)
> where we could get it and try to run locally or at NERSC, that might help
> us
> narrow down the problem.Better yet, if you have the class or jar file
> for
> the entire app plus some data sets, we could try that out as well.
>
> All the config outputs, etc. you've sent so far indicate a correct
> installation
> of open mpi.
>
> Howard
>
>
> On Aug 6, 2015 1:54 PM, "Nate Chambers"  wrote:
>
>> Howard,
>>
>> I tried the nightly build openmpi-dev-2223-g731cfe3 and it still
>> segfaults as before. I must admit I am new to MPI, so is it possible I'm
>> just configuring or running incorrectly? Let me list my steps for you, and
>> maybe something will jump out? Also attached is my config.log.
>>
>>
>> CONFIGURE
>> ./configure --prefix= --enable-mpi-java CC=gcc
>>
>> MAKE
>> make all install
>>
>> RUN
>> /mpirun -np 1 java MPITestBroke twitter/
>>
>>
>> DEFAULT JAVA AND GCC
>>
>> $ java -version
>> java version "1.7.0_21"
>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>
>> $ gcc --v
>> Using built-in specs.
>> Target: x86_64-redhat-linux
>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>> --infodir=/usr/share/info --with-bugurl=
>> http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared
>> --enable-threads=posix --enable-checking=release --with-system-zlib
>> --enable-__cxa_atexit --disable-libunwind-exceptions
>> --enable-gnu-unique-object
>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
>> --enable-java-awt=gtk --disable-dssi
>> --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
>> --enable-libgcj-multifile --enable-java-maintainer-mode
>> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
>> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
>> --build=x86_64-redhat-linux
>> Thread model: posix
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
>>
>>
>>
>>
>>
>> On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard 
>> wrote:
>>
>>> HI Nate,
>>>
>>> We're trying this out on a mac running mavericks and a cray xc system.
>>> the mac has java 8
>>> while the cray xc has java 7.
>>>
>>> We could not get the code to run just using the java launch command,
>>> although we noticed if you add
>>>
>>> catch(NoClassDefFoundError e) {
>>>
>>>   System.out.println("Not using MPI its out to lunch for now");
>>>
>>> }
>>>
>>> as one of the catches after the try for firing up MPI, you can get
>>> further.
>>>
>>> Instead we tried on the two systems using
>>>
>>> mpirun -np 1 java MPITestBroke tweets repeat.txt
>>>
>>> and, you guessed it, we can't reproduce the error, at least using master.
>>>
>>> Would you mind trying to get a copy of nightly master build off of
>>>
>>> http://www.open-mpi.org/nightly/master/
>>>
>>> and install that version and give it a try.
>>>
>>> If that works, then I'd suggest using master (or v2.0) for now.
>>>
>>> Howard
>>>
>>>
>>>
>>>
>>> 2015-08-05 14:41 GMT-06:00 Nate Chambers :
>>>
>>>> Howard,
>>>>
>>>> Thanks for looking at all this. Adding System.gc() did not cause it to
>>>> segfault. The segfault still comes much later in the processing.

Re: [OMPI users] segfault on java binding from MPI.init()

2015-08-24 Thread Nate Chambers
All,

It works!! Gilles with the fix!

I ran it with his suggested flag:
mpirun --mca mtl ^psm -np 1 java MPITestBroke data/

The test code now runs without the segfault occurring around the 5th loop.
It will be a while before I can put this back into our bigger code that
first caused our segfault, but for now this is looking very promising. I
will keep you posted. Thanks again.
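One follow-up on our end: rather than remembering the flag on every command
line, my understanding is the same exclusion can be set once in a per-user
MCA parameter file or an environment variable (a sketch, assuming the
default locations Open MPI reads; please correct me if that's wrong):

    # in $HOME/.openmpi/mca-params.conf
    mtl = ^psm

    # or equivalently, in the shell environment
    export OMPI_MCA_mtl=^psm

after which a plain "mpirun -np 1 java MPITestBroke data/" should behave
like the command above.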


Nate



On Sat, Aug 15, 2015 at 6:30 PM, Howard Pritchard 
wrote:

> Gilles,
>
> On hopper there aren't any psm libraries - it's an infiniband/infinipath
> free system -
> at least on the compute nodes.
>
> For my own work, I never use things like the platform files, I just do
> ./configure --prefix=blahblah --enable-mpi-java (and whatever else I want
> to test this time)
>
> Thanks for the ideas though,
>
> Howard
>
>
> 2015-08-14 19:20 GMT-06:00 Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com>:
>
>> Howard,
>>
>> I have no infinipath hardware, but the infinipath libraries are installed.
>> I tried to run with --mca mtl_psm_priority 0 instead of --mca mtl ^psm
>> but that did not work.
>> without psm mtl, I was unable to reproduce the persistent communication
>> issue,
>> so I concluded there was only one issue here.
>>
>> do you configure with --disable-dlopen on hopper ?
>> I wonder whether --mca mtl ^psm is effective if dlopen is disabled
>>
>> Cheers,
>>
>> Gilles
>>
>> On Saturday, August 15, 2015, Howard Pritchard 
>> wrote:
>>
>>> Hi Jeff,
>>>
>>> I don't know why Gilles keeps picking on the persistent request problem
>>> and mixing
>>> it up with this user bug.  I do think for this user the psm probably is
>>> the problem.
>>>
>>>
>>> They don't have anything to do with each other.
>>>
>>> I can reproduce the persistent request problem on hopper consistently.
>>> As I said
>>> on the telecon last week it has something to do with memory corruption
>>> with the
>>> receive buffer that is associated with the persistent request.
>>>
>>> Howard
>>>
>>>
>>> 2015-08-14 11:21 GMT-06:00 Jeff Squyres (jsquyres) :
>>>
>>>> Hmm.  Oscar's not around to ask any more, but I'd be greatly surprised
>>>> if he had InfiniPath on his systems where he ran into this segv issue...?
>>>>
>>>>
>>>> > On Aug 14, 2015, at 1:08 PM, Howard Pritchard 
>>>> wrote:
>>>> >
>>>> > Hi Gilles,
>>>> >
>>>> > Good catch!  Nate we hadn't been testing on a infinipath system.
>>>> >
>>>> > Howard
>>>> >
>>>> >
>>>> > 2015-08-14 0:20 GMT-06:00 Gilles Gouaillardet :
>>>> > Nate,
>>>> >
>>>> > I could get rid of the problem by not using the psm mtl.
>>>> > The infinipath library (used by the psm mtl) sets some signal
>>>> > handlers that conflict with the JVM; that can be seen by running
>>>> > mpirun -np 1 java -Xcheck:jni MPITestBroke data/
>>>> >
>>>> > so instead of running
>>>> > mpirun -np 1 java MPITestBroke data/
>>>> > please run
>>>> > mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
>>>> >
>>>> > that solved the issue for me
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Gilles
>>>> >
>>>> > On 8/13/2015 9:19 AM, Nate Chambers wrote:
>>>> >> I appreciate you trying to help! I put the Java and its compiled
>>>> .class file on Dropbox. The directory contains the .java and .class files,
>>>> as well as a data/ directory:
>>>> >>
>>>> >>
>>>> http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
>>>> >>
>>>> >> You can run it with and without MPI:
>>>> >>
>>>> >> >  java MPITestBroke data/
>>>> >> >  mpirun -np 1 java MPITestBroke data/
>>>> >>
>>>> >> Attached is a text file of what I see when I run it with mpirun and
>>>> your debug flag. Lots of debug lines.
>>>> >>
>>>> >>
>>>> >> Nate
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Wed, Aug 12, 2015 a