Gilles,
Yes, I saw that GitHub thread, but I wasn't certain it was the same issue.
It's very possible that it is. Oddly enough, the code from that GitHub issue
doesn't crash for us.

Adding a sleep call doesn't help. It's actually now crashing in the
MPI.init(args) call itself, and the JVM is reporting the error. Earlier it
would get past this point, and I'm not certain why that has changed all of a
sudden. We did change a bit of our unrelated Java code...
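
For reference, here is a minimal sketch of the kind of scaffold we're testing
(the class name is just a placeholder; it assumes the Open MPI Java bindings
in package mpi, and the sleep follows your earlier suggestion):

    import mpi.MPI;
    import mpi.MPIException;

    public class InitTest {
        public static void main(String[] args) throws MPIException, InterruptedException {
            // Nothing MPI-related runs before this call; the SIGSEGV below
            // is now reported from inside MPI.Init itself.
            MPI.Init(args);

            // Pause right after init, per the earlier suggestion.
            Thread.sleep(5000);

            MPI.Finalize();
        }
    }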

Below is the output. It does match that previous report more closely.


Nate

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b00ad2807cf, pid=28537, tid=47281916847872
#
# JRE version: 7.0_21-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x57c7cf]  jni_GetStringUTFChars+0x9f
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /gpfs/home/nchamber/hs_err_pid28537.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00002b198c15b7cf, pid=28538, tid=47388736182016
#
# JRE version: 7.0_21-b11
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode
linux-amd64 compressed oops)
# Problematic frame:
# V  [libjvm.so+0x57c7cf]  jni_GetStringUTFChars+0x9f
#
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /gpfs/home/nchamber/hs_err_pid28538.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
#
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 28537 on node r3n70 exited on
signal 6 (Aborted).
--------------------------------------------------------------------------

On Mon, Aug 3, 2015 at 2:47 PM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Nate,
>
> a similar issue has already been reported at
> https://github.com/open-mpi/ompi/issues/369, but we have
> not yet been able to figure out what is going wrong.
>
> Right after MPI_Init(), can you add
> Thread.sleep(5000);
> and see if it helps?
>
> Cheers,
>
> Gilles
>
>
> On 8/4/2015 8:36 AM, Nate Chambers wrote:
>
> We've been struggling with this error for a while, so hoping someone more
> knowledgeable can help!
>
> Our Java MPI code exits with a segfault during its normal operation, *but
> the segfault occurs before our code ever uses MPI functionality like
> sending/receiving.* We've removed all message calls and any use of
> MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args)
> in our code, and does not if we comment that line out. Further vexing us,
> the crash doesn't happen at the point of the MPI.init call, but later on in
> the program. I don't have an easy-to-run example here because our non-MPI
> code is so large and complicated. We have run simpler test programs with
> MPI and the segfault does not occur.
>
> We have isolated the line where the segfault occurs. However, if we
> comment that out, the program runs longer but then segfaults at a
> seemingly random (yet reproducible) point later in the code. Does anyone have tips on
> how to debug this? We have tried several flags with mpirun, but no good
> clues.
>
> We have also tried several Open MPI versions, including the stable 1.8.7
> and the most recent 1.8.8rc1.
>
>
> ATTACHED
> - config.log from installation
> - output from `ompi_info -all`
>
>
> OUTPUT FROM RUNNING
>
> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
> ...
> some normal output from our code
> ...
> --------------------------------------------------------------------------
> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
>
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/08/27386.php
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/08/27387.php
>
