Nate,
I could get rid of the problem by not using the psm mtl.
The InfiniPath library (used by the psm mtl) sets some signal handlers
that conflict with the JVM.
This can be seen by running
mpirun -np 1 java -Xcheck:jni MPITestBroke data/
So instead of running
mpirun -np 1 java MPITestBroke data/
please run
mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
That solved the issue for me.
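For anyone who launches the job from a script or wrapper, the two command lines quoted above can be assembled programmatically. The following is only a sketch: the class MpirunCommandSketch and its buildCommand method are hypothetical names, and the only Open MPI options it uses are the ones quoted in this message (--mca mtl ^psm and -np).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles the mpirun command lines quoted in this message. */
final class MpirunCommandSketch {

    /** Builds the argument list; when excludePsm is true, "--mca mtl ^psm" is added. */
    static List<String> buildCommand(boolean excludePsm, int processes,
                                     String mainClass, String... appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mpirun");
        if (excludePsm) {
            // Same option as in the message: exclude the psm mtl component.
            cmd.add("--mca");
            cmd.add("mtl");
            cmd.add("^psm");
        }
        cmd.add("-np");
        cmd.add(Integer.toString(processes));
        cmd.add("java");
        cmd.add(mainClass);
        cmd.addAll(Arrays.asList(appArgs));
        return cmd;
    }

    public static void main(String[] args) {
        // Print the workaround command line, space separated.
        StringBuilder line = new StringBuilder();
        for (String part : buildCommand(true, 1, "MPITestBroke", "data/")) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(part);
        }
        System.out.println(line);
    }
}
```

The resulting list could be handed to new ProcessBuilder(cmd).inheritIO().start(); since no shell is involved in that case, the ^ in ^psm needs no quoting.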
Cheers,
Gilles
On 8/13/2015 9:19 AM, Nate Chambers wrote:
I appreciate you trying to help! I put the Java and its compiled
.class file on Dropbox. The directory contains the .java and .class
files, as well as a data/ directory:
http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
You can run it with and without MPI:
> java MPITestBroke data/
> mpirun -np 1 java MPITestBroke data/
Attached is a text file of what I see when I run it with mpirun and
your debug flag. Lots of debug lines.
Nate
On Wed, Aug 12, 2015 at 11:09 AM, Howard Pritchard
<hpprit...@gmail.com <mailto:hpprit...@gmail.com>> wrote:
Hi Nate,
Sorry for the delay in getting back to you.
We're somewhat stuck on how to help you, but here are two suggestions.
Could you add the following to your launch command line
--mca odls_base_verbose 100
so we can see exactly what arguments are being fed to java when
launching
your app.
Also, if you could put your MPITestBroke.class file somewhere
(like google drive)
where we could get it and try to run locally or at NERSC, that
might help us
narrow down the problem. Better yet, if you have the class or
jar file for
the entire app plus some data sets, we could try that out as well.
All the config outputs, etc. you've sent so far indicate a correct
installation
of open mpi.
Howard
On Aug 6, 2015 1:54 PM, "Nate Chambers" <ncham...@usna.edu
<mailto:ncham...@usna.edu>> wrote:
Howard,
I tried the nightly build openmpi-dev-2223-g731cfe3 and it
still segfaults as before. I must admit I am new to MPI, so is
it possible I'm just configuring or running incorrectly? Let
me list my steps for you, and maybe something will jump out?
Also attached is my config.log.
CONFIGURE
./configure --prefix=<install-dir> --enable-mpi-java CC=gcc
MAKE
make all install
RUN
<install-dir>/mpirun -np 1 java MPITestBroke twitter/
DEFAULT JAVA AND GCC
$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
$ gcc --v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib
--enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--disable-libjava-multilib --with-ppl --with-cloog
--with-tune=generic --with-arch_32=i686
--build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard
<hpprit...@gmail.com <mailto:hpprit...@gmail.com>> wrote:
Hi Nate,
We're trying this out on a Mac running Mavericks and a
Cray XC system. The Mac has Java 8
while the Cray XC has Java 7.
We could not get the code to run just using the java
launch command, although we noticed if you add
catch(NoClassDefFoundError e) {
    System.out.println("Not using MPI its out to lunch for now");
}
as one of the catches after the try for firing up MPI, you
can get further.
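A related way to let the program run with plain java when the bindings are missing is to probe for the class before touching it. This is a sketch of a swapped-in technique (reflection), not the catch block suggested above; OptionalMpiSketch and isLoadable are hypothetical names, and "mpi.MPI" is assumed to be the class name of the Open MPI Java bindings.

```java
/** Hypothetical sketch: check whether a class is on the classpath before using it. */
final class OptionalMpiSketch {

    /** Returns true if the named class can be loaded, false if it is missing or fails to link. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, OptionalMpiSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here as well.
            return false;
        }
    }

    public static void main(String[] args) {
        // "mpi.MPI" is an assumption about the bindings' class name.
        if (isLoadable("mpi.MPI")) {
            System.out.println("MPI bindings found; MPI startup would go here");
        } else {
            System.out.println("Not using MPI; bindings are not on the classpath");
        }
    }
}
```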
Instead we tried on the two systems using
mpirun -np 1 java MPITestBroke tweets repeat.txt
and, you guessed it, we can't reproduce the error, at
least using master.
Would you mind getting a copy of the nightly master
build from
http://www.open-mpi.org/nightly/master/
installing that version, and giving it a try?
If that works, then I'd suggest using master (or v2.0) for
now.
Howard
2015-08-05 14:41 GMT-06:00 Nate Chambers
<ncham...@usna.edu <mailto:ncham...@usna.edu>>:
Howard,
Thanks for looking at all this. Adding System.gc() did
not cause it to segfault. The segfault still comes
much later in the processing.
I was able to reduce my code to a single test file
without other dependencies. It is attached. This code
simply opens a text file and reads its lines, one by
one. Once finished, it closes and opens the same file
and reads the lines again. On my system, it does this
about 4 times until the segfault fires. Obviously this
code makes no sense, but it's based on our actual code
that reads millions of lines of data and does various
processing to it.
Attached is a tweets.tgz file that you can uncompress
to have an input directory. The text file is just the
same line over and over again. Run it as:
java MPITestBroke tweets/
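A minimal sketch of the loop described above (not the attached source) might look like the following. ReadLoopSketch, writeSampleFile, countLines and readRepeatedly are hypothetical names, the sample file is generated rather than taken from tweets.tgz, and the MPI.init(args) call is left as a comment so the sketch compiles without the Open MPI Java bindings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

/** Hypothetical stand-in for the reduced test case: read one file again and again. */
final class ReadLoopSketch {

    /** Writes a temp file holding the same line `copies` times, like the sample data described. */
    static Path writeSampleFile(String line, int copies) {
        try {
            Path file = Files.createTempFile("tweets", ".txt");
            Files.write(file, Collections.nCopies(copies, line), StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the file, reads it line by line, closes it, and returns the line count. */
    static long countLines(Path file) {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Re-opens and re-reads the same file `passes` times; returns the total lines read. */
    static long readRepeatedly(Path file, int passes) {
        long total = 0;
        for (int pass = 1; pass <= passes; pass++) {
            long lines = countLines(file);
            System.out.println("pass " + pass + ": " + lines + " lines");
            total += lines;
        }
        return total;
    }

    public static void main(String[] args) {
        // The program in the thread calls MPI.init(args) at this point; the call is
        // omitted here so the sketch builds without the MPI bindings.
        Path file = writeSampleFile("the same line over and over again", 1000);
        readRepeatedly(file, 10);
    }
}
```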
Nate
On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard
<hpprit...@gmail.com <mailto:hpprit...@gmail.com>> wrote:
Hi Nate,
Sorry for the delay in getting back. Thanks for
the sanity check. You may have a point about the
args string to MPI.init -
there's nothing Open MPI needs from this,
but that is a difference with your use case - your
app has an argument.
Would you mind adding a
System.gc()
call immediately after MPI.init call and see if
the gc blows up with a segfault?
Also, it may be interesting to add -verbose:jni
to your command line.
We'll do some experiments here with the init
string arg.
Is your app open source where we could download it
and try to reproduce the problem locally?
thanks,
Howard
2015-08-04 18:52 GMT-06:00 Nate Chambers
<ncham...@usna.edu <mailto:ncham...@usna.edu>>:
Sanity checks pass. Both Hello.java and Ring.java
run correctly and produce the expected output.
Does MPI.init(args) expect anything from those
command-line args?
Nate
On Tue, Aug 4, 2015 at 12:26 PM, Howard
Pritchard <hpprit...@gmail.com
<mailto:hpprit...@gmail.com>> wrote:
Hello Nate,
As a sanity check of your installation,
could you try to compile the
examples/*.java codes using the mpijavac
you've installed and see that those run
correctly?
I'm just interested in Hello.java
and Ring.java.
Howard
2015-08-04 14:34 GMT-06:00 Nate Chambers
<ncham...@usna.edu
<mailto:ncham...@usna.edu>>:
Sure, I reran the configure with
CC=gcc and then make install. I think
that's the proper way to do it.
Attached is my config log. The
behavior when running our code appears
to be the same. The output is the same
error I pasted in my email above. It
occurs when calling MPI.init().
I'm not great at debugging this sort
of stuff, but happy to try things out
if you need me to.
Nate
On Tue, Aug 4, 2015 at 5:09 AM, Howard
Pritchard <hpprit...@gmail.com
<mailto:hpprit...@gmail.com>> wrote:
Hello Nate,
As a first step to addressing
this, could you please try using
gcc rather than the Intel
compilers to build Open MPI?
We've been doing a lot of work
recently on the java bindings,
etc. but have never tried using
any compilers other
than gcc when working with the
java bindings.
Thanks,
Howard
2015-08-03 17:36 GMT-06:00 Nate
Chambers <ncham...@usna.edu
<mailto:ncham...@usna.edu>>:
We've been struggling with
this error for a while, so
hoping someone more
knowledgeable can help!
Our java MPI code exits with a
segfault during its normal
operation, but the segfault
occurs before our code ever
uses MPI functionality like
sending/receiving. We've
removed all message calls and
any use of MPI.COMM_WORLD from
the code. The segfault occurs
if we call MPI.init(args) in
our code, and does not if we
comment that line out. Further
vexing us, the crash doesn't
happen at the point of the
MPI.init call, but later on in
the program. I don't have an
easy-to-run example here
because our non-MPI code is so
large and complicated. We have
run simpler test programs with
MPI and the segfault does not
occur.
We have isolated the line
where the segfault occurs.
However, if we comment that
out, the program will run
longer, but then segfault at a seemingly random (but
reproducible) point
later on in the code. Does
anyone have tips on how to
debug this? We have tried
several flags with mpirun, but
no good clues.
We have also tried several MPI
versions, including stable
1.8.7 and the most recent 1.8.8rc1.
ATTACHED
- config.log from installation
- output from `ompi_info -all`
OUTPUT FROM RUNNING
> mpirun -np 2 java -mx4g
FeaturizeDay datadir/ days.txt
...
some normal output from our code
...
--------------------------------------------------------------------------
mpirun noticed that process
rank 0 with PID 29646 on node
r9n69 exited on signal 11
(Segmentation fault).
--------------------------------------------------------------------------
_______________________________________________
users mailing list
us...@open-mpi.org
<mailto:us...@open-mpi.org>
Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/08/27386.php