Hmm.  Oscar's not around to ask any more, but I'd be greatly surprised if he 
had InfiniPath on his systems where he ran into this segv issue...?


> On Aug 14, 2015, at 1:08 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
> 
> Hi Gilles,
> 
> Good catch!  Nate, we hadn't been testing on an InfiniPath system.
> 
> Howard
> 
> 
> 2015-08-14 0:20 GMT-06:00 Gilles Gouaillardet <gil...@rist.or.jp>:
> Nate,
> 
> I could get rid of the problem by not using the psm mtl.
> The InfiniPath library (used by the psm mtl) sets some signal handlers that 
> conflict with the JVM.
> That can be seen by running
> mpirun -np 1 java -Xcheck:jni MPITestBroke data/
> 
> so instead of running
> mpirun -np 1 java MPITestBroke data/
> please run
> mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
> 
> that solved the issue for me
> 
> Cheers,
> 
> Gilles
> 
> On 8/13/2015 9:19 AM, Nate Chambers wrote:
>> I appreciate you trying to help! I put the Java and its compiled .class file 
>> on Dropbox. The directory contains the .java and .class files, as well as a 
>> data/ directory:
>> 
>> http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
>> 
>> You can run it with and without MPI:
>> 
>> >  java MPITestBroke data/
>> >  mpirun -np 1 java MPITestBroke data/
>> 
>> Attached is a text file of what I see when I run it with mpirun and your 
>> debug flag. Lots of debug lines.
>> 
>> 
>> Nate
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 12, 2015 at 11:09 AM, Howard Pritchard <hpprit...@gmail.com> 
>> wrote:
>> Hi Nate,
>> 
>> Sorry for the delay in getting back to you.
>> We're somewhat stuck on how to help you, but here are two suggestions.
>> 
>> Could you add the following to your launch command line
>> 
>> --mca odls_base_verbose 100
>> 
>> so we can see exactly what arguments are being fed to java when launching
>> your app.
>> 
>> Also, if you could put your MPITestBroke.class file somewhere (like google 
>> drive)
>> where we could get it and try to run locally or at NERSC, that might help us 
>> narrow down the problem.    Better yet, if you have the class or jar file for
>> the entire app plus some data sets, we could try that out as well.
>> 
>> All the config outputs, etc. you've sent so far indicate a correct 
>> installation
>> of open mpi.
>> 
>> Howard
>> 
>> 
>> On Aug 6, 2015 1:54 PM, "Nate Chambers" <ncham...@usna.edu> wrote:
>> Howard,
>> 
>> I tried the nightly build openmpi-dev-2223-g731cfe3 and it still segfaults 
>> as before. I must admit I am new to MPI, so is it possible I'm just 
>> configuring or running incorrectly? Let me list my steps for you, and maybe 
>> something will jump out? Also attached is my config.log.
>> 
>> 
>> CONFIGURE 
>> ./configure --prefix=<install-dir> --enable-mpi-java CC=gcc
>> 
>> MAKE
>> make all install
>> 
>> RUN
>> <install-dir>/mpirun -np 1 java MPITestBroke twitter/
>> 
>> 
>> DEFAULT JAVA AND GCC
>> 
>> $ java -version
>> java version "1.7.0_21"
>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>> 
>> $ gcc --v
>> Using built-in specs.
>> Target: x86_64-redhat-linux
>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
>> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
>> --enable-bootstrap --enable-shared --enable-threads=posix 
>> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
>> --disable-libunwind-exceptions --enable-gnu-unique-object 
>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk 
>> --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre 
>> --enable-libgcj-multifile --enable-java-maintainer-mode 
>> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib 
>> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 
>> --build=x86_64-redhat-linux
>> Thread model: posix
>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
>> 
>> 
>> 
>> 
>> 
>> On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> Hi Nate,
>> 
>> We're trying this out on a Mac running Mavericks and a Cray XC system.  The 
>> Mac has Java 8, while the Cray XC has Java 7.
>> 
>> We could not get the code to run just using the java launch command, 
>> although we noticed if you add
>> 
>>     catch(NoClassDefFoundError e) {
>>       System.out.println("Not using MPI its out to lunch for now");
>>     }
>> 
>> as one of the catches after the try for firing up MPI, you can get further.
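
A minimal sketch of the same idea as the catch clause above: check whether the MPI Java bindings can be loaded before relying on them, so the program can still run under a plain java launch. The class name MpiAvailability and the helper classAvailable are hypothetical, and "mpi.MPI" is assumed to be the entry class of the Open MPI Java bindings; nothing here is taken from the attached test program.

```java
public class MpiAvailability {
    /**
     * Returns true if the named class can be loaded from the current
     * classpath, false otherwise. (Hypothetical helper for illustration.)
     */
    public static boolean classAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers (which may load native libraries).
            Class.forName(className, false,
                    MpiAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is covered here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the Open MPI Java bindings expose the class mpi.MPI.
        if (classAvailable("mpi.MPI")) {
            System.out.println("MPI bindings found on the classpath");
        } else {
            System.out.println("MPI bindings not found; running without MPI");
        }
    }
}
```

Checking up front avoids catching an Error around the init call itself, but either approach only helps when running without mpirun; it does not address the segfault.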
>> 
>> Instead we tried on the two systems using
>> 
>> mpirun -np 1 java MPITestBroke tweets repeat.txt
>> 
>> and, you guessed it, we can't reproduce the error, at least using master.
>> 
>> Would you mind getting a copy of the nightly master build from
>> 
>> http://www.open-mpi.org/nightly/master/
>> 
>> installing that version, and giving it a try?
>> 
>> If that works, then I'd suggest using master (or v2.0) for now. 
>> Howard
>> 
>> 
>> 
>> 
>> 2015-08-05 14:41 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>> Howard,
>> 
>> Thanks for looking at all this. Adding System.gc() did not cause it to 
>> segfault. The segfault still comes much later in the processing.
>> 
>> I was able to reduce my code to a single test file without other 
>> dependencies. It is attached. This code simply opens a text file and reads 
>> its lines, one by one. Once finished, it closes and opens the same file and 
>> reads the lines again. On my system, it does this about 4 times until the 
>> segfault fires. Obviously this code makes no sense, but it's based on our 
>> actual code that reads millions of lines of data and does various processing 
>> to it.
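
The access pattern described above (open a text file, read it line by line, close it, then repeat) can be sketched roughly as follows. This is not the attached MPITestBroke file: the class name RereadFile, its methods, and the pass count of 10 are hypothetical, and the sketch takes a single file path rather than a directory. The MPI call is only indicated in a comment so the code runs without the bindings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RereadFile {
    /** Opens the file, reads every line, closes it; returns the line count. */
    public static long countLines(Path file) throws IOException {
        long n = 0;
        try (BufferedReader in =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (in.readLine() != null) {
                n++;
            }
        }
        return n;
    }

    /** Repeats the open/read/close cycle; returns the total lines read. */
    public static long rereadLoop(Path file, int passes) throws IOException {
        long total = 0;
        for (int i = 0; i < passes; i++) {
            total += countLines(file);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In the failing setup, MPI.init(args) would be called here,
        // before any file I/O. It is omitted so this sketch has no
        // dependency on the MPI bindings.
        Path file = Paths.get(args[0]);
        long total = rereadLoop(file, 10); // pass count is an arbitrary choice
        System.out.println("read " + total + " lines in total");
    }
}
```
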
>> 
>> Attached is a tweets.tgz file that you can uncompress to have an input 
>> directory. The text file is just the same line over and over again. Run it 
>> as:
>> 
>> java MPITestBroke tweets/
>> 
>> 
>> Nate
>> 
>> 
>> 
>> 
>> 
>> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> Hi Nate,
>> 
>> Sorry for the delay in getting back.  Thanks for the sanity check.  You may 
>> have a point about the args string to MPI.init: 
>> Open MPI itself doesn't need anything from it, but that is a difference 
>> in your use case, since your app takes an argument.
>> 
>> Would you mind adding a 
>> 
>> System.gc()
>> 
>> call immediately after the MPI.init call and see whether the gc blows up with a 
>> segfault?
>> 
>> Also, it may be interesting to add -verbose:jni to your command line.
>> 
>> We'll do some experiments here with the init string arg.
>> 
>> Is your app open source where we could download it and try to reproduce the 
>> problem locally?
>> 
>> thanks,
>> 
>> Howard
>> 
>> 
>> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>> Sanity checks pass. Both Hello.java and Ring.java run correctly and produce the 
>> expected output.
>> 
>> Does MPI.init(args) expect anything from those command-line args? 
>> 
>> 
>> Nate
>> 
>> 
>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com> 
>> wrote:
>> Hello Nate,
>> 
>> As a sanity check of your installation, could you try to compile the 
>> examples/*.java codes using the mpijavac you've installed and see that those 
>> run correctly?
>> I'm mainly interested in Hello.java and Ring.java.
>> 
>> Howard
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>> Sure, I reran the configure with CC=gcc and then make install. I think 
>> that's the proper way to do it. Attached is my config log. The behavior when 
>> running our code appears to be the same. The output is the same error I 
>> pasted in my email above. It occurs when calling MPI.init().
>> 
>> I'm not great at debugging this sort of stuff, but happy to try things out 
>> if you need me to.
>> 
>> Nate
>> 
>> 
>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>> Hello Nate,
>> 
>> As a first step to addressing this, could you please try using gcc rather 
>> than the Intel compilers to build Open MPI?
>> 
>> We've been doing a lot of work recently on the java bindings, etc. but have 
>> never tried using any compilers other
>> than gcc when working with the java bindings.
>> 
>> Thanks,
>> 
>> Howard
>> 
>> 
>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>> We've been struggling with this error for a while, so hoping someone more 
>> knowledgeable can help!
>> 
>> Our java MPI code exits with a segfault during its normal operation, but the 
>> segfault occurs before our code ever uses MPI functionality like 
>> sending/receiving. We've removed all message calls and any use of 
>> MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args) 
>> in our code, and does not if we comment that line out. Further vexing us, 
>> the crash doesn't happen at the point of the MPI.init call, but later on in 
>> the program. I don't have an easy-to-run example here because our non-MPI 
>> code is so large and complicated. We have run simpler test programs with 
>> MPI and the segfault does not occur.
>> 
>> We have isolated the line where the segfault occurs. However, if we comment 
>> that out, the program runs longer, but then segfaults at a seemingly 
>> arbitrary (but reproducible) point later in the code. Does anyone have tips 
>> on how to debug this? We have tried several flags with mpirun, but found no 
>> good clues.
>> 
>> We have also tried several Open MPI versions, including stable 1.8.7 and the 
>> most recent 1.8.8rc1.
>> 
>> 
>> ATTACHED
>> - config.log from installation
>> - output from `ompi_info -all`
>> 
>> 
>> OUTPUT FROM RUNNING
>> 
>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt 
>> ...
>> some normal output from our code
>> ...
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on 
>> signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> 
>> 
>> 
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2015/08/27386.php
>> 
>> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
