Hi Gilles,

Good catch! Nate, we hadn't been testing on an InfiniPath system.

Howard
2015-08-14 0:20 GMT-06:00 Gilles Gouaillardet <gil...@rist.or.jp>:

> Nate,
>
> I could get rid of the problem by not using the psm mtl. The infinipath
> library (used by the psm mtl) sets some signal handlers that conflict
> with the JVM. That can be seen by running
>
> mpirun -np 1 java -Xcheck:jni MPITestBroke data/
>
> So instead of running
>
> mpirun -np 1 java MPITestBroke data/
>
> please run
>
> mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
>
> That solved the issue for me.
>
> Cheers,
>
> Gilles
>
> On 8/13/2015 9:19 AM, Nate Chambers wrote:
>
>> *I appreciate you trying to help! I put the Java and its compiled .class
>> file on Dropbox. The directory contains the .java and .class files, as
>> well as a data/ directory:*
>>
>> http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
>>
>> *You can run it with and without MPI:*
>>
>> java MPITestBroke data/
>> mpirun -np 1 java MPITestBroke data/
>>
>> *Attached is a text file of what I see when I run it with mpirun and
>> your debug flag. Lots of debug lines.*
>>
>> Nate
>
> On Wed, Aug 12, 2015 at 11:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
>> Hi Nate,
>>
>> Sorry for the delay in getting back to you.
>>
>> We're somewhat stuck on how to help you, but here are two suggestions.
>>
>> Could you add the following to your launch command line
>>
>> --mca odls_base_verbose 100
>>
>> so we can see exactly what arguments are being fed to java when
>> launching your app.
>>
>> Also, if you could put your MPITestBroke.class file somewhere (like
>> google drive) where we could get it and try to run locally or at NERSC,
>> that might help us narrow down the problem. Better yet, if you have the
>> class or jar file for the entire app plus some data sets, we could try
>> that out as well.
>>
>> All the config outputs, etc. you've sent so far indicate a correct
>> installation of Open MPI.
>>
>> Howard
>>
>> On Aug 6, 2015 1:54 PM, "Nate Chambers" <ncham...@usna.edu> wrote:
>>
>>> Howard,
>>>
>>> I tried the nightly build openmpi-dev-2223-g731cfe3 and it still
>>> segfaults as before. I must admit I am new to MPI, so is it possible
>>> I'm just configuring or running incorrectly? Let me list my steps for
>>> you, and maybe something will jump out? Also attached is my config.log.
>>>
>>> CONFIGURE
>>> ./configure --prefix=<install-dir> --enable-mpi-java CC=gcc
>>>
>>> MAKE
>>> make all install
>>>
>>> RUN
>>> <install-dir>/mpirun -np 1 java MPITestBroke twitter/
>>>
>>> DEFAULT JAVA AND GCC
>>>
>>> $ java -version
>>> java version "1.7.0_21"
>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>
>>> $ gcc -v
>>> Using built-in specs.
>>> Target: x86_64-redhat-linux
>>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>>> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
>>> --enable-bootstrap --enable-shared --enable-threads=posix
>>> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
>>> --disable-libunwind-exceptions --enable-gnu-unique-object
>>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
>>> --enable-java-awt=gtk --disable-dssi
>>> --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
>>> --enable-libgcj-multifile --enable-java-maintainer-mode
>>> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
>>> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
>>> --build=x86_64-redhat-linux
>>> Thread model: posix
>>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
>>>
>>> On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>>> Hi Nate,
>>>>
>>>> We're trying this out on a Mac running Mavericks and a Cray XC system.
>>>> The Mac has Java 8 while the Cray XC has Java 7.
>>>>
>>>> We could not get the code to run just using the java launch command,
>>>> although we noticed if you add
>>>>
>>>> catch(NoClassDefFoundError e) {
>>>>     System.out.println("Not using MPI its out to lunch for now");
>>>> }
>>>>
>>>> as one of the catches after the try for firing up MPI, you can get
>>>> further.
>>>>
>>>> Instead we tried on the two systems using
>>>>
>>>> mpirun -np 1 java MPITestBroke tweets repeat.txt
>>>>
>>>> and, you guessed it, we can't reproduce the error, at least using
>>>> master.
>>>>
>>>> Would you mind trying to get a copy of the nightly master build off of
>>>>
>>>> http://www.open-mpi.org/nightly/master/
>>>>
>>>> and install that version and give it a try?
>>>>
>>>> If that works, then I'd suggest using master (or v2.0) for now.
>>>>
>>>> Howard
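For readers following the suggestion above: here is a minimal sketch of the kind of startup wrapping Howard describes, assuming the application initializes MPI inside a try block and falls back to a non-MPI mode when the bindings are unavailable. The class name and the usingMPI flag are hypothetical, not taken from Nate's code.

    import mpi.MPI;
    import mpi.MPIException;

    public class StartupSketch {
        // Hypothetical flag recording whether MPI.Init succeeded.
        static boolean usingMPI = false;

        public static void main(String[] args) {
            try {
                MPI.Init(args);
                usingMPI = true;
            } catch (MPIException e) {
                System.out.println("MPI failed to initialize: " + e.getMessage());
            } catch (NoClassDefFoundError e) {
                // Howard's suggested extra catch: the mpi.* classes are not on the
                // classpath, e.g. when the program is launched with plain "java".
                System.out.println("Not using MPI; it's out to lunch for now");
            }

            // ... the application's real work would go here ...

            if (usingMPI) {
                try {
                    MPI.Finalize();
                } catch (MPIException e) {
                    System.out.println("MPI.Finalize failed: " + e.getMessage());
                }
            }
        }
    }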
>>>> 2015-08-05 14:41 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>
>>>>> Howard,
>>>>>
>>>>> Thanks for looking at all this. Adding System.gc() did not cause it to
>>>>> segfault. The segfault still comes much later in the processing.
>>>>>
>>>>> I was able to reduce my code to a single test file without other
>>>>> dependencies. It is attached. This code simply opens a text file and
>>>>> reads its lines, one by one. Once finished, it closes and opens the
>>>>> same file and reads the lines again. On my system, it does this about
>>>>> 4 times until the segfault fires. Obviously this code makes no sense,
>>>>> but it's based on our actual code that reads millions of lines of data
>>>>> and does various processing to it.
>>>>>
>>>>> Attached is a tweets.tgz file that you can uncompress to have an input
>>>>> directory. The text file is just the same line over and over again.
>>>>> Run it as:
>>>>>
>>>>> *java MPITestBroke tweets/*
>>>>>
>>>>> Nate
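For readers who don't have the attachment: based on Nate's description above, the test presumably does something like the sketch below, namely call MPI.Init and then repeatedly open, read line by line, and close the same text file until the crash appears. This is a hypothetical reconstruction for illustration, not the attached MPITestBroke.java; the file name, loop count, and printout are invented.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import mpi.MPI;
    import mpi.MPIException;

    public class MPITestBrokeSketch {
        public static void main(String[] args) throws MPIException, IOException {
            MPI.Init(args);  // no other MPI calls are made after this point

            // Illustrative file name; Nate's tweets/ directory holds one text file.
            File input = new File(args[0], "repeat.txt");

            // Re-read the same file several times; per Nate's report the segfault
            // fired around the fourth pass on his system.
            for (int pass = 1; pass <= 10; pass++) {
                BufferedReader reader = new BufferedReader(new FileReader(input));
                long lines = 0;
                while (reader.readLine() != null) {
                    lines++;
                }
                reader.close();
                System.out.println("pass " + pass + ": read " + lines + " lines");
            }

            MPI.Finalize();
        }
    }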
>>>>> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>
>>>>>> Hi Nate,
>>>>>>
>>>>>> Sorry for the delay in getting back. Thanks for the sanity check.
>>>>>> You may have a point about the args string to MPI.init - there's
>>>>>> nothing Open MPI needs from it, but that is a difference with your
>>>>>> use case - your app has an argument.
>>>>>>
>>>>>> Would you mind adding a
>>>>>>
>>>>>> System.gc()
>>>>>>
>>>>>> call immediately after the MPI.init call and see if the gc blows up
>>>>>> with a segfault?
>>>>>>
>>>>>> Also, it may be interesting to add -verbose:jni to your command line.
>>>>>>
>>>>>> We'll do some experiments here with the init string arg.
>>>>>>
>>>>>> Is your app open source where we could download it and try to
>>>>>> reproduce the problem locally?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Howard
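In code form, the experiment Howard is asking for amounts to something like the following sketch (where MPI.init is actually called in Nate's application is not shown in this thread). The idea is that forcing a collection right after init should surface a JNI- or signal-handler-related problem immediately rather than much later in the run.

    import mpi.MPI;
    import mpi.MPIException;

    public class GcAfterInitSketch {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);

            // Howard's experiment: force a GC pass immediately after init. If the
            // JVM was disturbed during MPI.Init, this should crash here instead of
            // later in the application.
            System.gc();

            System.out.println("Survived System.gc() right after MPI.Init");
            MPI.Finalize();
        }
    }

Adding -verbose:jni to the java arguments on the mpirun command line, as Howard also suggests, makes the JVM report JNI activity and warnings alongside any crash.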
>>>>>> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>
>>>>>>> Sanity checks pass. Both Hello and Ring.java run correctly with the
>>>>>>> expected program's output.
>>>>>>>
>>>>>>> Does MPI.init(args) expect anything from those command-line args?
>>>>>>>
>>>>>>> Nate
>>>>>>>
>>>>>>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Nate,
>>>>>>>>
>>>>>>>> As a sanity check of your installation, could you try to compile
>>>>>>>> the examples/*.java codes using the mpijavac you've installed and
>>>>>>>> see that those run correctly? I'd be just interested in Hello.java
>>>>>>>> and Ring.java.
>>>>>>>>
>>>>>>>> Howard
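For reference, the sanity check Howard describes boils down to a program along the lines of the sketch below (in the spirit of the Open MPI examples/Hello.java, not a copy of it), compiled with mpijavac and launched with mpirun. It also illustrates the answer to Nate's question about the args: per Howard's note further up the thread, Open MPI needs nothing special in them; MPI.Init takes the command-line array and returns an argument array for the application to keep using.

    import mpi.MPI;
    import mpi.MPIException;

    public class HelloSketch {
        public static void main(String[] args) throws MPIException {
            // Per Howard's note, Open MPI requires nothing special in args.
            String[] appArgs = MPI.Init(args);

            int rank = MPI.COMM_WORLD.getRank();
            int size = MPI.COMM_WORLD.getSize();
            System.out.println("Hello from rank " + rank + " of " + size
                    + " (" + appArgs.length + " app args)");

            MPI.Finalize();
        }
    }

Compile and run it roughly as: mpijavac HelloSketch.java, then mpirun -np 2 java HelloSketch.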
>>>>>>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>>>
>>>>>>>>> Sure, I reran the configure with CC=gcc and then make install. I
>>>>>>>>> think that's the proper way to do it. Attached is my config log.
>>>>>>>>> The behavior when running our code appears to be the same. The
>>>>>>>>> output is the same error I pasted in my email above. It occurs when
>>>>>>>>> calling MPI.init().
>>>>>>>>>
>>>>>>>>> I'm not great at debugging this sort of stuff, but happy to try
>>>>>>>>> things out if you need me to.
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>>
>>>>>>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Nate,
>>>>>>>>>>
>>>>>>>>>> As a first step to addressing this, could you please try using gcc
>>>>>>>>>> rather than the Intel compilers to build Open MPI?
>>>>>>>>>>
>>>>>>>>>> We've been doing a lot of work recently on the java bindings, etc.,
>>>>>>>>>> but have never tried using any compilers other than gcc when
>>>>>>>>>> working with the java bindings.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Howard
>>>>>>>>>>
>>>>>>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>>>>>
>>>>>>>>>>> We've been struggling with this error for a while, so hoping
>>>>>>>>>>> someone more knowledgeable can help!
>>>>>>>>>>>
>>>>>>>>>>> Our java MPI code exits with a segfault during its normal
>>>>>>>>>>> operation, *but the segfault occurs before our code ever uses MPI
>>>>>>>>>>> functionality like sending/receiving.* We've removed all message
>>>>>>>>>>> calls and any use of MPI.COMM_WORLD from the code. The segfault
>>>>>>>>>>> occurs if we call MPI.init(args) in our code, and does not if we
>>>>>>>>>>> comment that line out. Further vexing us, the crash doesn't happen
>>>>>>>>>>> at the point of the MPI.init call, but later on in the program. I
>>>>>>>>>>> don't have an easy-to-run example here because our non-MPI code is
>>>>>>>>>>> so large and complicated. We have run simpler test programs with
>>>>>>>>>>> MPI and the segfault does not occur.
>>>>>>>>>>>
>>>>>>>>>>> We have isolated the line where the segfault occurs. However, if
>>>>>>>>>>> we comment that out, the program will run longer, but then
>>>>>>>>>>> randomly (but deterministically) segfault later on in the code.
>>>>>>>>>>> Does anyone have tips on how to debug this? We have tried several
>>>>>>>>>>> flags with mpirun, but no good clues.
>>>>>>>>>>>
>>>>>>>>>>> We have also tried several MPI versions, including stable 1.8.7
>>>>>>>>>>> and the most recent 1.8.8rc1.
>>>>>>>>>>>
>>>>>>>>>>> ATTACHED
>>>>>>>>>>> - config.log from installation
>>>>>>>>>>> - output from `ompi_info -all`
>>>>>>>>>>>
>>>>>>>>>>> OUTPUT FROM RUNNING
>>>>>>>>>>>
>>>>>>>>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>>>>>>>>> ...
>>>>>>>>>>> some normal output from our code
>>>>>>>>>>> ...
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69
>>>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>>> --------------------------------------------------------------------------