Hi Nate,

We're trying this out on a Mac running Mavericks and a Cray XC system. The Mac has Java 8, while the Cray XC has Java 7.

We could not get the code to run just using the java launch command, although we noticed that if you add

    catch (NoClassDefFoundError e) { System.out.println("Not using MPI its out to lunch for now"); }

as one of the catches after the try that fires up MPI, you can get further.
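In case it helps to see the change in context, here is a minimal sketch of what we mean. The class name and surrounding structure are made up for illustration (we have not seen your try/catch block); only the extra NoClassDefFoundError catch is the actual suggestion:

    import mpi.MPI;
    import mpi.MPIException;

    // Hypothetical stand-in for the real application class; only the extra
    // NoClassDefFoundError catch is the suggested change.
    public class InitSketch {
        public static void main(String[] args) throws MPIException {
            try {
                MPI.Init(args);
            } catch (NoClassDefFoundError e) {
                // Reached when the MPI classes can't be resolved at run time,
                // e.g. a plain "java" launch with no mpi.jar on the classpath,
                // so the program can carry on without MPI instead of dying here.
                System.out.println("Not using MPI its out to lunch for now");
            }
            // ... rest of the program, with or without MPI ...
        }
    }

With that catch in place, a plain java launch at least gets past MPI startup instead of dying there.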
Instead we tried on the two systems using

    mpirun -np 1 java MPITestBroke tweets repeat.txt

and, you guessed it, we can't reproduce the error, at least using master.

Would you mind getting a copy of the nightly master build from http://www.open-mpi.org/nightly/master/, installing that version, and giving it a try? If that works, then I'd suggest using master (or v2.0) for now.

Howard


2015-08-05 14:41 GMT-06:00 Nate Chambers <ncham...@usna.edu>:

> Howard,
>
> Thanks for looking at all this. Adding System.gc() did not cause it to segfault. The segfault still comes much later in the processing.
>
> I was able to reduce my code to a single test file without other dependencies. It is attached. This code simply opens a text file and reads its lines, one by one. Once finished, it closes and opens the same file and reads the lines again. On my system, it does this about 4 times until the segfault fires. Obviously this code makes no sense, but it's based on our actual code that reads millions of lines of data and does various processing to it.
>
> Attached is a tweets.tgz file that you can uncompress to have an input directory. The text file is just the same line over and over again. Run it as:
>
> *java MPITestBroke tweets/*
>
> Nate
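(For anyone following the thread without the attachment: the pattern Nate describes boils down to something like the sketch below. The class name, argument handling, and pass count here are our own guesses; the real MPITestBroke.java is in his attachment and may differ.)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import mpi.MPI;
    import mpi.MPIException;

    // Rough approximation of the reproducer: init MPI, never use it, then
    // read the same text file over and over until the crash shows up.
    public class ReadLoopSketch {
        public static void main(String[] args) throws MPIException, IOException {
            MPI.Init(args);                      // MPI is initialized but otherwise unused
            String path = args[0];               // path to a plain text file
            for (int pass = 0; pass < 10; pass++) {
                BufferedReader reader = new BufferedReader(new FileReader(path));
                long lines = 0;
                while (reader.readLine() != null) {
                    lines++;                     // just read; do nothing with the data
                }
                reader.close();
                System.out.println("pass " + pass + ": read " + lines + " lines");
            }
            MPI.Finalize();
        }
    }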
> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
>> Hi Nate,
>>
>> Sorry for the delay in getting back. Thanks for the sanity check. You may have a point about the args string to MPI.init - there's nothing Open MPI needs from it, but that is a difference from your use case - your app has an argument.
>>
>> Would you mind adding a
>>
>>     System.gc()
>>
>> call immediately after the MPI.init call and see if the gc blows up with a segfault?
>>
>> Also, it may be interesting to add -verbose:jni to your command line.
>>
>> We'll do some experiments here with the init string arg.
>>
>> Is your app open source, where we could download it and try to reproduce the problem locally?
>>
>> thanks,
>>
>> Howard
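(To spell out that earlier suggestion: the System.gc() call can go inline right after MPI.init in the real app, or into a tiny standalone probe along these lines; the class name is just for illustration.)

    import mpi.MPI;
    import mpi.MPIException;

    // Minimal probe: initialize MPI, immediately force a garbage collection,
    // then finalize. If MPI.Init leaves the JVM in a bad state, an explicit
    // GC sometimes surfaces the problem right away rather than much later.
    public class GcProbe {
        public static void main(String[] args) throws MPIException {
            MPI.Init(args);
            System.gc();        // the call suggested above, right after Init
            MPI.Finalize();
        }
    }

The -verbose:jni option goes on the java command itself, e.g. something like: mpirun -np 1 java -verbose:jni GcProbe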
>> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>
>>> Sanity checks pass. Both Hello.java and Ring.java run correctly with the expected output.
>>>
>>> Does MPI.init(args) expect anything from those command-line args?
>>>
>>> Nate
>>>
>>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>>> Hello Nate,
>>>>
>>>> As a sanity check of your installation, could you try compiling the examples/*.java codes using the mpijavac you've installed and check that they run correctly? I'd just be interested in Hello.java and Ring.java.
>>>>
>>>> Howard
>>>>
>>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>
>>>>> Sure, I reran the configure with CC=gcc and then make install. I think that's the proper way to do it. Attached is my config log. The behavior when running our code appears to be the same. The output is the same error I pasted in my email above. It occurs when calling MPI.init().
>>>>>
>>>>> I'm not great at debugging this sort of stuff, but happy to try things.
>>>>>
>>>>> Nate
>>>>>
>>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>
>>>>>> Hello Nate,
>>>>>>
>>>>>> As a first step to addressing this, could you please try using gcc rather than the Intel compilers to build Open MPI?
>>>>>>
>>>>>> We've been doing a lot of work recently on the java bindings, etc., but have never tried using any compilers other than gcc when working with the java bindings.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>
>>>>>>> We've been struggling with this error for a while, so hoping someone more knowledgeable can help!
>>>>>>>
>>>>>>> Our java MPI code exits with a segfault during its normal operation, *but the segfault occurs before our code ever uses MPI functionality like sending/receiving.* We've removed all message calls and any use of MPI.COMM_WORLD from the code. The segfault occurs if we call MPI.init(args) in our code, and does not if we comment that line out. Further vexing us, the crash doesn't happen at the point of the MPI.init call, but later on in the program. I don't have an easy-to-run example here because our non-MPI code is so large and complicated. We have run simpler test programs with MPI and the segfault does not occur.
>>>>>>>
>>>>>>> We have isolated the line where the segfault occurs. However, if we comment that out, the program will run longer, but then randomly (but deterministically) segfault later on in the code. Does anyone have tips on how to debug this? We have tried several flags with mpirun, but no good clues.
>>>>>>>
>>>>>>> We have also tried several MPI versions, including stable 1.8.7 and the most recent 1.8.8rc1.
>>>>>>>
>>>>>>> ATTACHED
>>>>>>> - config.log from installation
>>>>>>> - output from `ompi_info -all`
>>>>>>>
>>>>>>> OUTPUT FROM RUNNING
>>>>>>>
>>>>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>>>>> ...
>>>>>>> some normal output from our code
>>>>>>> ...
>>>>>>> --------------------------------------------------------------------------
>>>>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69 exited on signal 11 (Segmentation fault).
>>>>>>> --------------------------------------------------------------------------