Hi Gilles,

Good catch! Nate, we hadn't been testing on an InfiniPath system.

Howard


2015-08-14 0:20 GMT-06:00 Gilles Gouaillardet <gil...@rist.or.jp>:

> Nate,
>
> I could get rid of the problem by not using the psm MTL.
> The InfiniPath library (used by the psm MTL) sets some signal handlers
> that conflict with the JVM's own handlers. That can be seen by running
> mpirun -np 1 java -Xcheck:jni MPITestBroke data/
>
> So instead of running
> mpirun -np 1 java MPITestBroke data/
> please run
> mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
>
> That solved the issue for me.
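>
> If you want to make that workaround permanent, the same setting can go in
> an MCA parameter file instead of on the command line. A minimal sketch
> (standard Open MPI mechanism):
>
>     # $HOME/.openmpi/mca-params.conf
>     # never select the psm mtl, so the infinipath signal handlers
>     # are not installed in the JVM's process
>     mtl = ^psm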
>
> Cheers,
>
> Gilles
>
> On 8/13/2015 9:19 AM, Nate Chambers wrote:
>
> I appreciate you trying to help! I put the Java and its compiled .class
> file on Dropbox. The directory contains the .java and .class files, as well
> as a data/ directory:
>
> http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
>
> You can run it with and without MPI:
>
> >  java MPITestBroke data/
> >  mpirun -np 1 java MPITestBroke data/
>
> Attached is a text file of what I see when I run it with mpirun and your
> debug flag. Lots of debug lines.
>
>
> Nate
>
>
>
>
>
> On Wed, Aug 12, 2015 at 11:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>
>> Hi Nate,
>>
>> Sorry for the delay in getting back to you.
>>
>> We're somewhat stuck on how to help you, but here are two suggestions.
>>
>> Could you add the following to your launch command line
>>
>> --mca odls_base_verbose 100
>>
>> so we can see exactly what arguments are being fed to java when launching
>> your app.
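>>
>> For example, with your earlier test:
>>
>> mpirun --mca odls_base_verbose 100 -np 1 java MPITestBroke tweets/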
>>
>> Also, if you could put your MPITestBroke.class file somewhere (like
>> Google Drive) where we could get it and try to run locally or at NERSC,
>> that might help us narrow down the problem. Better yet, if you have the
>> class or jar file for the entire app plus some data sets, we could try
>> that out as well.
>>
>> All the config outputs, etc. you've sent so far indicate a correct
>> installation of Open MPI.
>>
>> Howard
>>
>>
>> On Aug 6, 2015 1:54 PM, "Nate Chambers" <ncham...@usna.edu> wrote:
>>
>>> Howard,
>>>
>>> I tried the nightly build openmpi-dev-2223-g731cfe3 and it still
>>> segfaults as before. I must admit I am new to MPI, so is it possible I'm
>>> just configuring or running incorrectly? Let me list my steps for you, and
>>> maybe something will jump out? Also attached is my config.log.
>>>
>>>
>>> CONFIGURE
>>> ./configure --prefix=<install-dir> --enable-mpi-java CC=gcc
>>>
>>> MAKE
>>> make all install
>>>
>>> RUN
>>> <install-dir>/mpirun -np 1 java MPITestBroke twitter/
>>>
>>>
>>> DEFAULT JAVA AND GCC
>>>
>>> $ java -version
>>> java version "1.7.0_21"
>>> Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
>>> Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
>>>
>>> $ gcc -v
>>> Using built-in specs.
>>> Target: x86_64-redhat-linux
>>> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
>>> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
>>> --enable-bootstrap --enable-shared --enable-threads=posix
>>> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
>>> --disable-libunwind-exceptions --enable-gnu-unique-object
>>> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
>>> --enable-java-awt=gtk --disable-dssi
>>> --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
>>> --enable-libgcj-multifile --enable-java-maintainer-mode
>>> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
>>> --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686
>>> --build=x86_64-redhat-linux
>>> Thread model: posix
>>> gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC)
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Aug 6, 2015 at 7:58 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>
>>>> Hi Nate,
>>>>
>>>> We're trying this out on a Mac running Mavericks and a Cray XC system.
>>>> The Mac has Java 8 while the Cray XC has Java 7.
>>>>
>>>> We could not get the code to run just using the java launch command,
>>>> although we noticed that if you add
>>>>
>>>>     catch (NoClassDefFoundError e) {
>>>>         System.out.println("Not using MPI; it's out to lunch for now");
>>>>     }
>>>>
>>>> as one of the catch clauses after the try block that fires up MPI, you
>>>> can get further.
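>>>>
>>>> In context, that sits in something like the following (a sketch; only the
>>>> NoClassDefFoundError catch is from our test, the surrounding try is
>>>> illustrative):
>>>>
>>>>     try {
>>>>         MPI.Init(args);  // firing up MPI
>>>>     } catch (NoClassDefFoundError e) {
>>>>         // mpi classes not on the classpath, e.g. a plain java launch
>>>>         System.out.println("Not using MPI; it's out to lunch for now");
>>>>     } catch (Exception e) {
>>>>         // a failed init (MPIException etc.)
>>>>     }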
>>>>
>>>> Instead we tried on the two systems using
>>>>
>>>> mpirun -np 1 java MPITestBroke tweets repeat.txt
>>>>
>>>> and, you guessed it, we can't reproduce the error, at least using
>>>> master.
>>>>
>>>> Would you mind grabbing a copy of the nightly master build from
>>>>
>>>> http://www.open-mpi.org/nightly/master/
>>>>
>>>> installing that version, and giving it a try?
>>>>
>>>> If that works, then I'd suggest using master (or v2.0) for now.
>>>>
>>>> Howard
>>>>
>>>>
>>>>
>>>>
>>>> 2015-08-05 14:41 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>
>>>>> Howard,
>>>>>
>>>>> Thanks for looking at all this. Adding System.gc() did not cause it to
>>>>> segfault. The segfault still comes much later in the processing.
>>>>>
>>>>> I was able to reduce my code to a single test file without other
>>>>> dependencies. It is attached. This code simply opens a text file and
>>>>> reads its lines, one by one. Once finished, it closes and opens the
>>>>> same file and reads the lines again. On my system, it does this about
>>>>> 4 times until the segfault fires. Obviously this code makes no sense,
>>>>> but it's based on our actual code that reads millions of lines of
>>>>> data and does various processing to it.
>>>>>
>>>>> Attached is a tweets.tgz file that you can uncompress to have an input
>>>>> directory. The text file is just the same line over and over again. Run it
>>>>> as:
>>>>>
>>>>> java MPITestBroke tweets/
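>>>>>
>>>>> The gist of the file, roughly (the exact code is in the attachment; the
>>>>> file name and pass count here are illustrative):
>>>>>
>>>>>     import java.io.*;
>>>>>     import mpi.*;
>>>>>
>>>>>     public class MPITestBroke {
>>>>>         public static void main(String[] args) throws Exception {
>>>>>             MPI.Init(args);  // commenting this out avoids the crash
>>>>>             for (int pass = 0; pass < 10; pass++) {
>>>>>                 // open the file, read it line by line, close, reopen
>>>>>                 BufferedReader reader = new BufferedReader(
>>>>>                         new FileReader(args[0] + "repeat.txt"));
>>>>>                 while (reader.readLine() != null) { }
>>>>>                 reader.close();  // the segfault fires around the 4th pass
>>>>>             }
>>>>>             MPI.Finalize();
>>>>>         }
>>>>>     }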
>>>>>
>>>>>
>>>>> Nate
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Aug 5, 2015 at 8:29 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>
>>>>>> Hi Nate,
>>>>>>
>>>>>> Sorry for the delay in getting back.  Thanks for the sanity check.
>>>>>> You may have a point about the args string passed to MPI.init -
>>>>>> there's nothing Open MPI needs from it, but it is one difference in
>>>>>> your use case: your app takes an argument.
>>>>>>
>>>>>> Would you mind adding a
>>>>>>
>>>>>> System.gc()
>>>>>>
>>>>>> call immediately after the MPI.init call and seeing if the GC blows up with
>>>>>> a segfault?
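>>>>>>
>>>>>> That is, something like this minimal sketch (the class name is made up;
>>>>>> MPI.Init/MPI.Finalize as in the Open MPI Java bindings):
>>>>>>
>>>>>>     import mpi.*;
>>>>>>
>>>>>>     public class GcProbe {
>>>>>>         public static void main(String[] args) throws MPIException {
>>>>>>             MPI.Init(args);
>>>>>>             System.gc();  // force a collection right after init
>>>>>>             System.out.println("survived the gc");
>>>>>>             MPI.Finalize();
>>>>>>         }
>>>>>>     }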
>>>>>>
>>>>>> Also, it may be interesting to add -verbose:jni to your java command line.
>>>>>>
>>>>>> We'll do some experiments here with the init string arg.
>>>>>>
>>>>>> Is your app open source somewhere we could download it and try to
>>>>>> reproduce the problem locally?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>>
>>>>>> 2015-08-04 18:52 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>
>>>>>>> Sanity checks pass. Both Hello.java and Ring.java run correctly with
>>>>>>> the expected output.
>>>>>>>
>>>>>>> Does MPI.init(args) expect anything from those command-line args?
>>>>>>>
>>>>>>>
>>>>>>> Nate
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello Nate,
>>>>>>>>
>>>>>>>> As a sanity check of your installation, could you try compiling the
>>>>>>>> examples/*.java codes using the mpijavac you've installed, and see
>>>>>>>> that those run correctly? I'd just be interested in Hello.java and
>>>>>>>> Ring.java.
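>>>>>>>>
>>>>>>>> Something like the following should do it (paths are assumed; mpirun
>>>>>>>> should add the MPI classes to the java classpath automatically):
>>>>>>>>
>>>>>>>>     cd examples
>>>>>>>>     <install-dir>/bin/mpijavac Hello.java Ring.java
>>>>>>>>     <install-dir>/bin/mpirun -np 2 java Hello
>>>>>>>>     <install-dir>/bin/mpirun -np 2 java Ring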
>>>>>>>>
>>>>>>>> Howard
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-08-04 14:34 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>>>
>>>>>>>>> Sure, I reran the configure with CC=gcc and then make install. I
>>>>>>>>> think that's the proper way to do it. Attached is my config log.
>>>>>>>>> The behavior when running our code appears to be the same. The
>>>>>>>>> output is the same error I pasted in my email above. It occurs
>>>>>>>>> when calling MPI.init().
>>>>>>>>>
>>>>>>>>> I'm not great at debugging this sort of stuff, but happy to try
>>>>>>>>> things out if you need me to.
>>>>>>>>>
>>>>>>>>> Nate
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 4, 2015 at 5:09 AM, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello Nate,
>>>>>>>>>>
>>>>>>>>>> As a first step to addressing this, could you please try using
>>>>>>>>>> gcc rather than the Intel compilers to build Open MPI?
>>>>>>>>>>
>>>>>>>>>> We've been doing a lot of work recently on the java bindings,
>>>>>>>>>> etc., but we have never tried using any compilers other than gcc
>>>>>>>>>> when working with them.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Howard
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2015-08-03 17:36 GMT-06:00 Nate Chambers <ncham...@usna.edu>:
>>>>>>>>>>
>>>>>>>>>>> We've been struggling with this error for a while, so we're hoping
>>>>>>>>>>> someone more knowledgeable can help!
>>>>>>>>>>>
>>>>>>>>>>> Our Java MPI code exits with a segfault during its normal
>>>>>>>>>>> operation, but the segfault occurs before our code ever uses MPI
>>>>>>>>>>> functionality like sending/receiving. We've removed all message
>>>>>>>>>>> calls and any use of MPI.COMM_WORLD from the code. The segfault
>>>>>>>>>>> occurs if we call MPI.init(args) in our code, and does not if we
>>>>>>>>>>> comment that line out. Further vexing us, the crash doesn't happen
>>>>>>>>>>> at the point of the MPI.init call, but later on in the program. I
>>>>>>>>>>> don't have an easy-to-run example here because our non-MPI code is
>>>>>>>>>>> so large and complicated. We have run simpler test programs with
>>>>>>>>>>> MPI and the segfault does not occur.
>>>>>>>>>>>
>>>>>>>>>>> We have isolated the line where the segfault occurs. However, if
>>>>>>>>>>> we comment that out, the program will run longer, but then segfault
>>>>>>>>>>> at a seemingly random (yet deterministic) point later in the code.
>>>>>>>>>>> Does anyone have tips on how to debug this? We have tried several
>>>>>>>>>>> flags with mpirun, but no good clues.
>>>>>>>>>>>
>>>>>>>>>>> We have also tried several MPI versions, including stable 1.8.7
>>>>>>>>>>> and the most recent 1.8.8rc1.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ATTACHED
>>>>>>>>>>> - config.log from installation
>>>>>>>>>>> - output from `ompi_info -all`
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> OUTPUT FROM RUNNING
>>>>>>>>>>>
>>>>>>>>>>> > mpirun -np 2 java -mx4g FeaturizeDay datadir/ days.txt
>>>>>>>>>>> ...
>>>>>>>>>>> some normal output from our code
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that process rank 0 with PID 29646 on node r9n69
>>>>>>>>>>> exited on signal 11 (Segmentation fault).
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>