All,
It works!! Gilles with the fix!
I ran it with his suggested flags:
mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
The test code now runs without the segfault occurring around the 5th loop.
It will be a while before I can put this back into our bigger code that
first caused our segfault,
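For reference, the `^` in that flag tells Open MPI's MCA framework to exclude the named component rather than select it, and the same setting can also be made through an environment variable. A sketch, assuming a standard Open MPI install:

```shell
# Disable the psm mtl on the command line ("^" excludes a component):
mpirun --mca mtl ^psm -np 1 java MPITestBroke data/

# Equivalent, set as an environment variable instead:
export OMPI_MCA_mtl=^psm
mpirun -np 1 java MPITestBroke data/
```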
Gilles,
On hopper there aren't any psm libraries - it's an InfiniBand/InfiniPath-free
system - at least on the compute nodes.
For my own work, I never use things like the platform files; I just do
./configure --prefix=blahblah --enable-mpi-java (and whatever else I want
to test this time)
Thanks f
Howard,
I have no InfiniPath hardware, but the InfiniPath libraries are installed.
I tried to run with --mca mtl_psm_priority 0 instead of --mca mtl ^psm,
but that did not work.
Without the psm mtl, I was unable to reproduce the persistent communication
issue,
so I concluded there was only one issue here
Gotcha; thanks.
> On Aug 14, 2015, at 2:12 PM, Howard Pritchard wrote:
Hi Jeff,
I don't know why Gilles keeps picking on the persistent request problem and
mixing it up with this user's bug. I do think that for this user, psm
probably is the problem.
They don't have anything to do with each other.
I can reproduce the persistent request problem on hopper consistently.
Hmm. Oscar's not around to ask any more, but I'd be greatly surprised if he
had InfiniPath on his systems where he ran into this segv issue...?
> On Aug 14, 2015, at 1:08 PM, Howard Pritchard wrote:
Hi Gilles,
Good catch! Nate, we hadn't been testing on an InfiniPath system.
Howard
2015-08-14 0:20 GMT-06:00 Gilles Gouaillardet :
Nate,
I could get rid of the problem by not using the psm mtl.
The infinipath library (used by the psm mtl) sets some signal handlers
that conflict with the JVM.
That can be seen by running
mpirun -np 1 java -Xcheck:jni MPITestBroke data/
So instead of running
mpirun -np 1 java MPITestBroke data/
run
mpirun --mca mtl ^psm -np 1 java MPITestBroke data/
Hi Nate,
The odls output helps some. You have a really big CLASSPATH. Also there
might be a small chance that the shmem.jar is causing problems.
Could you try undefining your CLASSPATH just to run the test case?
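In case it helps while narrowing that down, here is a small generic Java sketch (not part of Open MPI; the class name is mine) that prints each CLASSPATH entry on its own line, so oversized or suspicious jars like shmem.jar are easy to spot before unsetting the variable:

```java
import java.io.File;

public class ClasspathCheck {
    // Split a classpath string into its entries; an unset CLASSPATH
    // comes back from getenv() as null, which we treat as empty.
    static String[] entries(String cp) {
        if (cp == null || cp.isEmpty()) return new String[0];
        return cp.split(File.pathSeparator);
    }

    public static void main(String[] args) {
        String[] parts = entries(System.getenv("CLASSPATH"));
        System.out.println(parts.length + " classpath entries");
        for (String p : parts) System.out.println("  " + p);
    }
}
```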
If the little test case still doesn't work, could you reconfigure the mpi
build to
I appreciate you trying to help! I put the Java and its compiled .class
file on Dropbox. The directory contains the .java and .class files, as well
as a data/ directory:
http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
You can run it with and without MPI:
> java MPIT
Hi Nate,
Sorry for the delay in getting back to you.
We're somewhat stuck on how to help you, but here are two suggestions.
Could you add the following to your launch command line
--mca odls_base_verbose 100
so we can see exactly what arguments are being fed to java when launching
your app.
Howard,
I tried the nightly build openmpi-dev-2223-g731cfe3 and it still segfaults
as before. I must admit I am new to MPI, so is it possible I'm just
configuring or running incorrectly? Let me list my steps for you, and maybe
something will jump out? Also attached is my config.log.
CONFIGURE
./
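(Nate's exact steps were cut off by the archive. A typical sequence along the lines mentioned elsewhere in this thread would look like the following; the prefix and parallel job count are placeholders, not his actual values:)

```shell
# Hypothetical sketch of an Open MPI build with the Java bindings enabled:
./configure --prefix=$HOME/opt/openmpi --enable-mpi-java CC=gcc
make -j4
make install

# Then compile and launch the test case with the freshly built wrappers:
mpijavac MPITestBroke.java
mpirun -np 1 java MPITestBroke data/
```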
Hi Nate,
We're trying this out on a Mac running Mavericks and a Cray XC system.
The Mac has Java 8,
while the Cray XC has Java 7.
We could not get the code to run just using the java launch command,
although we noticed if you add
catch(NoClassDefFoundError e) {
System.out.println("Not
Thanks, Nate. We will give the test a try.
--
sent from my smart phone so no good typing.
Howard
On Aug 5, 2015 2:42 PM, "Nate Chambers" wrote:
Howard,
Thanks for looking at all this. Adding System.gc() did not cause it to
segfault. The segfault still comes much later in the processing.
I was able to reduce my code to a single test file without other
dependencies. It is attached. This code simply opens a text file and reads
its lines, on
Hi Nate,
Sorry for the delay in getting back. Thanks for the sanity check. You may
have a point about the args string to MPI.init -
there's nothing Open MPI needs from this, but that is a difference
from your use case - your app has an argument.
Would you mind adding a
System.gc()
call
Sanity checks pass. Both Hello and Ring.java run correctly with the
programs' expected output.
Does MPI.init(args) expect anything from those command-line args?
Nate
On Tue, Aug 4, 2015 at 12:26 PM, Howard Pritchard wrote:
Hello Nate,
As a sanity check of your installation, could you try to compile the
examples/*.java codes using the mpijavac you've installed and see that
those run correctly?
I'd just be interested in Hello.java and Ring.java.
Howard
2015-08-04 14:34 GMT-06:00 Nate Chambers :
Sure, I reran configure with CC=gcc and then make install. I think
that's the proper way to do it. Attached is my config log. The behavior
when running our code appears to be the same. The output is the same error
I pasted in my email above. It occurs when calling MPI.init().
I'm not great at
Hello Nate,
As a first step to addressing this, could you please try using gcc rather
than the Intel compilers to build Open MPI?
We've been doing a lot of work recently on the Java bindings, etc., but have
never tried using any compilers other
than gcc when working with them.
Thanks
Gilles,
Yes I saw that github thread, but wasn't certain this was the same issue.
Very possible that it is. Oddly enough, that github code doesn't crash for
us.
Adding a sleep call doesn't help. It's actually now crashing on the
MPI.init(args) call itself, and the JVM is reporting the error. Earli
Nate,
A similar issue has already been reported at
https://github.com/open-mpi/ompi/issues/369, but we have
not yet been able to figure out what is going wrong.
Right after MPI_Init(), can you add
Thread.sleep(5000);
and see if it helps?
Cheers,
Gilles
On 8/4/2015 8:36 AM, Nate Chambers wrote:
We've been struggling with this error for a while, so hoping someone more
knowledgeable can help!
Our Java MPI code exits with a segfault during its normal operation, but
the segfault occurs before our code ever uses MPI functionality like
sending/receiving. We've removed all message calls and a