Thanks David,
I made a PR for the v1.8 branch at
https://github.com/open-mpi/ompi-release/pull/492
the patch is attached (it required some back-porting)
Cheers,
Gilles
On 8/12/2015 4:01 AM, David Shrader wrote:
I have cloned Gilles' topic/hcoll_config branch and, after running
autogen.pl,
"Jeff Squyres (jsquyres)" writes:
> I think Dave's point is that numactl-devel (and numactl) is only needed for
> *building* Open MPI. Users only need numactl to *run* Open MPI.
Yes. However, I guess the basic problem is that the component fails to
load for want of libhwloc, either because (t
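A quick way to confirm which shared library a component cannot resolve is ldd; the install prefix below is only an example:

    # list components whose runtime dependencies do not resolve; the install
    # prefix here is just an example
    for f in /opt/openmpi-1.8.7/lib/openmpi/*.so; do
        ldd "$f" | grep -q "not found" && echo "$f is missing a library"
    done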
"Lane, William" writes:
> I can successfully run my OpenMPI 1.8.7 jobs outside of Son-of-Gridengine but
> not via qrsh. We're
> using CentOS 6.3 and a heterogeneous cluster of hyperthreaded and
> non-hyperthreaded blades
> and x3550 chassis. OpenMPI 1.8.7 has been built w/the debug switch as we
Basically, without --hetero-nodes, ompi assumes all nodes have the same
topology (fast startup).
With --hetero-nodes, ompi does not assume anything and requests each node's
topology (slower startup); an example launch line is shown below.
I am not sure whether this is still 100% true on all versions.
IIRC, at least on master, a hwloc signature is che
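For reference, using the flag is just a matter of adding it to the launch line; the host names and program below are placeholders:

    # ask every node for its real topology instead of assuming they all match
    mpirun --hetero-nodes -np 16 -host node01,node02 ./a.out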
Hello Gilles,
Thank you very much for the patch! It is much more complete than mine.
Using that patch and re-running autogen.pl, I am able to build 1.8.8
with './configure --with-hcoll' without errors.
I do have issues when it comes to running 1.8.8 with hcoll built in,
however. In my quick
Hi David,
This issue is from the hcoll library. It could be because of a symbol conflict
with the ml module; this was fixed recently in HCOLL. Can you try "-mca
coll ^ml" and see if this workaround works in your setup?
-Devendar
On Wed, Aug 12, 2015 at 9:30 AM, David Shrader wrote:
> Hello Gille
Hey Devendar,
It looks like I still get the error:
[dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out
App launch reported: 1 (out of 1) daemons - 2 (out of 2) procs
[1439397957.351764] [zo-fe1:14678:0] shm.c:65 MXM WARN Could
not open the KNEM device file at /
From where did you grab this HCOLL lib? MOFED or HPCX? What version?
On Wed, Aug 12, 2015 at 9:47 AM, David Shrader wrote:
> Hey Devendar,
>
> It looks like I still get the error:
>
> [dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out
> App launch reported: 1 (out of 1) daemons - 2 (ou
I'm confused about why this application needs an asynchronous cuMemcpyAsync() in a blocking MPI call. Rolf, could you please explain? And how is a call to cuMemcpyAsync() followed by a synchronization any different from a cuMemcpy() in this use case?
I would still expect that if the MPI_Send / Recv
Hi Geoff:
Our original implementation used cuMemcpy for copying GPU memory into and out
of host memory. However, what we learned is that cuMemcpy causes a
synchronization of all work on the GPU. This means that one could not
effectively overlap running a kernel with doing communication. So
This is likely because you installed Open MPI 1.8.7 into the same directory as
a prior Open MPI installation.
You probably want to uninstall the old version first (e.g., run "make
uninstall" from the old version's build tree), or just install 1.8.7 into a new
tree.
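For example, with placeholder paths:

    # remove the old install from its original build tree...
    cd /path/to/old-openmpi-build && make uninstall
    # ...or give 1.8.7 a fresh prefix of its own
    cd /path/to/openmpi-1.8.7 && ./configure --prefix=/opt/openmpi-1.8.7 && make install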
> On Aug 11, 2015, at 2:22
The admin who rolled the hcoll rpm that we're using (and installed it in
system space) said that she got it from
hpcx-v1.3.336-gcc-OFED-1.5.4.1-redhat6.6-x86_64.tar.
Thanks,
David
On 08/12/2015 10:51 AM, Deva wrote:
From where did you grab this HCOLL lib? MOFED or HPCX? What version?
On Wed, Aug
Hi Nate,
Sorry for the delay in getting back to you.
We're somewhat stuck on how to help you, but here are two suggestions.
Could you add the following to your launch command line,
--mca odls_base_verbose 100
so we can see exactly what arguments are being fed to java when launching
your app.
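For example, a full launch line might look like this, with YourMPIApp standing in for your application's class name:

    # odls_base_verbose prints the exact argv handed to the java launcher
    mpirun --mca odls_base_verbose 100 -np 2 java YourMPIApp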
David,
This is because hcoll symbols conflict with the ml coll module inside OMPI.
HCOLL is derived from the ml module. This issue is fixed in the hcoll library,
and the fix will be available in the next HPCX release.
Some earlier discussion on this issue:
http://www.open-mpi.org/community/lists/users/2015/06/27154.ph
I remember seeing those, but forgot about them. I am curious, though,
why using '-mca coll ^ml' wouldn't work for me.
We'll watch for the next HPCX release. Is there an ETA on when that
release may happen? Thank you for the help!
David
On 08/12/2015 04:04 PM, Deva wrote:
David,
This is beca
Do you have "--disable-dlopen" in your configure options? This might force
coll_ml to be loaded first even with -mca coll ^ml.
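One way to check how an existing build was configured, with placeholder paths:

    # config.log in the build tree records the exact ./configure invocation
    grep -A 1 "Invocation command line" /path/to/openmpi-1.8.8/config.log
    # ompi_info --all also reports the build's configure details
    ompi_info --all | grep -i configure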
The next HPCX release is expected by the end of August.
-Devendar
On Wed, Aug 12, 2015 at 3:30 PM, David Shrader wrote:
> I remember seeing those, but forgot about them. I am cu
I appreciate you trying to help! I put the Java source and its compiled .class
file on Dropbox. The directory contains the .java and .class files, as well
as a data/ directory:
http://www.dropbox.com/sh/pds5c5wecfpb2wk/AAAcz17UTDQErmrUqp2SPjpqa?dl=0
You can run it with and without MPI:
> java MPIT