Hi,
Looking at the *ompi/mca/coll/sm/coll_sm_module.c* it seems this module
will be used only if the calling communicator solely groups processes
within a node. I've got two questions here.
1. So is my understanding correct that for something like MPI_COMM_WORLD, where the world spans processes on multiple nodes, this module will not be used?
1) is correct. coll/sm is disqualified if the communicator is an inter-communicator or the communicator spans several nodes.
You can have a look at the source code, and you will note that bcast does not use send/recv. Instead, it uses shared memory, so hopefully it is faster than other modules.
Thank you, Gilles.
Which bcast should I look for? In general, how do I know which module was used for which communication - can I print this info?
On Jun 30, 2016 3:19 AM, "Gilles Gouaillardet" wrote:
> 1) is correct. coll/sm is disqualified if the communicator is an inter-communicator
> or the communicator spans several nodes.
The Bcast in coll/sm.
coll modules have priorities (see ompi_info --all).
For a given function (e.g. bcast), the module which implements it and has the highest priority is used.
Note that a module can disqualify itself on a given communicator (e.g. coll/sm on an inter-node communicator).
By default, coll/tuned is used for most collectives.
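As a concrete illustration (the parameter names below are what I'd expect on a stock build - double-check them with ompi_info on your install), you can list the priorities and override them on the command line:

    ompi_info --all | grep 'coll_.*_priority'
    mpirun --mca coll_sm_priority 100 -np 4 ./a.out

The second line is only a sketch: bumping coll_sm_priority above coll/tuned's priority means coll/sm gets selected whenever it does not disqualify itself (i.e. on single-node intra-communicators).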
OK, I am beginning to see how it works now. One question I still have is: in the case of a multi-node communicator, it seems coll/tuned (or something other than coll/sm) will be the one used, so does it do any optimizations to reduce communication within a node?
Also, where can I find the p2p send/recv module?
Currently, coll/tuned is not topology aware.
This is something interesting, and everyone is invited to contribute.
coll/ml is topology aware, but it is kind of unmaintained now.
send/recv involves two abstraction layers:
pml, and then the interconnect transport.
Typically, pml/ob1 is used, and it uses the btl components for the actual transport.
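For reference, here is a minimal sketch (not the Open MPI internals, and assuming the root is global rank 0) of what a topology-aware broadcast looks like when done by hand with MPI_Comm_split_type: one inter-node step among per-node leaders, then one shared-memory step inside each node.

    /* Hypothetical sketch only -- not taken from Open MPI. */
    #include <mpi.h>

    void node_aware_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm)
    {
        MPI_Comm node_comm, leader_comm;
        int rank, node_rank;

        MPI_Comm_rank(comm, &rank);

        /* Group the processes that share a node (MPI-3). */
        MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, rank,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_rank(node_comm, &node_rank);

        /* The first rank on each node becomes a "leader"; global rank 0
         * is always a leader and ends up as rank 0 of leader_comm. */
        MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, rank,
                       &leader_comm);

        /* Step 1: broadcast across nodes, leaders only (network traffic). */
        if (leader_comm != MPI_COMM_NULL) {
            MPI_Bcast(buf, count, type, 0, leader_comm);
            MPI_Comm_free(&leader_comm);
        }

        /* Step 2: broadcast inside each node (shared-memory traffic). */
        MPI_Bcast(buf, count, type, 0, node_comm);
        MPI_Comm_free(&node_comm);
    }

A topology-aware coll component would do essentially this internally, which is why it can cut the number of messages that cross the network when many procs share a node.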
Thank you, Gilles. The reason for digging into intra-node optimizations is that we've implemented several machine learning applications in Open MPI (Java bindings), but found collective communication to be a bottleneck, especially when the number of procs per node is high. I've implemented a shared m
You might want to give coll/ml a try:
mpirun --mca coll_ml_priority 100 ...
Cheers,
Gilles
On Thursday, June 30, 2016, Saliya Ekanayake wrote:
> Thank you, Gilles. The reason for digging into intra-node optimizations is
> that we've implemented several machine learning applications in OpenMPI
>
OK, that's good. I'll try that.
So, is *ml* no longer being actively developed? Is there any documentation on this component?
Thank you,
Saliya
On Thu, Jun 30, 2016 at 11:01 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:
> you might want to give coll/ml a try
> mpirun --mca coll_ml_priority 100 ...
I'm seeing hangs when MPI_Abort is called. This is with Open MPI 1.10.3, e.g.:
program output:
Testing -- big dataset test (bigdset)
Proc 3: *** Parallel ERROR ***
VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
aborting MPI processes
Testing -- big dataset test (
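For what it's worth, the pattern that hangs can be boiled down to one rank aborting while the others wait; a trivial stand-in (hypothetical, not the actual t_mdset.c test) would be:

    /* Minimal reproducer sketch -- not the HDF5 test itself. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 3) {
            fprintf(stderr, "Proc %d: *** Parallel ERROR ***\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);   /* should bring every rank down */
        }

        /* The other ranks block here; if the abort is not propagated,
         * this is where the job appears to hang. */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }

With a working abort, mpiexec should kill all ranks and exit non-zero instead of sitting there.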
On 06/30/2016 09:49 AM, Orion Poplawski wrote:
> I'm seeing hangs when MPI_Abort is called. This is with openmpi 1.10.3. e.g:
I'll also note that I'm seeing this on 32-bit arm, but not i686 or x86_64.
--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA, Boulder/CoR
Are the procs still alive? Is this on a single node?
> On Jun 30, 2016, at 8:49 AM, Orion Poplawski wrote:
>
> I'm seeing hangs when MPI_Abort is called. This is with openmpi 1.10.3. e.g:
>
> program output:
>
> Testing -- big dataset test (bigdset)
> Proc 3: *** Parallel ERROR ***
> VRFY (sizeof(MPI_Offset)>4) failed at line 479 in ../../testpar/t_mdset.c
I actually wouldn't advise ml. It *was* being developed as a joint project
between ORNL and Mellanox. I think that code eventually grew into what the
"hcoll" Mellanox library currently is.
As such, ml reflects kind of a middle point before hcoll became hardened into a
real product. It has so
No, just mpiexec is running. Single node. I only see it when the test is
executed with "make check"; I don't see it if I just run mpiexec -n 6
./testphdf5 by hand.
On 06/30/2016 09:58 AM, Ralph Castain wrote:
> Are the procs still alive? Is this on a single node?
>
>> On Jun 30, 2016, at 8:49 AM,
On 06/30/2016 10:33 AM, Orion Poplawski wrote:
> No, just mpiexec is running. single node. Only see it when the test is
> executed with "make check", not seeing it if I just run mpiexec -n 6
> ./testphdf5 by hand.
Hmm, now I'm seeing it running mpiexec by hand. Trying to check it via gdb
indic
So the application procs are all gone, but mpiexec isn’t exiting? I’d suggest
running valgrind, given the corruption.
> On Jun 30, 2016, at 10:21 AM, Orion Poplawski wrote:
>
> On 06/30/2016 10:33 AM, Orion Poplawski wrote:
>> No, just mpiexec is running. single node. Only see it when the tes
valgrind output:
$ valgrind mpiexec -n 6 ./testphdf5
==8518== Memcheck, a memory error detector
==8518== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==8518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==8518== Command: mpiexec -n 6 ./testphdf5
==8518==
=
On 06/30/2016 02:55 PM, Orion Poplawski wrote:
> valgrind output:
>
> $ valgrind mpiexec -n 6 ./testphdf5
> ==8518== Memcheck, a memory error detector
> ==8518== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==8518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
Rats - and this only happens on arm32?
> On Jun 30, 2016, at 1:56 PM, Orion Poplawski wrote:
>
> On 06/30/2016 02:55 PM, Orion Poplawski wrote:
>> valgrind output:
>>
>> $ valgrind mpiexec -n 6 ./testphdf5
>> ==8518== Memcheck, a memory error detector
>> ==8518== Copyright (C) 2002-2015, and GN