Hi Gabriele,

it might be that your message size is too large for the available memory per node. I had a similar problem with IMB: I was not able to run Alltoall to completion on N=128, ppn=8 on our cluster with 16 GB per node. You would think 16 GB is quite a lot, but when you do the maths: 2 * 4 MB * 128 procs * 8 procs/node = 8 GB/node, plus you need to double that because of buffering.

I was told by Mellanox (our cards are ConnectX cards) that OFED 1.3 introduced XRC in addition to the Shared Receive Queue, which should reduce the memory footprint, but I have not tested this yet.

HTH,
Igor
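For anyone checking the arithmetic, here is a minimal standalone sketch of the same estimate. The message size, process counts, and the extra factor of two for internal buffering are taken from the numbers above, not from IMB itself:

/* Back-of-the-envelope estimate of per-node memory for an Alltoall,
 * following the arithmetic above. Standalone sketch, not part of IMB;
 * the extra factor of two for internal buffering is an assumption
 * taken from the message, not a measured value. */
#include <stdio.h>

int main(void)
{
    const double msg_mb = 4.0;   /* message size per peer, MB */
    const int    nprocs = 128;   /* total MPI processes (N)   */
    const int    ppn    = 8;     /* processes per node        */

    /* Each process holds a send and a receive buffer of msg_size * nprocs. */
    double per_proc_gb = 2.0 * msg_mb * nprocs / 1024.0;
    double per_node_gb = per_proc_gb * ppn;

    printf("per process: %.1f GB, per node: %.1f GB\n", per_proc_gb, per_node_gb);
    printf("with ~2x internal buffering: %.1f GB/node\n", 2.0 * per_node_gb);
    return 0;
}

With the values above this prints 1.0 GB per process, 8.0 GB per node, and 16 GB per node once buffering is included, which is exactly the 16 GB limit of the nodes described.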
2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>

> Hi Igor,
> my message size is 4096 KB and I have 4 procs per core.
> There isn't any difference using different algorithms.
>
> 2009/1/23 Igor Kozin <i.n.ko...@googlemail.com>:
> > What is your message size and the number of cores per node?
> > Is there any difference using different algorithms?
> >
> > 2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>
> >>
> >> Hi Jeff,
> >> I would like to understand why, if I run over 512 procs or more, my
> >> code hangs in MPI collectives, even with small send buffers. All
> >> processors are stuck in the call, doing nothing. But if I add an
> >> MPI_Barrier after the MPI collective, it works! I run over an
> >> InfiniBand network.
> >>
> >> I know many people with this strange problem; I think there is a
> >> strange interaction between InfiniBand and Open MPI that causes it.
> >>
> >> 2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
> >> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
> >> >
> >> >> I've noted that Open MPI has an asynchronous behaviour in the
> >> >> collective calls. The processors don't wait for the other procs
> >> >> to arrive in the call.
> >> >
> >> > That is correct.
> >> >
> >> >> This behaviour can sometimes cause problems with a lot of
> >> >> processors in the job.
> >> >
> >> > Can you describe what exactly you mean?  The MPI spec specifically
> >> > allows this behavior; OMPI made specific design choices and
> >> > optimizations to support this behavior.  FWIW, I'd be pretty
> >> > surprised if any optimized MPI implementation defaults to fully
> >> > synchronous collective operations.
> >> >
> >> >> Is there an Open MPI parameter to hold all processes in the
> >> >> collective call until it is finished? Otherwise I have to insert
> >> >> many MPI_Barrier calls in my code, which is very tedious and
> >> >> strange.
> >> >
> >> > As you have noted, MPI_Barrier is the *only* collective operation
> >> > that MPI guarantees to have any synchronization properties (and
> >> > it's a fairly weak guarantee at that: no process will exit the
> >> > barrier until every process has entered the barrier -- but there's
> >> > no guarantee that all processes leave the barrier at the same
> >> > time).
> >> >
> >> > Why do you need your processes to exit collective operations at
> >> > the same time?
> >> >
> >> > --
> >> > Jeff Squyres
> >> > Cisco Systems
> --
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it    Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
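The workaround described in the quoted thread (an explicit MPI_Barrier after each collective) looks roughly like the following minimal sketch. The use of MPI_Bcast, the loop count, and the buffer size are illustrative placeholders, not taken from Gabriele's actual code:

/* Minimal sketch of the workaround from the thread: follow each
 * collective with an explicit MPI_Barrier so no rank races far ahead.
 * Illustration only; MPI_Bcast and the buffer size are arbitrary
 * stand-ins for whatever collective the real application uses. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[1024] = {0};
    if (rank == 0)
        for (int i = 0; i < 1024; i++) buf[i] = (double)i;

    for (int iter = 0; iter < 100; iter++) {
        /* The collective itself may behave "asynchronously": a rank is
         * free to leave as soon as its own part of the operation is done. */
        MPI_Bcast(buf, 1024, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* MPI_Barrier is the only call with a (weak) synchronization
         * guarantee: no rank exits before every rank has entered. */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("done\n");

    MPI_Finalize();
    return 0;
}

As Jeff points out, this only keeps ranks loosely in step; it does not make the collective itself synchronous, and whether the extra barriers are worth their cost depends on why the unsynchronized behaviour is causing trouble in the first place.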