Hi Igor,

My message size is 4096 KB and I have 4 procs per core. There is no
difference between the different algorithms.
2009/1/23 Igor Kozin <i.n.ko...@googlemail.com>:
> What is your message size and the number of cores per node?
> Is there any difference using different algorithms?
>
> 2009/1/23 Gabriele Fatigati <g.fatig...@cineca.it>:
>>
>> Hi Jeff,
>> I would like to understand why, if I run on 512 procs or more, my
>> code stalls in MPI collective calls, even with a small send buffer.
>> All processors are locked in the call, doing nothing. But if I add an
>> MPI_Barrier after the MPI collective, it works! I run over an
>> InfiniBand network.
>>
>> I know many people with this strange problem; I think there is an
>> odd interaction between InfiniBand and Open MPI that causes it.
>>
>> 2009/1/23 Jeff Squyres <jsquy...@cisco.com>:
>> > On Jan 23, 2009, at 6:32 AM, Gabriele Fatigati wrote:
>> >
>> >> I've noticed that Open MPI has asynchronous behaviour in collective
>> >> calls: a processor does not wait for the other procs to arrive in
>> >> the call.
>> >
>> > That is correct.
>> >
>> >> This behaviour can sometimes cause problems in jobs with many
>> >> processors.
>> >
>> > Can you describe what exactly you mean?  The MPI spec specifically
>> > allows this behavior; OMPI made specific design choices and
>> > optimizations to support this behavior.  FWIW, I'd be pretty
>> > surprised if any optimized MPI implementation defaults to fully
>> > synchronous collective operations.
>> >
>> >> Is there an Open MPI parameter to block every process in a
>> >> collective call until it has finished?  Otherwise I have to insert
>> >> many MPI_Barriers in my code, which is very tedious and strange.
>> >
>> > As you have noted, MPI_Barrier is the *only* collective operation
>> > that MPI guarantees to have any synchronization properties (and it's
>> > a fairly weak guarantee at that: no process will exit the barrier
>> > until every process has entered the barrier -- but there's no
>> > guarantee that all processes leave the barrier at the same time).
>> >
>> > Why do you need your processes to exit collective operations at the
>> > same time?
>> >
>> > --
>> > Jeff Squyres
>> > Cisco Systems
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Ing. Gabriele Fatigati

Parallel programmer

CINECA Systems & Tecnologies Department

Supercomputing Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it  Tel: +39 051 6171722

g.fatigati [AT] cineca.it
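[Editor's sketch of the workaround discussed above: some Open MPI builds
ship a "sync" collective component that injects a barrier around
collective calls, which avoids hand-inserting MPI_Barrier in the source.
The component name and the MCA parameter names below (coll_sync_priority,
coll_sync_barrier_before) are assumptions to verify against your
installed version with ompi_info; treat this as a configuration sketch,
not a definitive recipe.]

```shell
# Sketch only: raise the priority of the "sync" coll component (if your
# Open MPI build includes it -- check with `ompi_info | grep coll`) so a
# barrier is injected before every collective operation.
mpirun --mca coll_sync_priority 100 \
       --mca coll_sync_barrier_before 1 \
       -np 512 ./my_app

# The same settings can live in an MCA parameter file, e.g.
# $HOME/.openmpi/mca-params.conf:
#   coll_sync_priority = 100
#   coll_sync_barrier_before = 1
```

Setting the barrier interval to 1 makes every collective effectively
synchronous, which is the heaviest-handed option; a larger interval
trades some of that safety for less barrier overhead.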