v1.1 does not have the tuned collectives (I think, but now I'm not 100% sure anymore), or at least they were not active by default. The first version with the tuned collectives will be 1.2. The current decision function (from the nightly builds) targets high-performance networks with two characteristics: low latency (4-5 microseconds) and high bandwidth (over 1 Gb/s).

There are several implementations for each of the algorithms. Some are wired in and some are not. The most difficult part is making sure each of these implementations is correct (from the MPI point of view) and gives the expected answer in all circumstances. The more functions we have, the more tests we have to perform, and right now that's the main limitation. We have other algorithms implemented which are not in Open MPI right now. They will come as soon as they are tested well enough for us to feel confident about their correctness.

Here are the answers:
1. Not all algorithms are wired to be shown by ompi_info. Anything out of range is set to the default value, which means the current decision function.
2. The allreduce algorithms are coming soon. Btw, all algorithms inside Open MPI support segmentation, and all of the tree-based ones support a fanout input (number of children).
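For example, to force a specific bcast algorithm together with a segment size and a tree fanout, you can set the tuned component's MCA parameters on the mpirun command line. The parameter names below are from memory, so please check the output of "ompi_info --param coll tuned" for the exact spelling and the valid ranges:

  mpirun -np 16 \
      --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_bcast_algorithm 3 \
      --mca coll_tuned_bcast_algorithm_segmentsize 8192 \
      --mca coll_tuned_bcast_algorithm_tree_fanout 4 \
      ./your_app

As described above, an out-of-range algorithm number simply falls back to the default decision function.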

Time is the only thing we're missing right now ... i.e. the weeks (now without the s) before SC.

  george.


On Nov 2, 2006, at 11:00 PM, Tony Ladd wrote:

George

I found the info I think you were referring to. Thanks. I then experimented essentially randomly with different algorithms for allreduce. But the issue with really bad performance for certain message sizes persisted with v1.1. The good news is that the upgrade to 1.2 fixed my worst problem. Now the performance is reasonable for all message sizes. I will test the tuned algorithms again ASAP.

I had a couple of questions:

1) ompi_info lists only 3 or 4 algorithms for allreduce and reduce and about 5 for bcast, but you can use higher numbers as well. Are these additional undocumented algorithms (you mentioned a number like 15), or is it ignoring out-of-range parameters?
2) It seems that for allreduce you can select a tuned reduce and a tuned bcast instead of the binary tree. But there is a faster allreduce which is order 2N rather than the 4N of Reduce + Bcast (N is the message size). It segments the vector and distributes the reduction among the nodes; in an allreduce there is no need to gather the reduced vector to one processor and then broadcast it again. I wrote a simple version for powers of 2 (MPI_SUM). Any chance of it being implemented in OMPI?
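
To make the idea concrete, here is a rough sketch of the kind of thing I mean (MPI_SUM on doubles, a power-of-two number of processes, and a vector length divisible by the number of processes). This is not the exact code I wrote, and certainly not Open MPI internals, just an illustration of the reduce-scatter + allgather scheme:

#include <mpi.h>
#include <stdlib.h>

/* buf: count doubles, summed in place across all ranks of comm */
void allreduce_sum_pow2(double *buf, int count, MPI_Comm comm)
{
    int rank, size, blk, lo, len, dist, half, keep_lo, send_lo;
    int partner, own_lo, partner_lo, i;
    double *tmp;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    blk = count / size;               /* elements per block (one per rank) */
    tmp = (double *) malloc((count / 2) * sizeof(double));

    /* Phase 1: recursive-halving reduce-scatter.  After log2(size) steps
       rank r holds block r of the fully reduced vector. */
    lo = 0;
    len = size;
    for (dist = size / 2; dist >= 1; dist /= 2) {
        partner = rank ^ dist;
        half = len / 2;
        keep_lo = (rank < partner) ? lo : lo + half;  /* half I keep      */
        send_lo = (rank < partner) ? lo + half : lo;  /* half I send away */
        MPI_Sendrecv(buf + send_lo * blk, half * blk, MPI_DOUBLE, partner, 0,
                     tmp, half * blk, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
        for (i = 0; i < half * blk; i++)
            buf[keep_lo * blk + i] += tmp[i];
        lo = keep_lo;
        len = half;
    }

    /* Phase 2: recursive-doubling allgather of the reduced blocks. */
    for (dist = 1; dist < size; dist *= 2) {
        partner = rank ^ dist;
        own_lo = (rank / dist) * dist;          /* start of my run        */
        partner_lo = (partner / dist) * dist;   /* start of partner's run */
        MPI_Sendrecv(buf + own_lo * blk, dist * blk, MPI_DOUBLE, partner, 1,
                     buf + partner_lo * blk, dist * blk, MPI_DOUBLE, partner, 1,
                     comm, MPI_STATUS_IGNORE);
    }

    free(tmp);
}

Each process sends and receives roughly N elements in each of the two phases, so about 2N in total, compared with roughly 4N for a reduce followed by a bcast.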

Tony

