amjad ali wrote:
Hi,
Suppose we run a parallel MPI code with 64 processes on a cluster of, say,
16 nodes. The cluster nodes have multicore CPUs, say 4 cores per node, so
all 64 cores on the cluster are each running one process. The program is
SPMD, meaning all processes have the same workload.
Now, if we enable auto-vectorization while compiling the code (for
example with the Intel compilers), will there be any benefit
(efficiency/scalability improvement) from the auto-vectorized code? Or
will we get the same performance as without auto-vectorization in this
example case?
That is, if we do not have free CPU cores in a PC or cluster (all
cores are running MPI processes), is auto-vectorization still
beneficial? Or is it beneficial only if we have some free CPU cores
locally?
How can we really get a performance benefit from auto-vectorization?
Auto-vectorization should give a similar performance benefit under MPI
to what it gives in a single process. That's about all that can be said when you
say nothing about the nature of your application. This assumes that
your MPI domain decomposition, which may not be highly vectorizable,
doesn't take up too large a fraction of elapsed time. By the same
token, auto-vectorization techniques aren't specific to MPI
applications, so an in-depth treatment isn't topical here.
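As a rough illustration with made-up numbers: if a fraction f of each
process's run time is spent in vectorizable loops and vectorization speeds
those loops up by a factor s, Amdahl's law bounds the per-process speedup at

    1 / ((1 - f) + f / s)

so with f = 0.8 and s = 3 you get about 2.1x per process, regardless of how
many MPI ranks are running; if f is small because decomposition and
communication dominate, the gain shrinks accordingly.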
I'll just mention that we are well into the era of three levels of
parallel programming: vectorization, threaded parallelism (e.g.
OpenMP), and process parallelism (e.g. MPI).
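As a minimal sketch of how the three levels coexist (this is not your
application; the array names, sizes, and compile command are made-up
illustrations, and a pure-MPI run on all cores would simply use one rank per
core instead of OpenMP threads):

    /* Level 3: MPI splits the data across ranks; level 2: OpenMP splits a
     * rank's share across cores; level 1: the compiler auto-vectorizes the
     * unit-stride loop body.  Typical build: mpicc -std=c99 -O3 -fopenmp levels.c
     * (exact flags vary by compiler). */
    #include <mpi.h>
    #include <stdlib.h>

    #define N_LOCAL 1000000   /* elements owned by each rank (illustrative) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double *a = malloc(N_LOCAL * sizeof *a);
        double *b = malloc(N_LOCAL * sizeof *b);
        for (int i = 0; i < N_LOCAL; i++)
            b[i] = i + rank;

        double local_sum = 0.0, global_sum = 0.0;

        /* OpenMP threads divide this rank's portion across cores; the body is
         * a unit-stride, independent update that auto-vectorizers handle well. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < N_LOCAL; i++) {
            a[i] = 2.0 * b[i] + 1.0;
            local_sum += a[i];
        }

        /* MPI combines the per-rank partial results. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        free(a); free(b);
        MPI_Finalize();
        return 0;
    }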
For an application I work on, 8 nodes with auto-vectorization give
about the performance of 12 nodes without it, so compilers lacking
auto-vectorization capability fell by the wayside for such applications
a decade ago. This application gains a significant benefit from cache
blocking, so vectorization has more opportunity to pay off than in
applications with less memory locality.
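To show what I mean by cache blocking, here is a generic blocked matrix
multiply (not the application above; N and the block size are made-up
values):

    #define N  1024      /* matrix dimension (illustrative) */
    #define BS 64        /* block size chosen so the tiles stay in cache */

    /* C += A * B, processed one BS x BS block at a time so the tiles of
     * A, B and C are reused while still resident in cache. */
    void blocked_matmul(const double A[N][N], const double B[N][N],
                        double C[N][N])
    {
        for (int ii = 0; ii < N; ii += BS)
            for (int kk = 0; kk < N; kk += BS)
                for (int jj = 0; jj < N; jj += BS)
                    for (int i = ii; i < ii + BS; i++)
                        for (int k = kk; k < kk + BS; k++) {
                            double aik = A[i][k];
                            /* Unit-stride innermost loop: this is what the
                             * auto-vectorizer turns into SIMD instructions. */
                            for (int j = jj; j < jj + BS; j++)
                                C[i][j] += aik * B[k][j];
                        }
    }

The blocking keeps the data the vector units consume in cache, which is why
vectorization pays off more here than in codes with poor memory locality.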
I have not seen an application that was effectively vectorized and
also gained from HyperThreading, but the gain from vectorization should
be significantly greater than anything HyperThreading could contribute. It's
also common for vectorization to gain more on lower-clock-speed/cheaper
CPU models (of the same architecture), enabling lower purchase cost
or power consumption, but that's true of all forms of parallelization.
Some applications can be vectorized effectively by any of the popular
auto-vectorizing compilers, including recent GNU compilers, while others
show much more gain with certain compilers, such as Intel, PGI, or Open64.
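Whichever compiler you use, check its vectorization report rather than
assuming a loop was vectorized; the flag names differ between compilers and
versions (for example GCC has had -ftree-vectorizer-verbose and later
-fopt-info-vec, and the Intel compiler has had -vec-report), but the report
will distinguish cases like these (function and array names are just
illustrative):

    /* Independent, unit-stride iterations: most auto-vectorizers handle
     * this, provided the compiler can see (e.g. via restrict) that a and b
     * don't alias. */
    void saxpy_like(double *restrict a, const double *restrict b, int n)
    {
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * b[i] + 1.0;
    }

    /* Loop-carried dependence: a[i] needs the just-computed a[i-1], so this
     * is typically reported as not vectorizable. */
    void prefix_sum(double *a, int n)
    {
        for (int i = 1; i < n; i++)
            a[i] += a[i-1];
    }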