Your problem may not be related to bandwidth; it may be latency, or how the problem is divided. We found significant improvements running WRF and other atmospheric (CFD) codes over InfiniBand. The issue was not so much the amount of data communicated as how long it takes to send it.

Also, is your model big enough to split up as much as you are trying to? If there is not enough work for each core to do between edge exchanges, you will spend all of your time spinning, waiting for the network. If you are running a demo benchmark, it is likely too small for that number of processors; at least that is what we find with most weather-model demo problems.

One other thing to look at is how the domain is being split up. Depending on what the algorithm does, you may want more x points, more y points, or completely even divisions. We found that we can significantly speed up WRF for our particular domain by not letting it choose the decomposition on its own.
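As a rough illustration of that last point, here is a small standalone C sketch (the 600 x 400 grid and 24 ranks are made-up numbers, not taken from this thread) that prints the halo-to-interior ratio for every possible 2D split of the domain; the split with the smallest ratio gives each rank the most work per edge exchange:

    /* Sketch only: estimate the halo-to-interior ratio for candidate
     * 2D decompositions of a hypothetical 600x400 grid over 24 ranks. */
    #include <stdio.h>

    int main(void)
    {
        const int NX = 600, NY = 400, NPROCS = 24;   /* made-up sizes */

        for (int px = 1; px <= NPROCS; px++) {
            if (NPROCS % px) continue;
            int py = NPROCS / px;
            int lx = NX / px, ly = NY / py;          /* local tile size */
            double interior = (double)lx * ly;
            double halo     = 2.0 * (lx + ly);       /* one-cell-wide halo */
            printf("%2d x %2d tiles of %3d x %3d cells: halo/interior = %.4f\n",
                   px, py, lx, ly, halo / interior);
        }
        return 0;
    }

It needs nothing but a C99 compiler; plug in your own grid size and rank count.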

On 07/10/12 08:48, Dugenoux Albert wrote:
Thanks for your answer. You are right.
I've tried on 4 nodes with 6 processes each, and things are worse.
So do you suggest that the only thing to do is to order an InfiniBand switch, or is there a possibility to improve
something by tuning MCA parameters?
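(For reference, MCA parameters can be set on the mpirun command line or in a file such as $HOME/.openmpi/mca-params.conf. The entries below are only an illustrative sketch: they restrict the BTLs to self/sm/tcp, pin the TCP BTL to one interface, and enlarge the TCP socket buffers. The interface name eth0 and the buffer sizes are assumptions to adapt, and no TCP tuning will turn Gigabit Ethernet into a low-latency fabric.)

    # $HOME/.openmpi/mca-params.conf
    # Illustrative values only; adapt the interface name and buffer sizes.
    btl = self,sm,tcp
    btl_tcp_if_include = eth0
    btl_tcp_sndbuf = 524288
    btl_tcp_rcvbuf = 524288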

*From:* Ralph Castain <r...@open-mpi.org>
*To:* Dugenoux Albert <dugeno...@yahoo.fr>; Open MPI Users <us...@open-mpi.org>
*Sent:* Tuesday, July 10, 2012, 4:47 PM
*Subject:* Re: [OMPI users] Bad parallel scaling using Code Saturne with openmpi

I suspect it mostly reflects communication patterns. I don't know anything about Saturne, but shared memory is a great deal faster than TCP, so the more processes sharing a node the better. You may also be hitting some natural boundary in your model - perhaps with 8 processes/node more of the exchanges cross the node boundary, further increasing the communication load on the network.
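A quick way to see the size of that gap is a small ping-pong timing between two ranks. The sketch below is generic MPI C, not Code Saturne code, and the 1 kB message size is arbitrary; run it once with both ranks on the same node and once with one rank per node, then compare the round-trip times.

    /* Minimal ping-pong sketch: time small message exchanges
     * between rank 0 and rank 1. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int iters = 1000, msg = 1024;   /* 1 kB messages, arbitrary */
        char buf[1024] = {0};
        MPI_Status st;

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, msg, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, msg, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("average round trip: %.2f microseconds\n",
                   (t1 - t0) / iters * 1e6);

        MPI_Finalize();
        return 0;
    }

Launched with mpirun -np 2 on a single node it should report a few microseconds over shared memory; with the two ranks on different nodes over Gigabit Ethernet, expect tens of microseconds or more.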

Do things continue to get worse if you use all 4 nodes with 6 processes/node?


On Jul 10, 2012, at 7:31 AM, Dugenoux Albert wrote:

Hi.
I have recently built a cluster on a Dell PowerEdge server running Debian 6.0. The server is composed of 4 system boards, each with 2 hexa-core processors, which gives 12 cores per board.
The boards are linked through a local Gigabit switch.
In order to parallelize the software Code Saturne, which is a CFD solver, I have configured the cluster with a PBS server/MOM on one system board and a MOM on each of the 3 other boards. This gives 48 cores dispatched over 4 nodes of 12 cores each. Code Saturne is compiled with Open MPI 1.6. When I launch a simulation using 2 nodes with 12 cores each, the elapsed time is good and the network is not saturated. But when I launch the same simulation using 3 nodes with 8 cores each, the elapsed time is 5 times longer.
In both cases I use 24 cores and the network does not seem to be saturated.
I have tested several configurations, with the binaries on a local file system or on NFS, but the results are the same. I have visited several forums (in particular http://www.open-mpi.org/community/lists/users/2009/08/10394.php) and read lots of threads, but as I am not an expert in clusters, I presently do not see what is wrong! Is it a problem in the configuration of PBS (I installed it from the deb packages), subtle compilation options
of Open MPI, or a bad network configuration?
Regards.
B. S.