Your problem may not be related to bandwidth. It may be latency, or how
the problem is divided. We found significant improvements running WRF
and other atmospheric (CFD) codes over InfiniBand. The problem was not
so much the amount of data communicated, but how long it takes to send it.

Also, is your model big enough to split up as much as you are trying to?
If there is not enough work for each core to do between edge exchanges,
you will spend all of your time spinning, waiting for the network. If
you are running a demo benchmark, it is likely too small for that number
of processors. At least that is what we find with most weather model
demo problems.

One other thing to look at is how the domain is being split up.
Depending on what the algorithm does, you may want more x points, more y
points, or completely even divisions. We found that we can significantly
speed up WRF for our particular domain by not lett
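For illustration only, here is a minimal, generic MPI sketch of those two
decomposition choices. It is not WRF's or Code Saturne's actual
decomposition code; the 2-D grid and the forced-slab alternative are
assumptions. You can let MPI_Dims_create pick an even factorization of
the ranks, or force a split with more subdomains along one direction:

    /* decomp.c - minimal, generic sketch of controlling a 2-D MPI rank
     * decomposition; not taken from Code Saturne or WRF. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int nranks, rank;
        int dims[2] = {0, 0};      /* 0 = let MPI choose this dimension */
        int periods[2] = {0, 0};   /* non-periodic domain */
        MPI_Comm cart;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Even factorization of the ranks, e.g. 24 ranks -> 6 x 4 ... */
        MPI_Dims_create(nranks, 2, dims);
        /* ... or force slabs along x instead, before creating the grid:
         *     dims[0] = nranks; dims[1] = 1;                           */

        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

        if (rank == 0)
            printf("decomposition: %d x %d\n", dims[0], dims[1]);

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }

The shape of the split determines how much halo data each rank exchanges
per step and how many of those exchanges cross a node boundary, which is
why an uneven split can win for a particular domain.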
On 07/10/12 08:48, Dugenoux Albert wrote:
Thanks for your answer. You are right.
I have tried 4 nodes with 6 processes per node, and things are worse.
So do you suggest that the only thing to do is to order an InfiniBand
switch, or is there a possibility to improve something by tuning MCA
parameters?
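Tuning MCA parameters may buy a little on Gigabit Ethernet, but it will
not close the latency gap to an InfiniBand fabric. As a hedged example
(the interface name eth0 and the binary name cs_solver are placeholders
for your actual setup), keeping intra-node traffic on shared memory and
pinning TCP to one known interface looks like:

    mpirun -np 24 --mca btl sm,self,tcp --mca btl_tcp_if_include eth0 ./cs_solver

Process binding (--bind-to-core with Open MPI 1.6) is also worth trying,
since it keeps ranks from migrating between the two sockets of a board.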
*From:* Ralph Castain <r...@open-mpi.org>
*To:* Dugenoux Albert <dugeno...@yahoo.fr>; Open MPI Users
<us...@open-mpi.org>
*Sent:* Tuesday, 10 July 2012, 16:47
*Subject:* Re: [OMPI users] Bad parallel scaling using Code Saturne
with openmpi
I suspect it mostly reflects communication patterns. I don't know
anything about Saturne, but shared memory is a great deal faster than
TCP, so the more processes sharing a node the better. You may also be
hitting some natural boundary in your model - perhaps with 8
processes/node you wind up with more processes that cross the node
boundary, further increasing the communication requirement.
Do things continue to get worse if you use all 4 nodes with 6
processes/node?
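To see the shared-memory vs. TCP difference concretely, a minimal
ping-pong sketch (generic MPI, not part of Code Saturne) can be run
twice, once with both ranks on the same node and once with the ranks on
different nodes:

    /* pingpong.c - rough per-message latency between ranks 0 and 1.
     * Generic illustration, not part of Code Saturne. Run with 2 ranks. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const int iters = 10000;
        char buf[64] = {0};        /* small message: latency-dominated */
        int rank, i;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("average round trip: %.2f us\n",
                   1e6 * (t1 - t0) / iters);

        MPI_Finalize();
        return 0;
    }

Over Gigabit Ethernet the inter-node round trip is typically tens of
microseconds, versus roughly a microsecond through shared memory, so a
solver that exchanges many small halo messages pays heavily for every
rank pair that ends up on different nodes.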
On Jul 10, 2012, at 7:31 AM, Dugenoux Albert wrote:
Hi.
I have recently built a cluster on a Dell PowerEdge server running
Debian 6.0. The server is composed of 4 system boards, each with 2
hexa-core processors, so 12 cores per system board.
The boards are linked by a local Gigabit Ethernet switch.
In order to parallelize the CFD solver Code Saturne, I have configured
the cluster so that one system board runs the PBS server and a MOM, and
the 3 other boards each run a MOM. This gives 48 cores dispatched over
4 nodes of 12 cores each. Code Saturne is compiled with Open MPI 1.6.
When I launch a simulation using 2 nodes with 12 cores each, the elapsed
time is good and the network is not saturated.
But when I launch the same simulation using 3 nodes with 8 cores each,
the elapsed time is 5 times the previous one.
In both cases I use 24 cores, and the network does not appear to be
saturated.
I have tested several configurations, with the binaries on a local file
system or on NFS, but the results are the same.
I have visited several forums (in particular
http://www.open-mpi.org/community/lists/users/2009/08/10394.php)
and read lots of threads, but as I am not an expert in clusters, I do
not presently see what is wrong!
Is it a problem in the configuration of PBS (I installed it from the
Debian packages), a subtle compilation option of Open MPI, or a bad
network configuration?
Regards.
B. S.