Re: [OMPI users] Bad parallel scaling using Code Saturne with openmpi

Dugenoux Albert Wed, 11 Jul 2012 16:21:55 -0400

Hi.

To answer the various remarks:


1) Code Saturne launches itself through embedded Python and bash scripts that
supply the mpiexec parameters, but I will test your parameter next week and
report the results of this benchmark.


2) I do not think there is a load-balancing problem: Code Saturne partitions
the mesh itself with the reliable and well-known Metis graph-partitioning
library, so the CPUs are equally busy.


3) The CPUs are Xeons, which have multithreading capabilities. However, I have
tested this by setting np=24 in the server_priv/nodes file of the PBS server
and comparing it with a configuration of np=12. The results are very similar:
there is no gain of 20% or 30%.

4) I will examine the hardware options you suggested, but I will have to
convince my office to make such an investment!
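To illustrate Ralph's remark about processes crossing the node boundary, here is a minimal sketch. It assumes, hypothetically (this is not how Code Saturne or Metis actually partition a CFD mesh), that the 24 subdomains form a regular 4 x 6 grid, that each rank exchanges halos only with its four direct neighbours, and that ranks are placed on nodes in consecutive blocks. It only counts how many neighbour pairs end up on different nodes for the 2 x 12, 3 x 8 and 4 x 6 layouts discussed in this thread.

```python
"""Count halo exchanges that cross node boundaries (illustrative sketch).

Hypothetical assumptions, not taken from Code Saturne or Metis:
- the mesh is split into a px-by-py grid of subdomains, one per MPI rank;
- each rank exchanges halos only with its up/down/left/right neighbours;
- ranks are numbered row-major and placed on nodes in consecutive blocks
  of `ranks_per_node` (rank r runs on node r // ranks_per_node).
"""


def neighbour_pairs(px: int, py: int):
    """Yield each pair of adjacent subdomains (ranks) exactly once."""
    for i in range(px):
        for j in range(py):
            rank = i * py + j
            if j + 1 < py:      # right-hand neighbour in the same row
                yield rank, rank + 1
            if i + 1 < px:      # neighbour in the next row
                yield rank, rank + py


def inter_node_pairs(px: int, py: int, ranks_per_node: int) -> int:
    """Number of neighbour pairs whose two ranks sit on different nodes."""
    return sum(
        1
        for a, b in neighbour_pairs(px, py)
        if a // ranks_per_node != b // ranks_per_node
    )


if __name__ == "__main__":
    px, py = 4, 6               # 24 ranks in total
    total = sum(1 for _ in neighbour_pairs(px, py))
    for ppn in (12, 8, 6):      # 2x12, 3x8 and 4x6 layouts from the thread
        nodes = px * py // ppn
        cross = inter_node_pairs(px, py, ppn)
        print(f"{nodes} nodes x {ppn} ranks: {cross}/{total} exchanges cross nodes")
```

Under these assumptions, the 3 x 8 layout sends 14 of the 38 neighbour exchanges over the network instead of 6 for 2 x 12, and 4 x 6 sends 18. The sketch counts pairs only, not message sizes or latency, so it shows why fewer processes per node can raise TCP traffic without claiming to explain the whole fivefold slowdown.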


________________________________
From: Gus Correa <g...@ldeo.columbia.edu>
To: Open MPI Users <us...@open-mpi.org>
Sent: Wednesday, 11 July 2012 0:51
Subject: Re: [OMPI users] Bad parallel scaling using Code Saturne with openmpi
 
On 07/10/2012 05:31 PM, Jeff Squyres wrote:
> +1.  Also, not all Ethernet switches are created equal --
> particularly commodity 1Gb Ethernet switches.
> I've seen plenty of crappy Ethernet switches rated for 1Gb
> that could not reach that speed when under load.
>

Are you perhaps belittling my dear $43 [brand undisclosed]
5-port GigE SoHo switch, which connects my Pentium-III
toy cluster, just because it drops a few packets [per microsec]?
It looks so good, with all those fiercely blinking green LEDs.
Where else could I fool around with cluster setup and test
the new OpenMPI releases? :)
The production cluster is just too crowded for this,
maybe because it has a decent
HP GigE switch [IO] and Infiniband [MPI] ...

Gus


>
>
> On Jul 10, 2012, at 10:47 AM, Ralph Castain wrote:
>
>> I suspect it mostly reflects communication patterns. I don't know anything 
>> about Saturne, but shared memory is a great deal faster than TCP, so the 
>> more processes sharing a node the better. You may also be hitting some 
>> natural boundary in your model - perhaps with 8 processes/node you wind up 
>> with more processes that cross the node boundary, further increasing the 
>> communication requirement.
>>
>> Do things continue to get worse if you use all 4 nodes with 6 processes/node?
>>
>>
>> On Jul 10, 2012, at 7:31 AM, Dugenoux Albert wrote:
>>
>>> Hi.
>>>
>>> I have recently built a cluster on a Dell PowerEdge server running Debian
>>> 6.0. This server is composed of 4 system boards, each with 2 hexacore
>>> processors, which gives 12 cores per system board.
>>> The boards are linked by a local gigabit switch.
>>>
>>> In order to parallelize the CFD solver Code Saturne, I have configured the
>>> cluster with a PBS server/mom on one system board and a mom on each of the
>>> 3 other boards. This gives 48 cores distributed over 4 nodes of 12 CPUs.
>>> Code Saturne is compiled with openmpi 1.6.
>>>
>>> When I launch a simulation using 2 nodes with 12 cores each, the elapsed
>>> time is good and the network traffic is not full.
>>> But when I launch the same simulation using 3 nodes with 8 cores each, the
>>> elapsed time is 5 times longer.
>>> In both cases I use 24 cores, and the network does not seem to be saturated.
>>>
>>> I have tested several configurations: binaries on a local file system or on
>>> NFS. The results are the same.
>>> I have visited several forums (in particular
>>> http://www.open-mpi.org/community/lists/users/2009/08/10394.php)
>>> and read lots of threads, but as I am not a cluster expert, I do not
>>> presently see what is wrong!
>>>
>>> Is it a problem in the PBS configuration (I installed it from the deb
>>> packages), a subtle compilation option of openMPI, or a bad network
>>> configuration?
>>>
>>> Regards.
>>>
>>> B. S.
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
