Hi.
 
I have recently built a cluster on a Dell PowerEdge server running Debian 6.0.
The server is composed of 4 system boards, each with 2 hex-core processors,
which gives 12 cores per system board.
The boards are linked through a local Gigabit Ethernet switch.
 
In order to run the CFD solver Code Saturne in parallel, I have configured the
cluster so that one system board runs the PBS server plus a mom, and the three
other boards each run a mom. This gives 48 cores spread over 4 nodes of 12
cores each. Code Saturne is compiled against Open MPI 1.6.
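
In case it is useful, the TORQUE nodes file looks roughly like this (the
hostnames below are only illustrative):

  node1 np=12
  node2 np=12
  node3 np=12
  node4 np=12

pbs_server and a pbs_mom run on node1; only a pbs_mom runs on node2, node3
and node4.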
 
When I launch a simulation using 2 nodes with 12 cores each, the elapsed time
is good and the network is not fully loaded.
But when I launch the same simulation using 3 nodes with 8 cores each, the
elapsed time is 5 times longer.
In both cases I use 24 cores, and the network does not seem to be saturated.
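
For reference, the two runs correspond roughly to the following resource
requests and launch lines (the solver script name is just a placeholder):

  # fast case: 2 nodes x 12 cores
  #PBS -l nodes=2:ppn=12
  mpirun -np 24 ./run_solver

  # slow case: 3 nodes x 8 cores
  #PBS -l nodes=3:ppn=8
  mpirun -np 24 ./run_solver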
 
I have tested several configurations (binaries on the local file system or on
NFS), but the results are the same.
I have visited several forums (in particular
http://www.open-mpi.org/community/lists/users/2009/08/10394.php)
and read lots of threads, but as I am not an expert on clusters, I currently
cannot see what is wrong!
 
Is it a problem with the PBS configuration (I installed it from the Debian
packages), a subtle Open MPI compilation option, or a bad network
configuration?
 
Regards.
 
B. S.
