I have recently completed a series of performance tests on a Beowulf cluster, using up to 48 dual-core P4D nodes connected by an Extreme Networks Gigabit edge switch. The tests consisted of single-node and multi-node application benchmarks, including DL_POLY, GROMACS, and VASP, as well as specific tests of the network cards and switches. I compared TCP sockets with OpenMPI v1.2 against MPI/GAMMA, both over Gigabit Ethernet.

MPI/GAMMA gave significantly better scaling than OpenMPI/TCP, in both the network tests and the application benchmarks. On a per-CPU basis, the overall performance of the MPI/GAMMA cluster was comparable to that of a dual-core Opteron cluster with an InfiniBand interconnect, and the DL_POLY benchmark showed scaling similar to that reported for an IBM p690. Performance with TCP was typically a factor of 2 lower in these same tests.

A detailed write-up can be found at: http://ladd.che.ufl.edu/research/beoclus/beoclus.htm
Tony Ladd
Chemical Engineering
University of Florida