Gbit Ethernet is well known to perform poorly for fine-grained codes like VASP; its latencies are much too high.
If you want good scaling in a cluster for VASP, you'll need to run InfiniBand or some other high-speed, low-latency network.

Jim

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Monday, August 17, 2009 9:24 PM
To: Open MPI Users
Cc: David Hibbitts
Subject: Re: [OMPI users] very bad parallel scaling of vasp using openmpi

You might want to run some performance testing of your TCP stacks and the switch -- use a non-MPI application such as NetPIPE (or others -- google around) and see what kind of throughput you get. Try it between individual server peers, then try running it simultaneously between a bunch of peers and see whether the results are different, etc.

On Aug 17, 2009, at 5:51 PM, Craig Plaisance wrote:

> Hi - I have compiled vasp 4.6.34 with the Intel Fortran compiler 11.1
> and openmpi 1.3.3 on a cluster of 104 nodes running Rocks 5.2, each node
> having two quad-core Opterons, connected by Gbit Ethernet. Running in
> parallel on one node (8 cores) works very well -- faster than on any
> other cluster I have run it on. However, running on 2 nodes in parallel
> improves performance by only 10% over the one-node case, while running
> on 4 or 8 nodes yields no improvement over the two-node case.
> Furthermore, when running multiple (3-4) jobs simultaneously,
> performance decreases by around 50% compared to running only a single
> job on the entire cluster. The nodes are connected by a Dell
> PowerConnect 6248 managed switch. I get the same performance with
> mpich2, so I don't think the problem is specific to openmpi. Other vasp
> users have reported very good scaling up to 4 nodes on a similar
> cluster, so I don't think the problem is vasp either. Could something be
> wrong with the way mpi is configured to work with the switch? Or is the
> operating system not configured to work with the switch properly? Or
> does the switch itself need to be configured?
> Thanks!

--
Jeff Squyres
jsquy...@cisco.com
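
For anyone who wants to run the kind of point-to-point test Jeff describes before installing NetPIPE, below is a minimal TCP ping-pong sketch (plain sockets, no MPI). This is only an illustration, not NetPIPE itself: the port number, message size, and repetition count are arbitrary placeholders, and in a real test you would sweep the message size (small for latency, large for bandwidth).

/* pingpong.c -- minimal TCP latency/throughput sketch (illustration only;
 * port, message size, and repetition count are arbitrary placeholders). */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define PORT 5002       /* arbitrary test port */
#define MSG_SIZE 65536  /* sweep this: small for latency, large for bandwidth */
#define REPS 1000

static double now_sec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1e-6;
}

/* Read or write exactly n bytes (TCP may return short counts). */
static void xfer(int fd, char *buf, size_t n, int sending) {
    size_t done = 0;
    while (done < n) {
        ssize_t r = sending ? write(fd, buf + done, n - done)
                            : read(fd, buf + done, n - done);
        if (r <= 0) { perror("xfer"); exit(1); }
        done += (size_t)r;
    }
}

int main(int argc, char **argv) {
    char *buf = calloc(1, MSG_SIZE);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);
    int fd;

    if (argc == 2) {                       /* client: ./pingpong <server> */
        struct hostent *he = gethostbyname(argv[1]);
        if (!he) { herror("gethostbyname"); return 1; }
        memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);
        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect"); return 1;
        }
        double t0 = now_sec();
        for (int i = 0; i < REPS; i++) {   /* send a message, wait for echo */
            xfer(fd, buf, MSG_SIZE, 1);
            xfer(fd, buf, MSG_SIZE, 0);
        }
        double dt = now_sec() - t0;
        printf("avg round trip: %.1f us, throughput: %.1f MB/s\n",
               dt / REPS * 1e6, 2.0 * REPS * MSG_SIZE / dt / 1e6);
    } else {                               /* server: ./pingpong */
        addr.sin_addr.s_addr = INADDR_ANY;
        int one = 1, lfd = socket(AF_INET, SOCK_STREAM, 0);
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);  /* checks omitted */
        listen(lfd, 1);
        fd = accept(lfd, NULL, NULL);
        for (int i = 0; i < REPS; i++) {   /* echo each message back */
            xfer(fd, buf, MSG_SIZE, 0);
            xfer(fd, buf, MSG_SIZE, 1);
        }
    }
    close(fd);
    return 0;
}

Run it with no arguments on one node and with that node's hostname as the argument on another; then, as Jeff suggests, launch several pairs across the switch simultaneously and compare the numbers with the single-pair case.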