Gbit Ethernet is well known to perform poorly for fine-grained codes like
VASP.  Its latency is much too high for the frequent small messages that
such codes exchange.

If you want good scaling in a cluster for VASP, you'll need to run
InfiniBand or some other high-speed, low-latency network.
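
To illustrate the latency point, here is a minimal sketch of the usual
"latency + size / bandwidth" cost model for a single message.  The
function names and the numbers in the usage note are assumptions made
up for illustration, not measurements from this cluster or figures for
any particular interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to deliver one message: fixed latency plus wire time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_fraction(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Share of the transfer time that is pure latency (0.0 to 1.0)."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    # Hypothetical link parameters: 50 microseconds of latency on a
    # 1 Gbit/s link (125e6 bytes/s).  Substitute your own measured values.
    assumed_latency_s = 50e-6
    assumed_bandwidth = 125e6
    for size in (1_000, 100_000, 10_000_000):
        share = latency_fraction(size, assumed_latency_s, assumed_bandwidth)
        print(f"{size:>10} bytes: {share:.1%} of the time is latency")
```

Under this model, small messages are dominated by the fixed latency
term, so extra bandwidth alone does not help a fine-grained code; only
a lower-latency network does.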

Jim

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: Monday, August 17, 2009 9:24 PM
To: Open MPI Users
Cc: David Hibbitts
Subject: Re: [OMPI users] very bad parallel scaling of vasp using openmpi

You might want to run some performance testing of your TCP stacks and
the switch -- use a non-MPI application such as NetPIPE (or others --
google around) and see what kind of throughput you get.  Try it
between individual server peers and then try running it simultaneously
between a bunch of peers and see if the results are different, etc.
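
If a dedicated tool is not at hand, the peer-to-peer check described
above can be roughed out with a plain TCP ping-pong.  The following is
only a sketch, not a replacement for NetPIPE: the function names are
made up for illustration, and the demo runs over the loopback interface.
To test a real link you would run `echo_server` on one node and
`ping_pong` on another, with a host and port of your choosing.

```python
import socket
import threading
import time


def _recv_exact(sock, nbytes):
    """Read exactly nbytes from sock, or raise if the peer disconnects."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener, nbytes, rounds):
    """Accept one connection and echo back `rounds` messages of nbytes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            conn.sendall(_recv_exact(conn, nbytes))


def ping_pong(host, port, nbytes, rounds):
    """Return the mean round-trip time in seconds for nbytes messages."""
    payload = b"x" * nbytes
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.perf_counter()
        for _ in range(rounds):
            sock.sendall(payload)
            _recv_exact(sock, nbytes)
        elapsed = time.perf_counter() - start
    return elapsed / rounds


def loopback_demo(nbytes=64, rounds=200):
    """Run server and client in one process over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=echo_server, args=(listener, nbytes, rounds), daemon=True
    )
    server.start()
    rtt = ping_pong("127.0.0.1", port, nbytes, rounds)
    server.join(timeout=5)
    listener.close()
    return rtt


if __name__ == "__main__":
    print(f"mean round trip: {loopback_demo() * 1e6:.1f} microseconds")
```

Comparing the round-trip time for one pair of nodes against the time
when several pairs run at once gives a first hint as to whether the
switch, rather than MPI, is the bottleneck.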

On Aug 17, 2009, at 5:51 PM, Craig Plaisance wrote:

> Hi - I have compiled vasp 4.6.34 using the Intel fortran compiler 11.1
> with openmpi 1.3.3 on a cluster of 104 nodes running Rocks 5.2 with two
> quad core opterons connected by a Gbit ethernet.  Running in parallel on
> one node (8 cores) runs very well, faster than any other cluster I have
> run it on.  However, running on 2 nodes in parallel only improves the
> performance by 10% over the one node case while running on 4 and 8 nodes
> yields no improvement over the two node case.  Furthermore, when running
> multiple (3-4) jobs simultaneously, the performance decreases by around
> 50% compared to running only a single job on the entire cluster.  The
> nodes are connected by a Dell Powerconnect 6248 managed switch.  I get
> the same performance with mpich2, so I don't think it is a problem
> specific to openmpi.  Other vasp users have reported very good scaling
> up to 4 nodes on a similar cluster, so I don't think the problem is vasp
> either.  Could something be wrong with the way mpi is configured to work
> with the switch?  Or the operating system is not configured to work with
> the switch properly?  Or the switch itself needs to be configured?
> Thanks!
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


-- 
Jeff Squyres
jsquy...@cisco.com

