> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Dave Love
> Sent: Thursday, July 18, 2013 1:22 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] knem/openmpi performance?
>
> Paul Kapinos <kapi...@rz.rwth-aachen.de> writes:
>
> > Jeff, I would turn the question the other way around:
> >
> > - are there any penalties when using KNEM?
>
> Bull should be able to comment on that -- they turn it on by default in
> their proprietary OMPI derivative -- but I doubt I can get much of a
> story on it. Mellanox ship it now too, but I don't know if their
> distribution defaults to using it.
>
> I expect to use knem on hardware that's essentially the same as Mark's.
> If any issues appear in production, I'll be surprised and will report
> them.
>
> > We have a couple of Really Big Nodes (128 cores) with non-huge memory
> > bandwidth (because coupled of 4x standalone nodes with 4 sockets
> > each).
>
> I was hoping to have some results for just such a setup, but haven't
> been able to spend any time on it this week. If there are any
> suggestions for OMPI tuning on it I'd be interested.
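
On the question of defaults: a stock Open MPI only has KNEM support if it
was configured --with-knem, and in the 1.6/1.7 series the shared-memory
BTL additionally has to be told to use it, at least with the defaults in
our builds. A minimal sketch of the kind of entry we put in a node group's
openmpi-mca-params.conf (the parameter name btl_sm_use_knem is taken from
our 1.6.x builds and should be verified against your version):

    # Use KNEM single-copy transfers in the sm BTL. Requires an Open MPI
    # build configured --with-knem and a loaded knem kernel module
    # (/dev/knem present and accessible to users).
    btl_sm_use_knem = 1

Running "ompi_info --param btl sm" shows whether KNEM support was compiled
into the sm BTL at all and what the parameter defaults are.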
Detailed results are coming in the near future, but the benchmarks done up
to now indicate that collectives that use bulk (non-segmented) transfers,
e.g. MPI_Alltoall with large chunks, receive quite a decent speed bump from
KNEM: about 1.5x speed-up for 128 processes and 4 MiB data chunks.
Collectives that use pipelining, e.g. MPI_Bcast with large messages and
many processes, suffer badly, since the default algorithm selection
heuristics are inadequate: an 8 MiB message gets pipelined to 127 other
processes with a segment size of 8 KiB, and with KNEM that takes more than
10x longer than the user-space double-copy method. The only way around it
is to override the heuristics by providing a proper set of dynamic rules
in a largely undocumented file format (a sketch of what such a file could
look like is below, after my signature).

> > So cutting the bandwidth in halves on these nodes sound like Very
> > Good Thing.
> >
> > But otherwise we've 1500+ nodes with 2 sockets and 24GB memory only
> > and we do not wanna to disturb the production on these nodes.... (and
> > different MPI versions for different nodes are doofy).
>
> Why would you need that? Our horribly heterogeneous cluster just has a
> node group-specific openmpi-mca-params.conf, and SGE parallel
> environments keep jobs in specific host groups with basically the same
> CPU speed and interconnect.

MPI_Alltoall(v) with large chunks seems to benefit on those machines too,
and we have a number of applications that perform lots of single-node
all-to-all operations.

> > Best
> >
> > Paul
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

Regards,
Hristo
--
Hristo Iliev, PhD - High Performance Computing Team
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum der RWTH Aachen
Seffenter Weg 23, D 52074 Aachen (Germany)
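
P.S. Since the dynamic rules file format is, as said, largely
undocumented, here is a hedged sketch of what such a file could look like.
The MCA parameters coll_tuned_use_dynamic_rules and
coll_tuned_dynamic_rules_filename exist in the tuned collective component;
the file itself, as far as I can tell from the parser in our 1.6.x tree,
is just whitespace-separated integers: the number of collectives
described, then per collective its numeric ID, the number of
communicator-size rules, and per communicator size the number of
message-size rules followed by quadruplets of message size, algorithm,
fan-in/out and segment size. The numeric IDs below (7 = bcast, 6 =
binomial tree) follow the enumerations in our tree's coll/tuned sources
and should be double-checked against yours. This sketch keeps the built-in
decision (algorithm 0) for MPI_Bcast below 8 MiB on communicators of 128
and more processes, and forces the unsegmented binomial tree above that:

    1
    7
    1
    128
    2
    0 0 0 0
    8388608 6 0 0

Activated with something along the lines of:

    mpirun --mca coll_tuned_use_dynamic_rules 1 \
           --mca coll_tuned_dynamic_rules_filename /path/to/bcast.rules ...

For reference, the bcast algorithm numbering in our tree is 1 = basic
linear, 2 = chain, 3 = pipeline, 4 = split binary tree, 5 = binary tree,
6 = binomial tree; running ompi_info with coll_tuned_use_dynamic_rules set
to 1 reportedly lists the per-algorithm parameters of the tuned component
as well.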