We talked about this issue on the weekly OMPI engineering teleconf today. It seems like a good idea to bring the new shared memory revamp over to the v1.5 series before it transitions to v1.6, so that the v1.5/v1.6 series can avoid the problems with network-mounted /tmp filesystems. LANL will be evaluating this; the gut feeling was that it would not be a lot of work to bring it over to the v1.5 branch.
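In the meantime, if you want a quick way to tell whether your temp directory sits on a network mount, here is a rough sketch. It assumes Linux and the usual whitespace-separated /proc/mounts layout (device, mount point, filesystem type). The function names and the list of "network" filesystem types are my own choices for illustration, not anything that OMPI ships, and the script does not resolve symlinks or escaped spaces in mount points.

```python
import os
import tempfile

# Filesystem types treated as "networked" here. This list is an assumption
# made for illustration; extend it to match your site.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "lustre", "gpfs", "afs", "panfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare whole path components so that /tmp does not match /tmpfoo.
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_network_fs(path, mounts_text=None):
    """True if path appears to live on one of NETWORK_FS_TYPES."""
    if mounts_text is None:
        with open("/proc/mounts") as f:  # Linux-specific
            mounts_text = f.read()
    return fs_type_for(path, mounts_text) in NETWORK_FS_TYPES


if __name__ == "__main__":
    tmp = tempfile.gettempdir()  # honors TMPDIR, otherwise usually /tmp
    if is_on_network_fs(tmp):
        print("%s looks network-mounted; expect higher sm latency" % tmp)
    else:
        print("%s does not look network-mounted" % tmp)
```

Run it on a compute node rather than the head node, since the two may mount /tmp differently.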
I've created https://svn.open-mpi.org/trac/ompi/ticket/2908 to track the issue.

On Nov 8, 2011, at 8:21 AM, Jeff Squyres wrote:

> On Nov 7, 2011, at 12:12 PM, Blosch, Edwin L wrote:
>
>> Thanks for the valuable input. I'll change to a wait-and-watch approach.
>>
>> The FAQ on tuning sm says "If the session directory is located on a network
>> filesystem, the shared memory BTL latency will be extremely high." And the
>> title is 'Why am I seeing incredibly poor performance...'. So I made the
>> leap that this configuration must be avoided at all costs...
>
> (sorry for jumping in late; it's the week before SC, and lots of deadlines
> are approaching!)
>
> This is definitely true: if OMPI's mmap files are located on a network
> filesystem (such as if /tmp is NFS-mounted), your latencies will be higher.
> I don't claim to know all the exact reasons why, but I have personally seen
> enough empirical evidence to believe it. Perhaps newer versions of
> Linux/NFS/whatever have made the issue better. But I'm quite sure that it
> was happening; that's why we put in that warning.
>
> Here's a few points to add to this discussion, in no particular order:
>
> 1. Keep in mind the difference between the session directory and the shared
> memory backing files: the session directory contains some metadata that OMPI
> processes need. In general, most of that data is not performance-critical,
> such that if it's on a networked filesystem, general MPI performance will not
> be affected. In 1.4.x and 1.5.x, the shared memory mmap files are also
> located in the session directory, and as described above, we have definitely
> seen a negative MPI latency performance impact when these files are on a
> networked filesystem.
>
> 2. In the upcoming OMPI v1.7, we revamped the shared memory backing system
> such that mmap does not have to be used, and therefore will not care if /tmp
> is on a networked filesystem.
>
> 3. I don't know whether /tmp on a networked filesystem is 100% "proper" or
> not. I know that some people do it, but there are uniqueness requirements
> that can definitely be violated in various other tools in this case. OMPI
> may not be the only software package that can run into problems here, even if
> the problems are rare and difficult to track down (e.g., because two
> processes with the same PID on different machines tried to use the same
> filename in /tmp, or attempts to use file locking, etc.).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/

--
Jeff Squyres
jsquy...@cisco.com