On Mar 1, 2010, at 10:04 AM, David Turner wrote:

> Hi Ralph,
>
>> Which version of OMPI are you using? We know that the 1.2 series was
>> unreliable about removing the session directories, but 1.3 and above appear
>> to be quite good about it. If you are having problems with the 1.3 or 1.4
>> series, I would definitely like to know about it.
>> When I was at LANL, I ran a number of tests in exactly this configuration.
>> While the sm btl did provide some performance advantage, it wasn't very much
>> (the bandwidth was only about 10% greater, and the latency wasn't all that
>> different either). I set the default configuration for users to include sm
>> as 10% isn't something to sneer at, but you could disable it without an
>> enormous impact.
>
> I realize I have another question about this. When you say "exactly"
> this configuration, do you mean the mmap files were backed to /tmp
> via ramdisk, or to a remote file system over the communications fabric?

Backed to /tmp via ramdisk

>
> We have historically redefined TMPDIR to point somewhere other than
> /tmp, and have told our users *never* to use /tmp (if possible).
> I suppose that if OMPI cleans up after itself, and we use a
> prologue/epilogue, and regular scrubbing, we can keep /tmp under
> control.

That's what LANL does...i.e., OMPI cleanup + epilogue

>
>> Another option would be to run an epilog that hammers the session directory.
>> That's what LANL does, even though we didn't see much trouble with cleanup
>> starting with the 1.3 series (still have a bunch of users stuck on 1.2).
>> Depending on what environment you are running, you might contact folks there
>> and get a copy of their epilog script.
>> On Mar 1, 2010, at 1:42 AM, David Turner wrote:
>>> Hi all,
>>>
>>> Running on a large cluster of 8-core nodes. I understand
>>> that the SM BTL is a "good thing". But I'm curious about
>>> its use of memory-mapped files. I believe these files will
>>> be in $TMPDIR, which defaults to /tmp.
>>>
>>> In our cluster, the compute nodes are stateless, so /tmp
>>> is actually in RAM. Keeping memory-mapped "files" in
>>> memory seems kind of circular, although I know little
>>> about these things. A bigger problem is that it appears
>>> OMPI does not remove the files upon completion.
>>>
>>> Another option is to redefine $TMPDIR to point to a
>>> "real" file system. In our cluster, all the available
>>> file systems are accessed over the IB fabric. So it
>>> seems that there will be IB traffic, even though the
>>> point of the SM BTL is to avoid this traffic.
>>>
>>> Given the above two constraints, might it just be
>>> better to disable the SM BTL entirely, and use the
>>> IB BTL even within a node? Of course, the "self"
>>> BTL should still be used if appropriate.
>>>
>>> Any thoughts clarifying these issues would be
>>> greatly appreciated. Thanks!
>>>
>>> --
>>> Best regards,
>>>
>>> David Turner
>>> User Services Group        email: dptur...@lbl.gov
>>> NERSC Division             phone: (510) 486-4027
>>> Lawrence Berkeley Lab      fax: (510) 486-4316
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
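
The "OMPI cleanup + epilogue" approach discussed above can be sketched as a small per-node epilog step. This is a minimal, hypothetical sketch, not LANL's actual epilog script and not part of Open MPI: it assumes leftover session directories sit directly under $TMPDIR (falling back to /tmp), that their names start with an `openmpi-sessions-` prefix, and that the epilog runs as the job's owner. The function name `scrub_session_dirs` is made up for illustration; check the prefix against what your Open MPI version actually leaves behind on a node.

```python
import os
import shutil
import stat

# HYPOTHETICAL prefix: assumed name of Open MPI per-user session directories.
# Verify against what your Open MPI version actually creates on a node.
SESSION_PREFIX = "openmpi-sessions-"


def scrub_session_dirs(base_dir=None, prefix=SESSION_PREFIX, uid=None):
    """Remove leftover session directories from a node-local temp dir.

    base_dir defaults to $TMPDIR, falling back to /tmp. Only directories
    whose names start with `prefix` and that are owned by `uid` (default:
    the calling user) are deleted. Returns the sorted list of paths that
    no longer exist after removal.
    """
    if base_dir is None:
        base_dir = os.environ.get("TMPDIR", "/tmp")
    if uid is None:
        uid = os.getuid()

    removed = []
    for name in os.listdir(base_dir):
        if not name.startswith(prefix):
            continue
        path = os.path.join(base_dir, name)
        try:
            # lstat() does not follow symlinks, so a symlink named like a
            # session directory is skipped instead of having its target deleted.
            st = os.lstat(path)
        except FileNotFoundError:
            continue  # vanished since listdir(), e.g. OMPI cleaned up itself
        if not stat.S_ISDIR(st.st_mode) or st.st_uid != uid:
            continue
        # Best effort: keep going if some files cannot be removed,
        # but only report paths that are actually gone.
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.lexists(path):
            removed.append(path)
    return sorted(removed)
```

An epilog would call `scrub_session_dirs()` on each node only after the job's processes have exited there; running it while a job is still active on the node could remove files that job still needs. Restricting deletion to one name prefix and one owner keeps the scrub from touching other users' data in a shared /tmp.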