On Mar 1, 2010, at 10:04 AM, David Turner wrote:

> Hi Ralph,
> 
>> Which version of OMPI are you using? We know that the 1.2 series was 
>> unreliable about removing the session directories, but 1.3 and above appear 
>> to be quite good about it. If you are having problems with the 1.3 or 1.4 
>> series, I would definitely like to know about it.
>> When I was at LANL, I ran a number of tests in exactly this configuration. 
>> While the sm btl did provide some performance advantage, it wasn't very much 
>> (the bandwidth was only about 10% greater, and the latency wasn't all that 
>> different either). I set the default configuration for users to include sm 
>> as 10% isn't something to sneer at, but you could disable it without an 
>> enormous impact.
> 
> I realize I have another question about this.  When you say "exactly"
> this configuration, do you mean the mmap files were backed to /tmp
> via ramdisk, or to a remote file system over the communications fabric?

Backed to /tmp via ramdisk

> 
> We have historically redefined TMPDIR to point somewhere other than
> /tmp, and have told our users *never* to use /tmp (if possible).
> I suppose that if OMPI cleans up after itself, and we use a
> prologue/epilogue, and regular scrubbing, we can keep /tmp under
> control.

That's what LANL does...i.e., OMPI cleanup + epilogue
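
For illustration only, here is a minimal Python sketch of what such an epilogue step might look like. It is not LANL's script. It assumes the leftover session directories sit directly under the temporary directory and are named "openmpi-sessions-<user>@<host>_<n>"; that naming is an assumption you should check against what your OMPI version actually creates. The function name scrub_session_dirs and the command-line convention are hypothetical.

```python
import os
import shutil
import sys


def scrub_session_dirs(tmp_base, user):
    """Remove leftover Open MPI session directories owned by `user`.

    Returns the sorted list of directories that were actually removed.
    """
    # ASSUMPTION: session directories are named
    # "openmpi-sessions-<user>@<host>_<n>"; verify this on your system.
    prefix = "openmpi-sessions-%s@" % user
    removed = []
    for name in sorted(os.listdir(tmp_base)):
        path = os.path.join(tmp_base, name)
        # Only touch real directories that match the prefix, never symlinks.
        if not name.startswith(prefix):
            continue
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        shutil.rmtree(path, ignore_errors=True)
        # Report a directory only if it is really gone.
        if not os.path.exists(path):
            removed.append(path)
    return removed


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: the resource manager's epilogue passes the job
    # owner as the only argument; TMPDIR falls back to /tmp.
    base = os.environ.get("TMPDIR", "/tmp")
    for path in scrub_session_dirs(base, sys.argv[1]):
        print("removed", path)
```

Matching on the user name keeps the epilogue from deleting session directories that belong to another user's job still running on the same node.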

> 
>> Another option would be to run an epilog that hammers the session directory. 
>> That's what LANL does, even though we didn't see much trouble with cleanup 
>> starting with the 1.3 series (still have a bunch of users stuck on 1.2). 
>> Depending on what environment you are running, you might contact folks there 
>> and get a copy of their epilog script.
>> On Mar 1, 2010, at 1:42 AM, David Turner wrote:
>>> Hi all,
>>> 
>>> Running on a large cluster of 8-core nodes.  I understand
>>> that the SM BTL is a "good thing".  But I'm curious about
>>> its use of memory-mapped files.  I believe these files will
>>> be in $TMPDIR, which defaults to /tmp.
>>> 
>>> In our cluster, the compute nodes are stateless, so /tmp
>>> is actually in RAM.  Keeping memory-mapped "files" in
>>> memory seems kind of circular, although I know little
>>> about these things.  A bigger problem is that it appears
>>> OMPI does not remove the files upon completion.
>>> 
>>> Another option is to redefine $TMPDIR to point to a
>>> "real" file system.  In our cluster, all the available
>>> file systems are accessed over the IB fabric.  So it
>>> seems that there will be IB traffic, even though the
>>> point of the SM BTL is to avoid this traffic.
>>> 
>>> Given the above two constraints, might it just be
>>> better to disable the SM BTL entirely, and use the
>>> IB BTL even within a node?  Of course, the "self"
>>> BTL should still be used if appropriate.
>>> 
>>> Any thoughts clarifying these issues would be
>>> greatly appreciated.  Thanks!
>>> 
>>> -- 
>>> Best regards,
>>> 
>>> David Turner
>>> User Services Group        email: dptur...@lbl.gov
>>> NERSC Division             phone: (510) 486-4027
>>> Lawrence Berkeley Lab        fax: (510) 486-4316
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> -- 
> Best regards,
> 
> David Turner
> User Services Group        email: dptur...@lbl.gov
> NERSC Division             phone: (510) 486-4027
> Lawrence Berkeley Lab        fax: (510) 486-4316

