Brian,

Some of these files are used during startup, while others are used during application execution (such as the backing files for shared memory). Over the years we have had many discussions about this topic, and so far we have two ways to help people deal with such situations. However, from my personal experience, I don't think that mounting /tmp on any kind of shared filesystem is a good idea. Anyway, here are two MCA parameters that might help you:

MCA orte: parameter "orte_tmpdir_base" (current value: <none>, data source: default value)
                          Base of the session directory tree
MCA orte: parameter "orte_no_session_dirs" (current value: <none>, data source: default value)
                          Prohibited locations for session directories (multiple locations separated by ',', default=NULL)
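(For reference, these listings are ompi_info output; running something like "ompi_info --param all all | grep -i tmpdir" against your installation should show the current values.)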

I suggest starting with the first one.
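For example, to point the session directory tree at a node-local disk instead of the shared TMPDIR, any of the usual MCA mechanisms should work (the path /local/scratch and the ./my_app binary below are only placeholders; substitute a directory that actually exists on your nodes and your own executable):

   # On the mpirun command line:
   mpirun --mca orte_tmpdir_base /local/scratch -np 4 ./my_app

   # Or via the environment:
   export OMPI_MCA_orte_tmpdir_base=/local/scratch
   mpirun -np 4 ./my_app

   # Or persistently, in $HOME/.openmpi/mca-params.conf:
   orte_tmpdir_base = /local/scratch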

  george.

On Aug 16, 2008, at 9:40 PM, Brian Dobbins wrote:

Hi guys,

I was hoping someone here could shed some light on Open MPI's use of /tmp (or, I guess, TMPDIR) and save me from diving into the source... ;)

The background is that I'm trying to run some applications on a system with a flaky parallel file system that TMPDIR is mapped to - so, on startup, Open MPI creates its 'openmpi-sessions-<user>' directory there, and under that, a few files. Sometimes I see one subdirectory (usually a 0), sometimes a 0 and a 1, etc. In one of these, I sometimes see files such as 'shared_memory_pool.<host>' and 'shared_memory_module.<host>'.

My questions are, one, what are the various numbers / files for? (If there's a write-up on this somewhere, just point me towards it!)

And two, the real question: are these files used during runtime, or only upon startup / shutdown? I'm having issues with various codes, especially those heavy on messages and I/O, failing to complete a run, and I haven't resorted to sifting through strace's output yet. This doesn't happen all the time, but I've now seen it happen reliably with one particular code - its success rate (it DOES succeed sometimes) is about 25% right now. My best guess is that the file system is overloaded and thus not allowing timely I/O or access to Open MPI's files, but before settling on that theory I wanted a quick understanding of how Open MPI uses these files and whether the FS does indeed look like the likely culprit.

  Thanks very much,
  - Brian


Brian Dobbins
Yale Engineering HPC