Ah - good to know! I did the mod anyway, so hopefully we'll do better in 1.7.5 regardless.
Thanks for the update!
Ralph

On Feb 20, 2014, at 8:42 AM, Eric Chamberland <eric.chamberl...@giref.ulaval.ca> wrote:

> Hi Ralph,
>
> some new information about this "bug": we got a defective disk on this
> computer! Then filesystem errors occurred... The disk is now replaced since
> 2 days and everything seems to work well (the problem re-occurred since the
> last time I wrote about it).
>
> Sorry for bothering!
>
> Eric
>
>
> On 02/05/2014 11:38 AM, Ralph Castain wrote:
>> I'm afraid it isn't quite that simple, Jeff. We also have the race condition
>> at startup - multiple procs on the same machine, from the same job, will be
>> trying to create the session directory tree. At the moment, we see the fact
>> that some other proc created it and simply create our own entry underneath
>> as required. So I don't know how to tell the difference between "some other
>> proc from my job created it first" vs "this is a stale directory and should
>> be deleted".
>>
>> However, I might be able to rig something up when the daemons start, and for
>> singletons. Will give that a try
>>
>> On Feb 4, 2014, at 6:11 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com>
>> wrote:
>>
>>> On Feb 3, 2014, at 6:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>>> If I may suggest to test the behavior of 1.7.x... what about this: Have a
>>>>> test case that creates a bunch of files (from 0 to 65536) in
>>>>> /tmp/openmpi-sessions-${USER}... before launching an executable without
>>>>> mpirun... >:)
>>>>
>>>> Ick - it will actually only conflict if/when the pid's wrap, so it's a
>>>> pretty rare issue.
>>>
>>>
>>> Ralph: what do you think about modifying this for 1.7.5? I.e., if the pid
>>> dir already exists in the session directory, remove it. This is always
>>> safe to do (assuming /tmp is a local filesystem) because the OS will never
>>> use the same PID for 2 concurrent processes.
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
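
For anyone reading this thread later, the idea discussed in the quoted messages (remove a leftover directory named after our own pid, but tolerate shared parent directories that another proc from the same job may be creating at the same moment) could be sketched roughly as below. This is only an illustration in Python, not the Open MPI code or the actual 1.7.5 change: the function name prepare_pid_dir, the flat session_root/<pid> layout, and the demo directory name are all assumptions made for the example. As noted in the thread, it also assumes the session root is on a local filesystem.

```python
import os
import shutil
import tempfile


def prepare_pid_dir(session_root, pid):
    """Create session_root/<pid>, removing any stale leftover first.

    Hypothetical helper for illustration; the directory layout is an
    assumption, not the real Open MPI session directory tree.
    """
    # Shared level: another proc from the same job may create it at the
    # same time, so an already existing directory is not treated as stale.
    os.makedirs(session_root, exist_ok=True)

    pid_dir = os.path.join(session_root, str(pid))

    # Per-pid level: no other live process on this machine has our pid,
    # so anything already sitting at this path is left over from an
    # earlier run (for example after the pids wrapped).
    if os.path.lexists(pid_dir):
        if os.path.isdir(pid_dir) and not os.path.islink(pid_dir):
            shutil.rmtree(pid_dir)
        else:
            # A plain file or a symlink with our pid as its name.
            os.remove(pid_dir)

    os.mkdir(pid_dir)
    return pid_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "sessions-demo")
        # Simulate a stale directory that happens to carry our pid.
        os.makedirs(os.path.join(root, str(os.getpid())))
        print(prepare_pid_dir(root, os.getpid()))
```

The sketch only cleans the per-pid level. It does not try to solve the harder question raised in the thread of deciding whether a shared directory is stale or was just created by a sibling proc.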