Ah - good to know! I did the mod anyway, so hopefully we'll do better in 1.7.5 
regardless.

Thanks for the update!
Ralph

On Feb 20, 2014, at 8:42 AM, Eric Chamberland 
<eric.chamberl...@giref.ulaval.ca> wrote:

> Hi Ralph,
> 
> some new information about this "bug": we had a defective disk on this 
> computer, which caused filesystem errors.  The disk was replaced 2 days ago 
> and everything seems to work well since then (the problem had re-occurred 
> after the last time I wrote about it).
> 
> Sorry for bothering you!
> 
> Eric
> 
> 
> On 02/05/2014 11:38 AM, Ralph Castain wrote:
>> I'm afraid it isn't quite that simple, Jeff. We also have the race condition 
>> at startup - multiple procs on the same machine, from the same job, will be 
>> trying to create the session directory tree. At the moment, we see the fact 
>> that some other proc created it and simply create our own entry underneath 
>> as required. So I don't know how to tell the difference between "some other 
>> proc from my job created it first" vs "this is a stale directory and should 
>> be deleted".
>> 
>> However, I might be able to rig something up when the daemons start, and for 
>> singletons. Will give that a try.
>> 
>> On Feb 4, 2014, at 6:11 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> 
>> wrote:
>> 
>>> On Feb 3, 2014, at 6:44 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> 
>>>>> If I may suggest a way to test the behavior of 1.7.x, what about this: 
>>>>> have a test case that creates a bunch of files (from 0 to 65536) in 
>>>>> /tmp/openmpi-sessions-${USER}... before launching an executable without 
>>>>> mpirun... >:)
>>>> 
>>>> Ick - it will actually only conflict if/when the PIDs wrap, so it's a 
>>>> pretty rare issue.
>>> 
>>> 
>>> Ralph: what do you think about modifying this for 1.7.5?  I.e., if the pid 
>>> dir already exists in the session directory, remove it.  This is always 
>>> safe to do (assuming /tmp is a local filesystem) because the OS will never 
>>> use the same PID for 2 concurrent processes.
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
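
The two ideas discussed above can be sketched roughly in Python. This is not Open MPI's code (which is written in C), only an illustration under stated assumptions: each process gets one leaf directory named after its PID under a shared session root, and `claim_pid_dir` with its parameters is a hypothetical name. The shared root is created so that "already exists" counts as success, which is how the startup race between procs of the same job is tolerated. Only the per-PID leaf is removed when it is found, following Jeff's argument that the OS never gives the same PID to two concurrent processes on one host. The sketch does not solve the harder case Ralph raises, where a directory shared by several procs of one job could be either freshly created or stale.

```python
import os
import shutil


def claim_pid_dir(session_root, pid):
    """Create (or re-create) a per-process directory under session_root.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    # Shared levels: several processes of one job may race to create these,
    # so "already exists" is treated as success rather than as an error.
    os.makedirs(session_root, exist_ok=True)

    pid_dir = os.path.join(session_root, str(pid))

    # Per-PID leaf: assuming a local filesystem, no other live process on
    # this host has our PID, so an existing entry is taken to be a leftover
    # from an earlier run and is removed.
    if os.path.lexists(pid_dir):
        if os.path.isdir(pid_dir) and not os.path.islink(pid_dir):
            shutil.rmtree(pid_dir)
        else:
            # A plain file or symlink with the same name, as in the
            # pre-populated test case suggested earlier in the thread.
            os.remove(pid_dir)

    os.mkdir(pid_dir)
    return pid_dir
```

Calling `claim_pid_dir(root, os.getpid())` on a root that was pre-populated with entries named after numbers returns an empty directory for the caller's PID and leaves the other entries alone.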
