That was quick. I will try the patch as soon as you release it.

nick


On Wed, Dec 2, 2009 at 21:06, Ralph Castain <r...@open-mpi.org> wrote:

> Patch is built and under review...
>
> Thanks again
> Ralph
>
> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>
> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Yeah, that's the one all right! Definitely missing from 1.3.x.
>>
>> Thanks - I'll build a patch for the next bug-fix release
>>
>>
>> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>>
>> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> >> Indeed - that is very helpful! Thanks!
>> >> Looks like we aren't cleaning up high enough - missing the directory level.
>> >> I seem to recall seeing that error go by and that someone fixed it on our
>> >> devel trunk, so this is likely a repair that didn't get moved over to the
>> >> release branch as it should have done.
>> >> I'll look into it and report back.
>> >
>> > You are probably referring to
>> > https://svn.open-mpi.org/trac/ompi/changeset/21498
>> >
>> > There was an issue about orte_session_dir_finalize() not
>> > cleaning up the session directories properly.
>> >
>> > Hope that helps.
>> >
>> > Abhishek
>> >
>> >> Thanks again
>> >> Ralph
>> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>> >>
>> >>
>> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>
>> >>> Hmm... if you are willing to keep trying, could you perhaps let it run for
>> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>> >>> that has already terminated? The pids will be in order, so just look for an
>> >>> early number (not mpirun or the parent, of course).
>> >>> It would help if you could give us the contents of a directory from a
>> >>> child process that has terminated - that would tell us which subsystem is
>> >>> failing to clean up properly.
>> >>
>> >> Ok, so I Ctrl-Z the master. In
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
>> >> directory
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
>> >>
>> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
>> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>> >> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
>> >> are 70 sequentially numbered directories starting at 0. Every directory
>> >> contains another directory called "0". There is nothing in any of those
>> >> directories. I see for instance:
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
>> >> total 4.0K
>> >> drwx------ 2 nbock users 4.0K Dec  2 14:41 0
>> >>
>> >> and
>> >>
>> >> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
>> >> total 0
>> >>
>> >> I hope this information helps. Did I understand your question correctly?
>> >>
>> >> nick
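
The layout described above (numbered directories under the session directory, each holding only an empty "0") can be checked without listing every directory by hand. The following is a minimal Python sketch, not part of Open MPI: the function names `find_empty_leaf_dirs` and `build_demo_tree` are hypothetical, and the demo tree is built in a temporary directory that only imitates the reported layout rather than reading a real session directory.

```python
import os
import tempfile


def find_empty_leaf_dirs(session_root):
    """Return sorted paths (relative to session_root) of directories that contain nothing."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(session_root):
        # A leftover directory in the sense of this thread: no files, no subdirectories.
        if not dirnames and not filenames and dirpath != session_root:
            empty.append(os.path.relpath(dirpath, session_root))
    return sorted(empty)


def build_demo_tree(root, job_id="857", count=3):
    """Create a hypothetical layout <root>/<job_id>/<n>/0 with nothing inside."""
    for n in range(count):
        os.makedirs(os.path.join(root, job_id, str(n), "0"))


if __name__ == "__main__":
    # Demo only: imitate the reported layout in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        for rel in find_empty_leaf_dirs(root):
            print(rel)
```

To inspect a real run, one would pass the actual session directory path to `find_empty_leaf_dirs` instead of the temporary demo tree; the script only reads the tree and removes nothing.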
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
