Patch is built and under review...

Thanks again
Ralph
On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:

> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <r...@open-mpi.org> wrote:
> Yeah, that's the one all right! Definitely missing from 1.3.x.
>
> Thanks - I'll build a patch for the next bug-fix release
>
>
> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>
> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <r...@open-mpi.org> wrote:
> >> Indeed - that is very helpful! Thanks!
> >> Looks like we aren't cleaning up high enough - missing the directory level.
> >> I seem to recall seeing that error go by and that someone fixed it on our
> >> devel trunk, so this is likely a repair that didn't get moved over to the
> >> release branch as it should have done.
> >> I'll look into it and report back.
> >
> > You are probably referring to
> > https://svn.open-mpi.org/trac/ompi/changeset/21498
> >
> > There was an issue about orte_session_dir_finalize() not
> > cleaning up the session directories properly.
> >
> > Hope that helps.
> >
> > Abhishek
> >
> >> Thanks again
> >> Ralph
> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
> >>
> >>
> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:
> >>>
> >>> Hmm....if you are willing to keep trying, could you perhaps let it run for
> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
> >>> that has already terminated? The pids will be in order, so just look for
> >>> an early number (not mpirun or the parent, of course).
> >>> It would help if you could give us the contents of a directory from a
> >>> child process that has terminated - would tell us what subsystem is
> >>> failing to properly cleanup.
> >>
> >> Ok, so I Ctrl-Z the master. In
> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
> >> directory
> >>
> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
> >>
> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
> >> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
> >> are 70 sequentially numbered directories starting at 0. Every directory
> >> contains another directory called "0". There is nothing in any of those
> >> directories. I see for instance:
> >>
> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
> >> total 4.0K
> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
> >>
> >> and
> >>
> >> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
> >> total 0
> >>
> >> I hope this information helps. Did I understand your question correctly?
> >>
> >> nick
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
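
For anyone who wants to repeat the check Nicolas did by hand with ls, here is a minimal Python sketch that lists the empty directories left behind under a session directory. The function name find_empty_leaf_dirs is hypothetical and not part of Open MPI. It only walks the filesystem and assumes nothing about the session directory layout beyond what is shown in the thread.

```python
import os


def find_empty_leaf_dirs(root):
    """Return sorted paths (relative to root) of directories that contain
    neither files nor subdirectories."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory with no entries at all is a leftover empty leaf.
        if not dirnames and not filenames:
            empty.append(os.path.relpath(dirpath, root))
    return sorted(empty)
```

Called on a path such as the /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 directory above, it should report entries like 70/0 for each child whose directory was emptied but not removed.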