That was quick. I will try the patch as soon as you release it.

nick

On Wed, Dec 2, 2009 at 21:06, Ralph Castain <r...@open-mpi.org> wrote:

> Patch is built and under review...
>
> Thanks again
> Ralph
>
> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>
> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Yeah, that's the one all right! Definitely missing from 1.3.x.
>>
>> Thanks - I'll build a patch for the next bug-fix release
>>
>>
>> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>>
>> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <r...@open-mpi.org> wrote:
>> >> Indeed - that is very helpful! Thanks!
>> >> Looks like we aren't cleaning up high enough - missing the directory level.
>> >> I seem to recall seeing that error go by and that someone fixed it on our
>> >> devel trunk, so this is likely a repair that didn't get moved over to the
>> >> release branch as it should have done.
>> >> I'll look into it and report back.
>> >
>> > You are probably referring to
>> > https://svn.open-mpi.org/trac/ompi/changeset/21498
>> >
>> > There was an issue about orte_session_dir_finalize() not
>> > cleaning up the session directories properly.
>> >
>> > Hope that helps.
>> >
>> > Abhishek
>> >
>> >> Thanks again
>> >> Ralph
>> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>> >>
>> >>
>> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <r...@open-mpi.org> wrote:
>> >>>
>> >>> Hmm....if you are willing to keep trying, could you perhaps let it run for
>> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>> >>> that has already terminated? The pids will be in order, so just look for an
>> >>> early number (not mpirun or the parent, of course).
>> >>> It would help if you could give us the contents of a directory from a
>> >>> child process that has terminated - would tell us what subsystem is failing
>> >>> to properly cleanup.
>> >>
>> >> Ok, so I Ctrl-Z the master. In
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
>> >> directory
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
>> >>
>> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
>> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>> >> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
>> >> are 70 sequentially numbered directories starting at 0. Every directory
>> >> contains another directory called "0". There is nothing in any of those
>> >> directories. I see for instance:
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
>> >> total 4.0K
>> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
>> >>
>> >> and
>> >>
>> >> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
>> >> total 0
>> >>
>> >> I hope this information helps. Did I understand your question correctly?
>> >>
>> >> nick
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
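
For anyone who wants to check for the same leftovers without running ls by hand, here is a minimal sketch in Python. It only assumes what is reported in this thread: a session directory containing numbered subdirectories that are left behind empty. The function name find_empty_dirs is made up for illustration and is not part of Open MPI.

```python
import os


def find_empty_dirs(root):
    """Return a sorted list of directories under root that contain no entries.

    Hypothetical helper, not part of Open MPI. It walks the tree below
    root and collects every directory with no files and no subdirectories.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory is a leftover candidate if it holds nothing at all
        if not dirnames and not filenames:
            empty.append(dirpath)
    return sorted(empty)
```

Calling find_empty_dirs on the session directory quoted above (the one ending in /857) should list one empty "0" directory per terminated child if the layout matches what was described. Nothing is deleted; the function only reports paths.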