On 03.02.2014 at 23:01, Eric Chamberland wrote:

> Hi Ralph,
>
> On 02/03/2014 04:20 PM, Ralph Castain wrote:
>> On Feb 3, 2014, at 1:13 PM, Eric Chamberland
>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>
>>> On 02/03/2014 03:59 PM, Ralph Castain wrote:
>>>> Very strange - even if you kill the job with SIGTERM, or have processes
>>>> that segfault, OMPI should clean itself up and remove those session
>>>> directories. Granted, the 1.6 series isn't as good about doing so as the
>>>> 1.7 series, but it at least to-date has done pretty well.
>>> Ok, one more piece of information that may matter: all sequential tests
>>> are launched *without* mpiexec... I don't know if the "cleanup" phase is
>>> done by mpiexec or by the binaries...
>> Ah, yes that would be a source of the problem! We can't guarantee cleanup if
>> you just kill the procs or they segfault *unless* mpiexec is used to launch
>> the job. What are you using to launch? Most resource managers provide an
>> "epilog" capability for precisely this purpose as all MPIs would display the
>> same issue.
> For the sequential jobs, we just launch the tests on the "command line"... no
> resource manager is ever used. For the jobs that require more than 1
> process, we have "mpiexec -n ..." added to the command line...
>
>>> which should delete files that shouldn't exist... ;-)
>>>
>>> But, IMHO, I still think OpenMPI should "choose" another directory name if
>>> it can't create it because a poor file exists!
>> We could do that - but now we get into the bottomless pit of trying every
>> possible combination of directory names, and ensuring that every process
>> comes up with the same answer! Remember, the session dir is where the shared
>> memory regions rendezvous, so every process on a node would have to find the
>> same place
> ok. Just for my knowledge: does that mean that if I launch 2 processes on a
> single node and they have to communicate, they will do it through the files
> in /tmp?
>
>>> How can all users be aware that they have to clean up such files?
>> Given how long 1.6.x has been out there, and that this is about the only
>> time I've heard of a problem, I'm not sure this is a general enough issue to
>> merit the concern
> Ok. I just verified on 8 other computers/architectures that are running
> the same tests: there is only 1 which has files at the directory level of
> /tmp/openmpi-sessions-${USER}*
> Since we have been doing that kind of testing for many years, I also agree
> it is not a widespread issue... But it just occurred 2 times in the last 3
> days!!! :-/

What about using a queuing system? Open MPI will put the files it creates
into a subdirectory that the queuing system dedicates to this job. Even if
Open MPI fails to remove the files, the queuing system will.

-- Reuti

>>
>>> Maybe a good compromise would be to have the error message tell there is
>>> a file with the same name as the directory chosen?
>> I can make that change - good suggestion.
> ok, thanks!
>
>>
>>> Or add a new entry to the FAQ to help users find the workaround you
>>> proposed... ;-)
>> we can try to do that too
>
> If I may suggest a way to test the behavior of 1.7.x... what about this:
> have a test case that creates a bunch of files (from 0 to 65536) in
> /tmp/openmpi-sessions-${USER}... before launching an executable without
> mpirun... >:)
>
> Anyway, thanks a lot!
>
> Eric
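
For machines without a queuing system, a small check run before the test suite could at least report the situation described above. The following is only a minimal sketch: the function name `find_blocking_files` is hypothetical, and it assumes the entries live directly under /tmp and start with `openmpi-sessions-<user>`, as in the path quoted in this thread. It only lists entries with that prefix that are not directories and deletes nothing.

```python
import getpass
import glob
import os


def find_blocking_files(tmp_dir="/tmp", user=None):
    """Return entries named openmpi-sessions-<user>* that are not directories.

    Assumption: the session directories sit directly under tmp_dir and use
    the prefix quoted in this thread. Nothing is removed here.
    """
    if user is None:
        user = getpass.getuser()
    # Escape the fixed part so special characters are not treated as wildcards.
    prefix = os.path.join(tmp_dir, "openmpi-sessions-" + user)
    pattern = glob.escape(prefix) + "*"
    # A plain file in this position is what prevents the directory creation.
    return sorted(p for p in glob.glob(pattern) if not os.path.isdir(p))
```

Calling `find_blocking_files()` with no arguments checks /tmp for the current user. Whether to remove what it reports is left to the person running it, since a live job may still own entries under the same prefix.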