On 03.02.2014 at 23:01, Eric Chamberland wrote:
> Hi Ralph,
>
> On 02/03/2014 04:20 PM, Ralph Castain wrote:
>> On Feb 3, 2014, at 1:13 PM, Eric Chamberland
>> <[email protected]> wrote:
>>
>>> On 02/03/2014 03:59 PM, Ralph Castain wrote:
>>>> Very strange - even if you kill the job with SIGTERM, or have processes
>>>> that segfault, OMPI should clean itself up and remove those session
>>>> directories. Granted, the 1.6 series isn't as good about doing so as the
>>>> 1.7 series, but to date it has at least done pretty well.
>>> OK, one more piece of information that may matter: all sequential tests are
>>> launched *without* mpiexec... I don't know whether the "cleanup" phase is
>>> done by mpiexec or by the binaries...
>> Ah, yes that would be a source of the problem! We can't guarantee cleanup if
>> you just kill the procs or they segfault *unless* mpiexec is used to launch
>> the job. What are you using to launch? Most resource managers provide an
>> "epilog" capability for precisely this purpose as all MPIs would display the
>> same issue.
> For the sequential jobs, we just launch the tests on the "command line"... no
> resource manager is ever used. For the jobs that require more than 1
> process, we add "mpiexec -n ..." to the command line...
>
>>> which should delete files that shouldn't exist... ;-)
>>>
>>> But, IMHO, I still think Open MPI should "choose" another directory name if
>>> it can't create the directory because a stray file exists under that name!
>> We could do that - but now we get into the bottomless pit of trying every
>> possible combination of directory names, and ensuring that every process
>> comes up with the same answer! Remember, the session dir is where the shared
>> memory regions rendezvous, so every process on a node would have to find the
>> same place.
> OK. Just so I understand: does that mean that if I launch 2 processes on a
> single node and they have to communicate, they will do it through the files
> in /tmp?
>
>>> How can all users be aware that they have to clean up such files?
>> Given how long 1.6.x has been out there, and that this is about the only
>> time I've heard of a problem, I'm not sure this is a general enough issue to
>> merit the concern
> OK. I just verified on 8 other computers/architectures that run
> the same tests: only 1 of them has files at the directory level of
> /tmp/openmpi-sessions-${USER}*
> Since we have been doing this kind of testing for many years, I also agree
> it is not a widespread issue... But it just occurred 2 times in the last 3
> days!!! :-/
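The check described above can be scripted. What follows is only a minimal sketch, assuming that a "bad" entry is anything matching /tmp/openmpi-sessions-${USER}* that is not a directory; the function name find_stale_session_entries is made up for this illustration and is not part of Open MPI.

```python
import glob
import os


def find_stale_session_entries(tmp_root, user):
    """Return entries named openmpi-sessions-<user>* under tmp_root that are
    not directories (hypothetical helper, not an Open MPI tool)."""
    # glob.escape keeps unusual characters in the path or user name literal.
    pattern = os.path.join(
        glob.escape(tmp_root), "openmpi-sessions-" + glob.escape(user) + "*"
    )
    # A healthy entry is a directory; a plain file in its place is what
    # prevents the session directory from being created.
    return sorted(p for p in glob.glob(pattern) if not os.path.isdir(p))
```

Calling find_stale_session_entries("/tmp", os.environ["USER"]) before a test run lists the candidates; whether to delete them automatically is a separate decision, since another running job could own a directory under the same prefix.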
What about using a queuing system? Open MPI will put the files it creates into
a subdirectory that the queuing system dedicates to this job. Even if Open MPI
fails to remove the files, the queuing system will do so.
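Without a queuing system, a small wrapper can imitate the same idea for the sequential tests. This is a rough sketch under one assumption that should be verified for the installed version: that Open MPI builds its session directory under the directory named by the TMPDIR environment variable. The name run_with_private_tmpdir is invented for this example.

```python
import os
import shutil
import subprocess
import tempfile


def run_with_private_tmpdir(cmd):
    """Run cmd with TMPDIR set to a fresh directory and delete that directory
    afterwards, whatever the exit status (hypothetical helper)."""
    job_tmp = tempfile.mkdtemp(prefix="job-tmp-")
    # Assumption: the launched binary honors TMPDIR for its temporary files.
    env = dict(os.environ, TMPDIR=job_tmp)
    try:
        returncode = subprocess.call(cmd, env=env)
        return returncode, job_tmp
    finally:
        # Plays the role of the queuing system's cleanup step.
        shutil.rmtree(job_tmp, ignore_errors=True)
```

The cleanup still runs if the test binary segfaults or is killed, because it happens in the parent. It does not run if the wrapper itself is killed, which is the gap a real queuing system closes.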
-- Reuti
>>
>>> Maybe a good compromise would be to have the error message say that a file
>>> exists with the same name as the chosen directory?
>> I can make that change - good suggestion.
> ok, thanks!
>
>>
>>> Or add a new entry to the FAQ to help users find the workaround you
>>> proposed... ;-)
>> we can try to do that too
>
> If I may suggest a way to test the behavior of 1.7.x, what about this: have
> a test case that creates a bunch of files (from 0 to 65536) in
> /tmp/openmpi-sessions-${USER}... before launching an executable without
> mpirun... >:)
>
> Anyway, thanks a lot!
>
> Eric