Hi Janne,
On Tue, Feb 26, 2019 at 3:56 PM Janne Blomqvist <janne.blomqv...@aalto.fi> wrote:
> When reaping, it searches for these special .datasync directories (up to
> a configurable recursion depth, say 2 by default), and based on the
> LAST_SYNCED timestamps, deletes entire datasets starting with the oldest
> LAST_SYNCED, until the policy goal has been met. Directory trees without
> .datasync directories are deleted first. .datasync/SLURM_JOB_IDS is used
> as an extra safety check to not delete a dataset used by a running job.
>
> But nothing concrete done yet. Anyway, I'm open to suggestions about
> better ideas, or existing tools that already solve this problem.

Interesting idea! As I mentioned earlier, I copy data sets over manually, since the system administrators (in our case) aren't responsible for this. It would be nice if they ran something like this for us users.

I was wondering whether SLURM could be configured to help this along. For example, with 12 nodes and 3 research groups, could one configure it so that a job from research group A is allocated to a node that already holds that group's data? In effect, the local data would be a "resource" that each node either has or doesn't, with that state changing dynamically as data sets are synced and reaped.

As I have only limited knowledge of system administration (I do co-administer a much smaller cluster that doesn't have this problem), I wonder whether something like this is possible. If so, some profiling with a real set of users as guinea pigs :-) would be interesting, to see whether it actually gives users a noticeable benefit.

Ray
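
P.S. To make the "local data is a resource" idea concrete, SLURM's node
Features might already be enough. The sketch below is untested, the
group_a_data feature name is made up, and the slurm.conf lines are
abbreviated to just the relevant parts. One would tag the nodes that
currently hold group A's data:

    # slurm.conf (abbreviated): nodes 1-4 currently hold group A's data set
    NodeName=node[01-04] Feature=group_a_data
    NodeName=node[05-12]

Group A's jobs would then request that feature as a constraint:

    #SBATCH --constraint=group_a_data

and whatever tool syncs/reaps the data sets would flip the feature on and
off as data comes and goes, e.g. after copying group A's data onto node07:

    scontrol update NodeName=node07 AvailableFeatures=group_a_data \
        ActiveFeatures=group_a_data

One catch is that --constraint is a hard requirement, so group A's jobs
would pend until some node is tagged; the sync tool would have to keep the
features in step with what is actually on disk.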