Thanks, John. I sometimes wonder if I'm the only one out there with this
particular problem.

Ralph, thanks for sticking with me. :) Using a pool of uids doesn't really
work due to the way cgroups/containers work. It would also require
changing the permissions of all of the user's files, which would create
issues for JupyterHub's access to those files, which is used for in situ
monitoring.

Docker does not yet handle uid mapping at the container level (1.10 added
mappings for the daemon). We have solved this problem
<https://github.com/radiasoft/containers/blob/fc63d3c0d2ffe7e8a80ed1e2a8dc44a33c08cb46/bin/build-docker.sh#L110>
by adding a uid/gid switcher at container startup for our images. The trick
is to change the uid/gid of the "container user" with usermod and groupmod.
This only works, however, with images we provide. I'd like a solution that
allows us to start arbitrary/unsafe images, relying on cgroups to do their
job.
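A minimal sketch of that startup switcher, assuming the target uid/gid are
handed in as environment variables (the names `RS_UID`/`RS_GID`, the helper
function, and the user `vagrant` are illustrative, not what our script
actually uses):

```shell
#!/bin/bash
# Emit the remap commands for the container user so files on the
# bind-mounted home directory keep a single consistent owner.
# groupmod runs first so usermod can assign the renumbered primary group.
build_remap_cmds() {
    local user=$1 uid=$2 gid=$3
    echo "groupmod -o -g $gid $user"
    echo "usermod -o -u $uid -g $gid $user"
}

# At container startup the entrypoint would run these as root before
# dropping privileges to the remapped user:
build_remap_cmds vagrant "${RS_UID:-1000}" "${RS_GID:-1000}"
```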

Gilles, the containers do lock the user down, but the problem is that the
file system space has to be dynamically bound to the containers across the
cluster. JupyterHub solves this problem by understanding the concept of a
user and providing a hook to change the directory to be mounted.
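In effect, the hook amounts to substituting the user's cluster home
directory into the container's bind-mount option; a sketch (the `/gpfs`
path and the helper name are assumptions for illustration):

```shell
#!/bin/bash
# Build the per-user volume flag that the spawner hook would hand to
# docker run; /gpfs/home is an assumed cluster file system location.
mount_arg_for() {
    local user=$1
    echo "--volume=/gpfs/home/$user:/home/$user"
}

mount_arg_for alice   # the spawner appends this to the container's run flags
```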

Daniel, we've had bad experiences with ZoL. Its allocation algorithm
degrades rapidly once the file system gets over 80% full, and it is still
not integrated into the major distros, which leads to dkms nightmares on
system upgrades. I don't really see Flocker helping in this regard, because
the problem is the scheduler, not the file system. We know which directory
we have to mount from the cluster file system; we just need to get the
scheduler to allow us to mount it into the container that is running
slurmd.

I'll play with Slurm Elastic Compute this week to see how it works.

Rob
