Re: [OMPI users] Docker Cluster Queue Manager

Daniel Letai Tue, 7 Jun 2016 10:59:41 -0400 (EDT)

On 06/06/2016 06:32 PM, Rob Nagler wrote:

Thanks, John. I sometimes wonder if I'm the only one out there with this particular problem.

Ralph, thanks for sticking with me. :) Using a pool of uids doesn't really work due to the way cgroups/containers work. It would also require changing the permissions of all of the user's files, which would create issues for Jupyter/Hub's access to the files, which is used for in situ monitoring.

Docker does not yet handle uid mapping at the container level (1.10 added mappings for the daemon). We have solved this problem by adding a uid/gid switcher at container startup for our images. The trick is to change the uid/gid of the "container user" with usermod and groupmod. This only works, however, with images we provide. I'd like a solution that allows us to start arbitrary/unsafe images, relying on cgroups to do their job.
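A minimal sketch of what such a startup switcher might look like, assuming the image ships a single unprivileged account and the entrypoint runs as root before dropping privileges. The function names and the "app" account below are hypothetical, not taken from the images described above; only the usermod and groupmod commands come from the message.

```python
import subprocess


def build_switch_commands(user, group, uid, gid):
    """Return the groupmod/usermod invocations that remap the container user.

    The group is changed first so that usermod can point the user's
    primary group at a gid that already exists.
    """
    if uid <= 0 or gid <= 0:
        # Assumption: mapping the container user onto root is never wanted.
        raise ValueError("uid and gid must be positive")
    return [
        ["groupmod", "-g", str(gid), group],
        ["usermod", "-u", str(uid), "-g", str(gid), user],
    ]


def switch_ids(user, group, uid, gid, run=subprocess.check_call):
    """Run the remapping commands; `run` is injectable so it can be dry-run."""
    for cmd in build_switch_commands(user, group, uid, gid):
        run(cmd)


# Dry run: print the commands instead of executing them.
switch_ids("app", "app", 1234, 5678, run=print)
```

In a real entrypoint the target uid/gid would presumably arrive through the environment or the command line, and the default `run` would be used so that a failing usermod aborts container startup.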

Gilles, the containers do lock the user down, but the problem is that the file system space has to be dynamically bound to the containers across the cluster. JupyterHub solves this problem by understanding the concept of a user, and providing a hook to change the directory to be mounted.

Daniel, we've had bad experiences with ZoL. Its allocation algorithm degrades rapidly when the file system gets over 80% full. It still is not integrated into major distros, which leads to dkms nightmares on system upgrades. I don't really see Flocker as helping in this regard, because the problem is the scheduler, not the file system. We know which directory we have to mount from the cluster file system; we just need to get the scheduler to allow us to mount that with the container that is running slurmd.

Any storage with high percentage usage will degrade performance. ZoL is actually nicer than btrfs in that regard, but xfs does handle low free space better most of the time.
If you have the memory to spare, and the images are mostly identical, deduplication (or even better - compression) can help in that regard.
Regarding integration - that's mostly licensing issues, and not a reflection of the maturity of the technology itself.
Regarding dkms - use kabi-tracking-kmod
Just my 2 cents.
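
For the free-space point, a rough sketch of a check that flags a file system once it passes the 80% mark Rob mentions. It uses Python's shutil.disk_usage as a file-system-agnostic approximation; the function names and the default threshold are assumptions for illustration, and a ZFS pool's own capacity reporting may differ from what this sees.

```python
import shutil


def fill_fraction(total, used):
    """Fraction of the file system in use, or 0.0 for an empty/unknown size."""
    if total <= 0:
        return 0.0
    return used / total


def over_threshold(total, used, threshold=0.80):
    """True once usage is strictly above the threshold (80% by default)."""
    return fill_fraction(total, used) > threshold


def path_over_threshold(path, threshold=0.80):
    """Apply the check to the file system that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return over_threshold(usage.total, usage.used, threshold)


if __name__ == "__main__":
    # "." is a placeholder; point this at the mount being monitored.
    print("over 80% full:", path_over_threshold("."))
```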

I'll play with Slurm Elastic Compute this week to see how it works.

Rob

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29382.php
