That is pretty freaking cool.

On Thu, Mar 12, 2009 at 11:38 AM, Eric Schrock <eric.schr...@sun.com> wrote:
> Note that:
>
> 6501037 want user/group quotas on ZFS
>
> is already committed to be fixed in build 113 (i.e., in the next month).
>
> - Eric
>
> On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
>>
>> In the style of a discussion over a beverage, I recently pondered a
>> design for implementing user quotas on ZFS after having far too little
>> sleep.
>>
>> It is probably nothing new, but I would be curious what you experts
>> think of the feasibility of implementing such a system, and whether it
>> would even realistically work.
>>
>> I'm not suggesting that someone should do the work, or even that I will,
>> but rather in the interest of chatting about it.
>>
>> Feel free to ridicule me as required! :)
>>
>> Thoughts:
>>
>> Here at work we would like to have user quotas based on uid (and
>> presumably gid) so that we can fully replace the NetApps we run. ZFS
>> as it stands is not good enough for our situation: we simply cannot
>> mount 500,000 file-systems on all the NFS clients, not all servers we
>> run support mirror-mounts, and the automounter does not see newly
>> created directories without a full remount.
>>
>> Current UFS-style user quotas are exact, down to the byte. We do not
>> need this precision. If a user has a 50MB quota and manages to reach
>> 51MB of usage, that is acceptable to us, especially since they have to
>> go back under 50MB to be able to write new data anyway.
>>
>> Instead of having complicated code in the kernel layer, slowing down
>> the file-system with locking and semaphores (and perhaps avoiding
>> having to learn the in-depth ZFS code?), I was wondering if a simpler
>> setup could be designed that would still be acceptable. I will use the
>> word 'acceptable' a lot. Sorry.
>>
>> My thought is that the ZFS file-system would simply write a
>> 'transaction log' to a pipe. By transaction log I mean uid, gid and
>> 'byte count changed'. And by pipe I don't necessarily mean pipe(2); it
>> could be a FIFO, a pipe or a socket, but currently I'm thinking
>> '/dev/quota' style.
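>>
>> As a rough sketch of that record format (all names here are made up
>> for illustration; no such interface exists today), each entry could be
>> as small as:
>>
>>   #include <sys/types.h>
>>   #include <stdint.h>
>>
>>   /* Hypothetical /dev/quota record: one entry per space-changing
>>    * transaction. */
>>   typedef struct quota_rec {
>>           uid_t   qr_uid;   /* owner of the blocks that changed */
>>           gid_t   qr_gid;   /* group of the blocks that changed */
>>           int64_t qr_delta; /* bytes added (+) or freed (-) */
>>   } quota_rec_t;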
>>
>> User-land will then have a daemon; whether it is one daemon per
>> file-system or just one daemon overall does not matter. This process
>> will open '/dev/quota' and drain the transaction log entries
>> continuously, taking the uid/gid entries and updating the byte counts
>> in its database. How we store this database is up to us, but since it
>> is in user-land it has more flexibility, and it is not as critical for
>> it to be fast as it would be in the kernel.
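>>
>> A minimal sketch of that daemon loop, assuming the quota_rec_t format
>> above (again, /dev/quota and update_usage() are hypothetical):
>>
>>   #include <fcntl.h>
>>   #include <unistd.h>
>>
>>   /* Hypothetical: folds one delta into the per-uid/per-gid usage
>>    * database; the storage behind it is left open. */
>>   extern void update_usage(uid_t uid, gid_t gid, int64_t delta);
>>
>>   int
>>   main(void)
>>   {
>>           quota_rec_t rec;
>>           int fd = open("/dev/quota", O_RDONLY);
>>
>>           if (fd == -1)
>>                   return (1);
>>
>>           /* Block on the kernel stream and drain records as they
>>            * arrive. */
>>           while (read(fd, &rec, sizeof (rec)) == sizeof (rec))
>>                   update_usage(rec.qr_uid, rec.qr_gid, rec.qr_delta);
>>
>>           return (0);
>>   }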
>>
>> The daemon can also grow its number of threads as demand increases.
>>
>> Once a user's usage reaches the quota limit (note that /the/ write()
>> call that goes over the limit will succeed, and probably a couple more
>> after it; this is acceptable), the daemon will "blacklist" the uid in
>> the kernel. Future calls to creat/open(O_CREAT)/write/(insert list of
>> calls) will be denied; naturally, calls to unlink/read etc. should
>> still succeed. If the uid goes back under the limit, the blacklisting
>> will be removed.
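>>
>> The check on the ZFS side could then be very thin. A sketch, where
>> quota_blacklisted() is a hypothetical lookup into a table maintained
>> from user-land (say, via an ioctl on /dev/quota):
>>
>>   #include <sys/types.h>
>>   #include <sys/errno.h>
>>
>>   /* Hypothetical: non-zero if this uid is currently over quota. */
>>   extern int quota_blacklisted(uid_t uid);
>>
>>   /* Called only on the space-allocating paths (creat, open(O_CREAT),
>>    * write, ...); freeing paths such as unlink never come through
>>    * here. */
>>   int
>>   zfs_quota_check(uid_t uid)
>>   {
>>           if (quota_blacklisted(uid))
>>                   return (EDQUOT);
>>           return (0);
>>   }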
>>
>> If the user-land process crashes or dies, for whatever reason, the
>> pipe's buffer will grow in the kernel. If the daemon is restarted
>> sufficiently quickly, all is well; it merely needs to catch up. If the
>> pipe ever does fill up and items have to be discarded, a full scan of
>> the file-system will be required. Since even with UFS quotas we
>> occasionally need to run 'quotacheck', it would seem this, too, is
>> acceptable (if undesirable).
>>
>> If you have no daemon process running at all, you have no quotas at
>> all. But the same can be said of quite a few daemons; administrators
>> need to plan accordingly.
>>
>> I can see a complication with doing a rescan: how could this be done
>> efficiently? I don't know if there is a neat way to make this happen
>> internally to ZFS, but from a user-land-only point of view, perhaps a
>> snapshot could be created (synchronised with the /dev/quota pipe
>> reading?) and the scan run against the snapshot while the kernel log
>> is still being processed. Once the scan is complete, merge the two
>> sets.
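>>
>> A user-land-only sketch of that rescan, walking the snapshot with
>> nftw(3); the snapshot path and merge_usage() are made up, though the
>> .zfs/snapshot directory itself is real:
>>
>>   #include <ftw.h>
>>   #include <sys/stat.h>
>>   #include <sys/types.h>
>>   #include <stdint.h>
>>
>>   /* Hypothetical: folds a scanned file's size into the usage set that
>>    * will later be merged with the live /dev/quota deltas. */
>>   extern void merge_usage(uid_t uid, gid_t gid, int64_t delta);
>>
>>   static int
>>   count_file(const char *path, const struct stat *st, int type,
>>       struct FTW *ftwp)
>>   {
>>           if (type == FTW_F)
>>                   merge_usage(st->st_uid, st->st_gid,
>>                       (int64_t)st->st_blocks * 512);
>>           return (0);   /* keep walking */
>>   }
>>
>>   /* The snapshot is frozen, so the walk sees a consistent image while
>>    * the daemon keeps draining new records from the pipe. */
>>   int
>>   rescan(void)
>>   {
>>           return (nftw("/pool/fs/.zfs/snapshot/quotascan", count_file,
>>               64, FTW_PHYS | FTW_MOUNT));
>>   }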
>>
>> Advantages are that only small hooks are required in ZFS: the
>> byte-count updates, and the checks against the blacklist.
>>
>> Disadvantages are the loss of precision, and possibly slower rescans?
>> Sanity?
>>
>> But I do not really know the internals of ZFS, so I might be completely
>> wrong, and everyone is laughing already.
>>
>> Discuss?
>>
>> Lund
>>
>> --
>> Jorgen Lundman       | <lund...@lundman.net>
>> Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
>> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
>> Japan                | +81 (0)3 -3375-1767          (home)
>
> --
> Eric Schrock, Fishworks                        http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
