That is pretty freaking cool.
On Thu, Mar 12, 2009 at 11:38 AM, Eric Schrock <eric.schr...@sun.com> wrote:
> Note that:
>
> 6501037 want user/group quotas on ZFS
>
> is already committed to be fixed in build 113 (i.e. in the next month).
>
> - Eric
>
> On Thu, Mar 12, 2009 at 12:04:04PM +0900, Jorgen Lundman wrote:
>>
>> In the style of a discussion over a beverage, and talking about
>> user quotas on ZFS, I recently pondered a design for implementing
>> user quotas on ZFS after having had far too little sleep.
>>
>> It is probably nothing new, but I would be curious what you experts
>> think of the feasibility of implementing such a system, and whether
>> or not it would realistically work.
>>
>> I'm not suggesting that someone should do the work, or even that I
>> will, but rather raising it in the interest of chatting about it.
>>
>> Feel free to ridicule me as required! :)
>>
>> Thoughts:
>>
>> Here at work we would like user quotas based on uid (and presumably
>> gid) so that we can fully replace the NetApps we run. Current ZFS is
>> not good enough for our situation: we simply cannot mount 500,000
>> file systems on all the NFS clients, not all of the servers we run
>> support mirror mounts, and the automounter does not see newly
>> created directories without a full remount.
>>
>> Traditional UFS-style user quotas are very exact, to the byte even.
>> We do not need that precision. If a user has a 50 MB quota and
>> manages to reach 51 MB of usage, that is acceptable to us,
>> especially since they have to go back under 50 MB before they can
>> write new data anyway.
>>
>> Instead of having complicated code in the kernel layer, slowing the
>> file system down with locking and semaphores (and perhaps to avoid
>> having to learn the ZFS internals in depth), I was wondering whether
>> a more simplistic setup could be designed that would still be
>> acceptable. I will use the word 'acceptable' a lot. Sorry.
>>
>> My thought is that the ZFS file system would simply write a
>> 'transaction log' to a pipe. By transaction log I mean records of
>> uid, gid and 'byte count changed'. And by pipe I don't necessarily
>> mean pipe(2); it could be a fifo, pipe or socket, but currently I'm
>> thinking of something in the style of '/dev/quota'.
>>
>> User land would then have a daemon (whether it is one daemon per
>> file system or just one daemon overall does not matter). This
>> process would open '/dev/quota' and constantly drain the transaction
>> log entries, taking each uid/gid entry and updating the byte count
>> in its database. How we store this database is up to us, but since
>> it is in user land it has more flexibility, and it is not as
>> critical for it to be fast as it would be in the kernel.
>>
>> The daemon could also grow its number of threads as demand
>> increases.
>>
>> Once a user's quota reaches the limit (note that /the/ call to
>> write() that goes over the limit will succeed, and probably a couple
>> more after it; this is acceptable), the daemon "blacklists" the uid
>> in the kernel. Future calls to creat/open(O_CREAT)/write/(insert
>> list of calls) are denied, while calls to unlink/read etc. still
>> succeed. If the uid goes back under the limit, the blacklisting is
>> removed.
>>
>> If the user-land process crashes or dies, for whatever reason, the
>> buffer of the pipe will grow in the kernel. If the daemon is
>> restarted sufficiently quickly, all is well; it merely needs to
>> catch up. If the pipe does ever fill up and entries have to be
>> discarded, a full scan of the file system will be required. Since
>> even with UFS quotas we occasionally need to run 'quotacheck', this
>> too seems acceptable (if undesirable).
>>
>> If you have no daemon process running at all, you have no quotas at
>> all. But the same can be said of quite a few daemons; the
>> administrators need to adjust their usage accordingly.
>>
>> I can see a complication in doing a rescan. How could this be done
>> efficiently? I don't know if there is a neat way to make it happen
>> internally to ZFS, but from a user-land-only point of view, perhaps
>> a snapshot could be created (synchronised with the reading of the
>> /dev/quota pipe?) and scanned while the daemon keeps processing the
>> kernel log. Once the scan is complete, the two sets are merged.
>>
>> The advantage is that only small hooks are required in ZFS: the
>> byte-count updates, and the blacklist with its checks for being
>> blacklisted.
>>
>> The disadvantages are the loss of precision, and possibly slower
>> rescans? Sanity?
>>
>> But I do not really know the internals of ZFS, so I might be
>> completely wrong, and everyone is laughing already.
>>
>> Discuss?
>>
>> Lund
>>
>> --
>> Jorgen Lundman       | <lund...@lundman.net>
>> Unix Administrator   | +81 (0)3-5456-2687 ext 1017 (work)
>> Shibuya-ku, Tokyo    | +81 (0)90-5578-8500 (cell)
>> Japan                | +81 (0)3-3375-1767 (home)
>
> --
> Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
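
To make the idea a bit more concrete, here is a rough, untested sketch of
what the log records and the daemon's drain loop might look like.
Everything in it is invented for illustration: the /dev/quota device, the
record layout, the QUOTA_BLACKLIST/QUOTA_UNBLACKLIST ioctls and the flat
50 MB limit are all stand-ins for Jorgen's hypotheticals, not anything
that exists in ZFS or Solaris today.

/*
 * Untested sketch of the user-land quota daemon described above.
 * HYPOTHETICAL: /dev/quota, the record layout and the ioctls below
 * are made up for illustration; none of them exist in ZFS today.
 */
#include <sys/types.h>
#include <sys/ioctl.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* One entry in the kernel's transaction log: who, and by how much. */
typedef struct quota_rec {
	uid_t	qr_uid;
	gid_t	qr_gid;
	int64_t	qr_delta;		/* bytes allocated (+) or freed (-) */
} quota_rec_t;

#define	QUOTA_BLACKLIST		1	/* hypothetical ioctl: deny writes  */
#define	QUOTA_UNBLACKLIST	2	/* hypothetical ioctl: allow again  */

#define	NUSERS	500000
#define	LIMIT	(50LL * 1024 * 1024)	/* flat 50 MB limit for the sketch */

static int64_t	usage[NUSERS];		/* in-core here; a real daemon    */
static char	over[NUSERS];		/* would keep an on-disk database */

int
main(void)
{
	quota_rec_t rec;
	int fd = open("/dev/quota", O_RDONLY);

	if (fd == -1) {
		perror("/dev/quota");
		return (1);
	}

	/* Drain the log forever; assume the driver hands us whole records. */
	while (read(fd, &rec, sizeof (rec)) == sizeof (rec)) {
		if (rec.qr_uid >= NUSERS)
			continue;
		usage[rec.qr_uid] += rec.qr_delta;

		if (!over[rec.qr_uid] && usage[rec.qr_uid] > LIMIT) {
			/* Over the limit: blacklist the uid in the kernel. */
			over[rec.qr_uid] = 1;
			(void) ioctl(fd, QUOTA_BLACKLIST, rec.qr_uid);
		} else if (over[rec.qr_uid] && usage[rec.qr_uid] <= LIMIT) {
			/* Back under: lift the blacklist again. */
			over[rec.qr_uid] = 0;
			(void) ioctl(fd, QUOTA_UNBLACKLIST, rec.qr_uid);
		}
	}
	(void) close(fd);
	return (0);
}

A real daemon would of course look limits up per uid in its database rather
than hard-coding one, and would persist the usage table so that a restart
only has to catch up on the pipe instead of triggering a rescan.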
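
And the 'quotacheck'-style rescan could be almost as dumb as an nftw(3C)
walk over a snapshot, merged afterwards with whatever deltas arrived on
the pipe while the walk ran. Again just a sketch under the same invented
names: the .zfs/snapshot path is the only real ZFS piece here, and the
pending[] table stands in for the glue back to the drain loop above.

/*
 * Untested sketch of the full rescan: walk a snapshot with nftw(3C)
 * while the drain loop keeps running, then merge the two sets.
 * HYPOTHETICAL: pending[] stands in for the deltas the drain loop
 * accumulates after the snapshot is taken.
 */
#define	_XOPEN_SOURCE	500		/* for nftw() on some systems */
#include <sys/types.h>
#include <sys/stat.h>
#include <ftw.h>
#include <stdint.h>
#include <stdio.h>

#define	NUSERS	500000

static int64_t	scanned[NUSERS];	/* usage found in the snapshot      */
static int64_t	pending[NUSERS];	/* deltas logged since the snapshot */

static int
tally(const char *path, const struct stat *st, int type, struct FTW *ftw)
{
	/* st_blocks is in 512-byte units and reflects real allocation. */
	if (type == FTW_F && st->st_uid < NUSERS)
		scanned[st->st_uid] += (int64_t)st->st_blocks * 512;
	return (0);
}

/*
 * Rebuild the usage table from a snapshot, e.g. one mounted at
 * /tank/home/.zfs/snapshot/quotacheck, plus the in-flight deltas.
 */
int
rescan(const char *snapdir, int64_t *usage)
{
	if (nftw(snapdir, tally, 64, FTW_PHYS | FTW_MOUNT) == -1)
		return (-1);
	for (uid_t u = 0; u < NUSERS; u++)
		usage[u] = scanned[u] + pending[u];
	return (0);
}

int
main(int argc, char **argv)
{
	static int64_t usage[NUSERS];

	if (argc != 2) {
		(void) fprintf(stderr, "usage: rescan <snapshot-dir>\n");
		return (1);
	}
	if (rescan(argv[1], usage) == -1) {
		perror(argv[1]);
		return (1);
	}
	return (0);
}

Whether a full tree walk is fast enough at that scale is another question,
but it only has to happen after a pipe overflow or a daemon crash.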
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss