Access Control Lists (ACLs) and Extended Attributes (EAs) are an area I have long seen as something rsync will need to address. I have focused more closely on this issue over the past week or so and have reached a few conclusions.
1. ACL and EA OS support is growing but is not really there yet. Most of the UNIX players have POSIX ACL support. I have no data on UNIX EA support, but as far as I can tell it is mostly absent. Linux does not have consistent support yet: ACL support is not part of the mainstream production kernels, so if you want it you have to either apply patches or run a development kernel. Some distribution kernels are patched with ACL support for one or more filesystems. I know that XFS supports ACLs and EAs, as do later versions of EXT2, EXT3 and JFS. I'm not entirely sure of the status of ReiserFS.

2. Utility support is almost completely missing. Not only does rsync not yet support ACLs and EAs, neither do cpio, tar (with the exception of star) or most of the backup utilities.

3. Use lags support dramatically. The vast majority of sites use neither ACLs nor EAs, even where they can, and the lack of utility support aggravates this. Until support is ubiquitous across production-grade OSs, filesystems and utilities, the adoption of ACLs and EAs will be delayed.

4. ACLs and EAs are a part of the future. Users and admins are coming to Linux and UNIX expecting ACLs. While intelligent use of group IDs can deal more simply with _almost_ all permission issues, and, being simpler, tends to be more secure, many will prefer the quick fix ACLs provide. Some of the new security models I've seen are going to require both ACLs and EAs. The potential value of EAs in GUI environments should not be underestimated: imagine file-manager thumbnails and application icons that really are attached to the file. Even rsync may find good use for EAs. I can envision optionally storing blocksums as an extended attribute; a 64KB EA could hold the blocksums for a 127MB file using 4-byte sums and 16KB blocks. Who knows what else the future might hold? Once people expect EA support, those applicable utilities that fail to support ACLs and EAs will become irrelevant.
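The following small C sketch makes the blocksum-in-an-EA arithmetic in item 4 concrete. The function name is hypothetical. The number of checksum bytes per block is an assumption, because the post does not say whether "4-byte sums" means a single sum or a 4-byte weak plus 4-byte strong pair. With the pair (8 bytes per block), a 64KB EA holds 8192 sums, covering 8192 blocks of 16KB, or 128MB. That is close to the roughly 127MB figure above. With a single 4-byte sum per block, it would cover 256MB.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file (in bytes) whose block checksums fit in one EA value
 * (hypothetical helper).
 *   ea_bytes:   size of the EA value available for checksums
 *   sum_bytes:  checksum bytes stored per block (assumed, see text)
 *   block_size: bytes of file data covered by each block */
static uint64_t ea_blocksum_capacity(uint64_t ea_bytes, uint64_t sum_bytes,
                                     uint64_t block_size)
{
    assert(sum_bytes > 0);
    uint64_t blocks = ea_bytes / sum_bytes; /* whole checksums that fit */
    return blocks * block_size;
}
```

For example, `ea_blocksum_capacity(64 * 1024, 8, 16 * 1024)` returns 134217728 bytes (128MB).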
So while demand is currently low, I believe that rsync will need to support ACLs and EAs in the near future, or it will become little more than a limited download tool like ftp. When that will be is unknown, but I think the release of the Linux 2.6 kernel will be a major factor: if 2.6 comes out on schedule, widespread use of ACLs on Linux filesystems may well begin in as little as one year.

Where does that leave us? What am I going to do about it?

Based on the level of comment on this and other threads, there is little present demand for ACL or EA support among rsync users or developers. There are a few people for whom this is a production issue today, and a few, like myself, who see it as a future issue that should be anticipated. I believe that support should be added sooner rather than later, and that how it is implemented is very important. However, I do not use ACLs or EAs at this time. Unless someone pays me to, I'm not going to set up a lab or start slinging code to add that support. Sorry, but this just isn't my itch yet: while I care how it is done, I don't yet care enough to do it myself and build the test apparatus.

The remainder of this missive is a bit more technical. Since I do care how ACLs and EAs are supported, and have some informed ideas in that direction, it seems worthwhile to relate those ideas. If you like design documents, think of this as the start of a high-level one. I don't generally care for design documents, but the design should be discussed before this much significant code is written. If you aren't interested in such discussion, or don't care what I have to say, please move on instead of complaining. I've broken this down into several sections. If you have comments, please address one issue per follow-up, so that they form separate sub-threads, and quote only the relevant text.
Unless there are comments, I expect this will be the last I'll say on this issue for a while.

-- How have OSs implemented Access Control Lists --

In UNIX, ACLs are implemented in various, sometimes non-standard, ways. For the most part they appear to be largely compatible with POSIX ACLs.

In Linux, ACLs are implemented as EAs. To support ACLs on both Linux and non-Linux platforms, we have to treat them separately from the EAs.

Although NTFS supports ACLs, the cygwin environment does not _yet_ reflect that.*** NTFS ACLs do not translate cleanly to and from POSIX ACLs; the semantic differences mean that information is lost in each direction. NetWare also has a form of ACLs similar to NTFS's.

The order of the entries in a POSIX ACL does not affect its meaning. The only consistent, compact expression of a POSIX ACL is textual, one entry at a time, in the form:

    [d:]type:id:perms

where the optional d: prefix marks a default ACL entry, and

    type  is one of u, g, o, m
    id    is either the name or the ASCII id number
    perms is the symbolic rwx string as shown by ls

As near as I can tell, the acl_t type used by the libraries is opaque. Working on it directly is likely to break on some platforms and may be subject to change in the future.

-- How have OSs implemented Extended Attributes --

EAs seem to be unsupported by most of the UNIX platforms. Pipe up now if you know of support for them on any mainstream UNIX. Mac OS and OS X support a limited form of EAs in the resource fork. I have a vague recollection that NTFS also has some form of EAs.

In Linux, EAs have been growing in importance; in addition to ACLs, other security features are being implemented with them. Extended attributes are simple name/value pairs. Names are partitioned into namespaces ("user." and "system." at present) and are NUL-terminated, fully qualified text up to 256 bytes long. Values are BLOBs up to 64KB long. The number and total size of a file's EAs are implementation dependent, varying from 1KB to unlimited. The order in which they are retrieved is indeterminate.
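The EA name and value constraints described above could be expressed as a simple check before an EA is sent or set. This is a sketch only. The function and macro names are hypothetical. The limits are the ones stated in this post, and reading the 256-byte name limit as including the terminating NUL is an assumption. Only the two namespaces mentioned here are accepted. A real implementation would take the limits from the platform; on Linux, <linux/limits.h> provides XATTR_NAME_MAX and XATTR_SIZE_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limits as stated in the post (hypothetical macro names). */
#define EA_NAME_MAX  255u           /* name bytes, excluding the NUL */
#define EA_VALUE_MAX (64u * 1024u)  /* value bytes */

/* The two namespaces the post mentions. */
static const char *const ea_namespaces[] = { "user.", "system." };

/* Return true if name is fully qualified in a known namespace and both
 * the name and a value of value_len bytes fit the stated limits. */
static bool ea_pair_ok(const char *name, size_t value_len)
{
    assert(name != NULL);
    size_t len = strlen(name);
    if (len > EA_NAME_MAX || value_len > EA_VALUE_MAX)
        return false;
    for (size_t i = 0; i < sizeof ea_namespaces / sizeof ea_namespaces[0]; i++) {
        size_t plen = strlen(ea_namespaces[i]);
        /* Require the namespace prefix plus at least one more character. */
        if (len > plen && strncmp(name, ea_namespaces[i], plen) == 0)
            return true;
    }
    return false;
}
```

For example, "user.mime_type" passes. A bare "mime_type" (no namespace) is rejected, and so is any value longer than 64KB.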
-- What should rsync support --

Extended Attributes and POSIX ACLs. Like it or not, rsync is oriented toward POSIX semantics. Further, when it runs on the one major non-POSIX platform, it does so in a POSIX-emulating environment (cygwin), so it doesn't even have access to non-POSIX file semantics. Full cross-platform functionality isn't really rsync's bailiwick. Perhaps a future tool that supersedes rsync might interoperate with native, non-POSIX semantics, but I don't see rsync going there.

If all we supported were Linux, we could simply provide Extended Attributes and the ACLs would be included automatically. However, we should support the other POSIX (mostly UNIX) platforms. In some cases it would also be worthwhile to have limited support for non-POSIX ACLs; I'll discuss that in the "how" section.

-- How should rsync support Access Control Lists --

As I've already said, the focus should be on POSIX ACLs. I am inclined to deal with ACLs as binary data (a short for type/flags, a short for perms and an int for the id, or something similar). However, the library routines treat the acl_t structure as an opaque object and don't really support that. We would wind up converting each acl_t to the text form and then to the binary form, and vice versa. I don't think the compaction (10-15 chars per ACE down to 8 or 12 bytes per ACE) is worth the extra computation. If the acl_t structure turns out to be less opaque, I'd be more than happy to revisit the binary format.

Given the potential size of an ACL, we don't want to transmit the ACL for every file that has an extended one. An extended ACL is one with entries beyond those that map onto the standard POSIX user/group/other (UGO) permission bits. Instead, we should assemble each ACL into a consistent, sorted form that lends itself to checksum comparison. A single block checksum of a file's ACL would let us identify the files whose ACLs had changed, so we would only transmit an ACL if it had changed.

Limited support for non-POSIX ACLs could be provided.
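Below is a minimal C sketch of the "consistent, sorted form" idea for POSIX ACLs discussed above. It assumes each ACL has already been converted into one string per entry in the short [d:]type:id:perms form, with ids rendered consistently (all names or all numbers). Sorting the entries makes the checksum independent of the order in which they were read. That is safe because entry order does not affect a POSIX ACL. The function names are hypothetical, and the 32-bit FNV-1a hash is only a stand-in for whatever checksum rsync would actually use. acl_is_extended() shows one way to skip files whose ACL consists only of the base user/group/other entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmp_entry(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* 32-bit FNV-1a over len bytes, continuing from hash h (stand-in only). */
static uint32_t fnv1a(const char *s, size_t len, uint32_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* True if the ACL has anything beyond the base u::, g:: and o:: entries:
 * a default ("d:") entry, a mask ("m:") entry, or a named user/group. */
static bool acl_is_extended(const char *const *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const char *e = entries[i];
        if (strncmp(e, "d:", 2) == 0 || strncmp(e, "m:", 2) == 0)
            return true;
        /* Base entries have an empty id ("u::rwx"); named ones do not. */
        if ((e[0] == 'u' || e[0] == 'g') && e[1] == ':' && e[2] != ':')
            return true;
    }
    return false;
}

/* Sort the entries in place, then checksum them joined by '\n', so the
 * same ACL gives the same checksum whatever order it was read in. */
static uint32_t acl_checksum(const char **entries, size_t n)
{
    assert(entries != NULL || n == 0);
    uint32_t h = 2166136261u; /* FNV offset basis */
    if (n > 1)
        qsort(entries, n, sizeof *entries, cmp_entry);
    for (size_t i = 0; i < n; i++) {
        h = fnv1a(entries[i], strlen(entries[i]), h);
        h = fnv1a("\n", 1, h);
    }
    return h;
}
```

For example, { "u::rwx", "g::r-x", "o::r--" } and { "o::r--", "u::rwx", "g::r-x" } yield the same checksum, and neither is extended. Adding "u:1000:r-x" and "m::r-x" makes the ACL extended.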
ACL capabilities should be determined during the protocol discovery phase.** If two like systems are communicating (Windows<->Windows, for example), they could use their native ACL format instead of POSIX. Otherwise, non-POSIX systems would be expected to convert their ACL format to and from our chosen POSIX representation.

-- How should rsync support Extended Attributes --

Extended attributes can be quite large. A file's EAs should be built into a single contiguous object, with the extended attributes sorted by name. This EA object would be compared and transmitted just like a regular file, using the rsync algorithm. Each EA is a discrete entity requiring a separate syscall to retrieve, remove, set or change. To take advantage of that, possible alternatives would be to generate a separate checksum for each EA, or to somehow start a new blocksum with each EA.

Where ACLs are implemented as EAs, the ACL EA(s) would not be included in the EA synchronization. For this reason it will be necessary to write the ACLs after the EAs. So the order of operations on a file update would be: file data, EAs, ACLs, chown, chgrp, chmod, and finally mtime.

-- Unresolved issues --

A drawback of the checksum approach to EA and ACL objects is that it would often be necessary to build the objects twice on the receiver: once to generate the blocksums, and a second time to merge any changes.

Also at issue is whether changes to ACLs and EAs modify any timestamps, and if so which ones. Such updates should change ctime but not necessarily mtime. If that is the case, it may be necessary to generate the total checksum and size on the sender during the file-scan phase, or things could get quite ugly. Such an early sender checksum would mean that the objects might have to be generated twice on the sender as well. The regeneration may be partially avoidable by allowing a limited amount of data to be attached to the file list (flist) or otherwise cached. The ACLs in particular may be small enough to keep.
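The EA object described above could be built as in the following C sketch. It holds all of a file's EAs in one contiguous buffer, sorted by name, with any ACL-bearing EAs left out. The struct, the function names and the serialization layout are hypothetical; the layout is name, NUL, a 4-byte big-endian value length, then the value. The two skipped names, system.posix_acl_access and system.posix_acl_default, are the EAs Linux uses to store POSIX ACLs. The resulting buffer could then be fed to the rsync algorithm like any other file data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One extended attribute as read from a file (hypothetical type). */
struct ea {
    const char *name;           /* fully qualified, NUL terminated */
    const unsigned char *value; /* raw BLOB */
    uint32_t len;               /* value length in bytes */
};

static int cmp_ea(const void *a, const void *b)
{
    return strcmp(((const struct ea *)a)->name, ((const struct ea *)b)->name);
}

/* Linux keeps POSIX ACLs in these EAs; they are synced as ACLs instead. */
static int is_acl_ea(const char *name)
{
    return strcmp(name, "system.posix_acl_access") == 0 ||
           strcmp(name, "system.posix_acl_default") == 0;
}

/* Sort eas by name and serialize each non-ACL EA into buf as
 *   name '\0' value-length (4 bytes, big-endian) value
 * Returns the number of bytes written, or SIZE_MAX if cap is too small. */
static size_t ea_build_object(struct ea *eas, size_t n,
                              unsigned char *buf, size_t cap)
{
    size_t off = 0;
    assert(buf != NULL);
    if (n > 1)
        qsort(eas, n, sizeof *eas, cmp_ea);
    for (size_t i = 0; i < n; i++) {
        if (is_acl_ea(eas[i].name))
            continue;
        size_t nlen = strlen(eas[i].name) + 1; /* include the NUL */
        uint32_t len = eas[i].len;
        if (cap - off < nlen + 4 + (size_t)len)
            return SIZE_MAX;
        memcpy(buf + off, eas[i].name, nlen);
        off += nlen;
        buf[off++] = (unsigned char)(len >> 24);
        buf[off++] = (unsigned char)(len >> 16);
        buf[off++] = (unsigned char)(len >> 8);
        buf[off++] = (unsigned char)len;
        if (len > 0)
            memcpy(buf + off, eas[i].value, len);
        off += len;
    }
    return off;
}
```

The entries are sorted before serialization. As a result, two files with the same EAs produce byte-identical objects, whatever the (indeterminate) order in which their EAs were listed.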
Because many files will often share identical ACLs and EA lists, a hash of the ACL and EA objects could be maintained so that duplicates are identified, reducing inter-phase storage requirements.

--------

** Some of my comments may conflict with the rsync protocol. This is because I have not so far needed to get into the protocol itself, and I have yet to see any decent documentation of it. Tridge's algorithm is well enough documented, but the protocol implementing it is not.

*** I don't use Windows, so my knowledge of cygwin and NTFS is limited.

-- 
________________________________________________________________
J.W. Schultz            Pegasystems Technologies
Remember Cernan and Schmitt