My goal is to create a tool that does backup and restore while only transferring changes. It will connect from Mac OS X to a server running Linux and preserve all metadata, without the user ever knowing there is an issue. I've found that the rsync algorithm is a good start, and it sounds like you all have the same idea.
I don't think I like the MacBinary solution, in that I can see it requiring some configuration of the tool that the user will have to worry about. We obviously don't want the overhead of flattening files that have no forks, or files whose FileInfo can be determined from other metadata strategies. The user might have to maintain a list of the files they use... how do I handle this file or that (à la the Mac CVS tools)?
I see another user-experience issue with both the MacBinary solution and the protocol change: what do the files look like when they get backed up? If I connect to the server via the Finder, am I going to see a bunch of 'archived' files, or do I get the real deal? I would hate to use rsync if I couldn't just go and grab the files that got backed up. Not that running a file through StuffIt is a big deal, but it's going to seem like a bit of a kludge to the user even if the solution is in fact much more elegant. What format is this new protocol going to produce? Will the only way to get to the files be to use the rsync client? Sorry, that's just not acceptable.
The only solution left is to pre-process the file by splitting it before creating the change lists. There will have to be some intelligence about which method of splitting was used on the server, but I'm positive that couldn't be too hard to determine. Please tell me if I'm way off base here.
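Something along these lines is the kind of check I have in mind. It is purely a guess, and the sidecar paths it probes for (netatalk's .AppleDouble directory, the ._<filename> files) are assumptions based on what has come up in this thread, not working code:

/*
 * Hypothetical sketch only: probe for the sidecar file each splitting
 * scheme would leave next to the data fork.  The path patterns are my
 * assumptions, not anything that exists in rsync today.
 */
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum split_scheme { SPLIT_NONE, SPLIT_APPLEDOUBLE_DIR, SPLIT_DOT_UNDERSCORE };

static int sidecar_exists(const char *path, const char *fmt)
{
    char dirbuf[1024], basebuf[1024], probe[2048];
    struct stat st;

    /* dirname()/basename() may modify their argument, so work on copies */
    strncpy(dirbuf, path, sizeof(dirbuf) - 1);   dirbuf[sizeof(dirbuf) - 1] = '\0';
    strncpy(basebuf, path, sizeof(basebuf) - 1); basebuf[sizeof(basebuf) - 1] = '\0';

    snprintf(probe, sizeof(probe), fmt, dirname(dirbuf), basename(basebuf));
    return stat(probe, &st) == 0;
}

enum split_scheme guess_split_scheme(const char *path)
{
    /* netatalk keeps the second fork in .AppleDouble/<name> */
    if (sidecar_exists(path, "%s/.AppleDouble/%s"))
        return SPLIT_APPLEDOUBLE_DIR;

    /* Mac OS X on single-fork volumes uses ._<name> alongside the file */
    if (sidecar_exists(path, "%s/._%s"))
        return SPLIT_DOT_UNDERSCORE;

    return SPLIT_NONE;
}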
One other question that I'm sure will show my ignorance of Darwin development: what is the issue with using the high-level APIs if the output is compatible with the other platforms running rsync? What is the advantage of aiming for POSIX purity, or code at the "Darwin level", if the code is only going to be used on Macs running the higher-level stuff anyway? If you don't have a forked file system, why would you care that you don't know how to handle forks?
I'm planning on taking this project on full time, and we would all benefit if we can agree on a direction.
Let's get this thing going,
Terrence Geernaert
Mark Valence wrote:
So, that's one vote each for options 1, 2, and 3 ;-)
I agree that the ideal implementation would support HFS+ as well as netatalk's .AppleDouble scheme, Mac OS X's ._<filename> scheme, and MacBinary for all the rest. This can certainly be a goal of the implementation, but personally I am interested in the HFS+ on Mac OS X part of the problem.
My implementation, whether it is MacBinary-based or a change to the protocol, will leave room for these alternative schemes. Right now, I am thinking that MacBinary is the way to go. It doesn't give the flexibility and extensibility that a protocol change would, but it does have the benefit of supporting existing rsync versions.
Chris I., I'm not sure what you mean by "done at the Darwin level". If you mean that it should be done with Darwin/BSD APIs and not Carbon/Cocoa APIs, then I am in full agreement with you. The calls that I'd use to access the resource fork are POSIX calls (essentially, it's just an open() call), although the calls to get the HFS metadata are Mac OS X-specific (but not Carbon calls).
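To make that concrete, here is a rough, untested sketch of what I mean. The "..namedfork/rsrc" path suffix and the getattrlist() attribute constants are from memory, so treat them as assumptions rather than committed code:

/*
 * Rough, untested sketch: open the resource fork with a plain POSIX
 * open() on a special path, and fetch the FinderInfo with the Mac OS
 * X-specific (but non-Carbon) getattrlist() call.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/attr.h>
#include <sys/types.h>

struct fndrinfo_buf {
    u_int32_t length;          /* size of returned data, filled in by the kernel */
    char      finderinfo[32];  /* FileInfo + ExtendedFileInfo */
};

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    /* Resource fork: just open() the file's path plus a suffix. */
    char rsrcpath[1024];
    snprintf(rsrcpath, sizeof(rsrcpath), "%s/..namedfork/rsrc", argv[1]);
    int rfd = open(rsrcpath, O_RDONLY);
    if (rfd >= 0) {
        off_t rsrclen = lseek(rfd, 0, SEEK_END);
        printf("resource fork: %lld bytes\n", (long long)rsrclen);
        close(rfd);
    }

    /* HFS metadata (FinderInfo): Mac OS X-specific, but not Carbon. */
    struct attrlist alist;
    struct fndrinfo_buf buf;
    memset(&alist, 0, sizeof(alist));
    alist.bitmapcount = ATTR_BIT_MAP_COUNT;
    alist.commonattr  = ATTR_CMN_FNDRINFO;
    if (getattrlist(argv[1], &alist, &buf, sizeof(buf), 0) == 0)
        printf("type/creator: %.4s/%.4s\n", buf.finderinfo, buf.finderinfo + 4);

    return 0;
}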
Anyway, I'm still mulling all this over, so any suggestions are more than welcome. Once a path is chosen and code is written, things will be harder to change ;-)
Chris Garrigues wrote:
A quick thought about implementation details: it would be nice if this were done in such a way that if I were to rsync from a non-OSX netatalk system onto an OSX system, the .AppleDouble directories would be merged back into the files, and conversely, if I were to rsync from an OSX system to a netatalk system, the resource forks would be split into .AppleDouble directories.
I guess this would be simplest with scheme 2 above.
David Feldman wrote:
I'm not familiar with netatalk, but along a similar line, Mac OS X stores resource forks and metadata differently on HFS+ and single-fork volumes (such as UFS or NFS). If you copy a file from an HFS+ volume over to a single-fork volume using the Finder it'll split the pieces apart and save the resource fork and metadata under variations of the original filename. I don't remember the exact names but I think they're in the Mac OS X System Overview document...something like ._<original filename>.
If there's a way I can help with the porting effort please let me know. I don't know a lot about the lower-level details, but do know C, C++, Cocoa, etc. and would be interested in looking at the BSD-level info you have on transferring OS X files.
As I stated in my earlier message, my primary interest is synchronization of desktop and laptop, though backup would be terrific too. I'm pretty sure there are a lot of OS X users out there in need of both. I'm currently synchronizing with a shell script that uses ditto.
Chris Irvine wrote:
I would lean toward option "1" for several reasons. Primarily, it could probably interoperate safely with non-HFS systems or older rsync versions.
How about a flag that changes the mode to detect named forks and encode them in-line? These encoded files could be safely synced to non-forked storage destinations or to tape. A simple tag passed at the beginning of a session could notify the destination that MacBinary decoding can be attempted if it is available.
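For reference, these are the header fields an in-line (MacBinary II) encoding would need to carry. The offsets below are from memory, so check them against the published spec before anyone writes code that depends on them:

/*
 * MacBinary II layout, from memory (verify against the spec): a 128-byte
 * header carrying the Finder metadata, followed by the data fork and then
 * the resource fork, each padded out to a 128-byte boundary.
 */
#define MB2_HEADER_LEN       128
#define MB2_OFF_NAME_LEN       1   /* filename length (1 byte) */
#define MB2_OFF_NAME           2   /* filename (up to 63 bytes) */
#define MB2_OFF_FILE_TYPE     65   /* Finder type code (4 bytes) */
#define MB2_OFF_CREATOR       69   /* Finder creator code (4 bytes) */
#define MB2_OFF_FINDER_FLAGS  73   /* high byte of the Finder flags */
#define MB2_OFF_DATA_LEN      83   /* data fork length, big-endian (4 bytes) */
#define MB2_OFF_RSRC_LEN      87   /* resource fork length, big-endian (4 bytes) */
#define MB2_OFF_CREATE_DATE   91   /* creation date (4 bytes) */
#define MB2_OFF_MOD_DATE      95   /* modification date (4 bytes) */
#define MB2_OFF_CRC          124   /* CRC of the preceding 124 bytes (2 bytes) */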
I also understand the need for named resource files for systems like netatalk. The problem is that every named-fork system is different: netatalk, Xinet, Helios, the OSX Finder. That is a lot to chew on. I would rather the user post-process files to get them into the named-fork format if they must. If you are going between two systems that both use the named-fork technique, this whole process is unnecessary.
Option "3" might be the best. It seems to me that this could end up requiring a lot of changes to the protocol.
It should also be noted that a project like this should be done at the Darwin level. There have also been discussions on the darwin-development list in June '01. No one really started anything, but they did discuss at length how access to resource forks might be done while staying inside POSIX calls.