Martin Pool <[EMAIL PROTECTED]> wrote:
> I've put a cleaned-up version of my design notes up here
> http://samba.org/~mbp/superlifter/design-notes.html
I'll start with some feedback on your rzync comments.

Re: rzync's name: I currently consider rZync to be a test app that lets me (and anyone else who wants to fiddle with it) try out some ideas in protocol design. Integrating the ideas from it back into rsync or into superlifter would be ideal. If I ever decide to release my own file transfer utility, I'll give it a useful name at that time (definitely NOT rzync).

Re: rzync's variable-length fields: Note that my code allows more variation than just 2 or 4 bytes. For example, I size the 8-byte file-size value to only as many bytes as are needed to actually store the length. I agree that we should question whether this complexity is needed, but I don't agree that it is wrong in principle. There are two areas where field-sizing is used: the directory-info compression (which is very similar to what rsync does, but with some extra field-sizing thrown in for good measure), and the transmission protocol itself.

I still have questions about how best to handle the transfer of directory info. I'm thinking that it might be better to remove the rsync-like downsizing of the data and instead use a library like zlib to remove the huge redundancies in the directory data during its transmission.

In the protocol itself, only two variable-size elements go into each message header. While this adds quite a bit of complexity compared with a fixed-length message header, it shouldn't be too hard to automate a test that ensures the various header combinations (particularly the boundary conditions) encode and decode properly. I don't know whether this level of header complexity is actually needed (this is one of the things we can use the test app to check out), but if we decide we want it, I believe we can test it well enough to ensure that it will not be a sinkhole of latent bugs.

Re: rzync's name cache:
I've revamped it into a very dependable design that no longer relies on lock-step synchronization when expiring old items (only when creating new items, which is easy to achieve).

Some comments on your registers: You mention having something like 16 registers to hold names. I think you'll find this inadequate, but it does depend on exactly how much you plan to cache names outside of the registers, how much retransmission of names you consider acceptable, and whether you plan to have a "move mode" where the source file is deleted.

My first test app had no name cache whatsoever. It relied on external commands to drive it, and it sent the source/destination/basis trio of names from side to side before every step of the file's progress. While this was simple, the extra bandwidth needed to retransmit the names was not acceptable to me.

If we register only the active items that are currently being sent over the wire, each name will need to stay registered through the entire sig, delta, patch, and (optionally) source-side-delete steps. When the files are nearly up-to-date, having room for only 16 names will, I believe, be overly restrictive. Part of the problem is that the buffered data on the sig-generating side delays the source-side-delete messages quite a bit. A high-priority delete channel would help to alleviate this, but I think you'll find that several hundred active names is a better lower limit to design around.

Another question is whether names are sent fully qualified or relative to some directory. My protocol caches directory names in the name cache and allows you to send filenames relative to a cached directory. Just having a way to "chdir" each side (even if the chdir is only virtual) and send names relative to the current directory should help a lot. Directory scanning during a recursive transfer is an additional source of cached names.
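A minimal Python sketch of a directory-relative name cache, to make the ideas above concrete. It is an illustration only, not rzync's or superlifter's code: `NameCache`, `define`, `install`, and `release` are hypothetical names. The eviction policy is also an assumption about one possible way to avoid lock-step expiry: only the sender expires entries, and every definition it sends carries its slot number, so the receiver never has to replay expiration decisions. Each entry may be relative to a cached directory slot, may carry a second (basis) name, and stays pinned while a transfer step or a child entry still refers to it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    name: str                      # a directory, or a filename relative to `parent`
    parent: Optional[int] = None   # slot of the cached directory this name is relative to
    basis: Optional[str] = None    # optional second name (basis file) tied to this entry
    refs: int = 0                  # >0 while a transfer step or child entry still needs it


class NameCache:
    """Hypothetical fixed-size name cache mirrored on both sides of a link.

    Assumption: only the sending side chooses slots and evicts; each
    definition it transmits carries the slot number, so the receiver just
    overwrites that slot and never replays the sender's expiration choices.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: Dict[int, Entry] = {}  # dict order == creation order (oldest first)

    def _drop(self, slot: int) -> None:
        entry = self.slots.pop(slot)
        if entry.parent is not None:
            self.slots[entry.parent].refs -= 1  # child no longer pins its directory

    def _free_slot(self) -> int:
        for slot in range(self.capacity):  # prefer an empty slot
            if slot not in self.slots:
                return slot
        victim = next((s for s, e in self.slots.items() if e.refs == 0), None)
        if victim is None:
            raise RuntimeError("all cached names are in use; cache is too small")
        self._drop(victim)  # evict the oldest unreferenced name
        return victim

    def define(self, name: str, parent: Optional[int] = None,
               basis: Optional[str] = None) -> int:
        """Sender side: cache a name (held once by the caller) and return its slot."""
        if parent is not None:
            self.slots[parent].refs += 1  # pin the directory so it can't be evicted now
        try:
            slot = self._free_slot()
        except RuntimeError:
            if parent is not None:
                self.slots[parent].refs -= 1  # undo the pin on failure
            raise
        self.slots[slot] = Entry(name, parent, basis, refs=1)
        return slot

    def release(self, slot: int) -> None:
        """Sender side: the caller's last step for this name (e.g. patch) is done."""
        self.slots[slot].refs -= 1

    def install(self, slot: int, name: str, parent: Optional[int] = None,
                basis: Optional[str] = None) -> None:
        """Receiver side: store the name in exactly the slot the sender chose."""
        if slot in self.slots:
            self._drop(slot)
        if parent is not None:
            self.slots[parent].refs += 1
        self.slots[slot] = Entry(name, parent, basis)

    def full_path(self, slot: int) -> str:
        """Rebuild the full pathname by walking up the cached directory chain."""
        parts: List[str] = []
        cur: Optional[int] = slot
        while cur is not None:
            entry = self.slots[cur]
            parts.append(entry.name)
            cur = entry.parent
        return "/".join(reversed(parts))
```

On the receiving side only `install` and `full_path` are needed. With `capacity=16`, `define` raises as soon as every one of the 16 slots is still referenced by an in-flight step, which is the kind of restriction on the register count described above.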
My protocol has specific commands that refer to a name index within a specified directory, so the receiving side can request changed files using a small binary value instead of a full pathname.

One more area of complexity that you don't mention (and that I don't mention either in my new-protocol doc): some operations need two names associated with a single operation. This happens when we have both a destination file and a basis file. My current cache implementation allows both of these names to be associated with a single cache element (though I need to improve this a bit in rzync) and lets the sig/patch stage snag them both.

..wayne..