On Thu, Jul 18, 2002 at 10:19:40AM -0700, Wayne Davison wrote:
> Martin Pool <[EMAIL PROTECTED]> wrote:
> > I've put a cleaned-up version of my design notes up here
> > http://samba.org/~mbp/superlifter/design-notes.html
>
> I'll start with some feedback on your rzync comments:
>
> Re: rzync's name: I currently consider the rZync to be a test app to
> allow me (and anyone else who wants to fiddle with it) to try out some
> ideas in protocol design. Integrating the ideas from this back into
> rsync or into superlifter would be ideal. If I ever decide to release
> my own file transfer utility, I'll name it something useful at that
> time (definitely NOT rzync).
>
> Re: rzync's variable-length fields: Note that my code allows more
> variation than just 2 or 4 bytes -- e.g., I size the 8-byte file-size
> value to only as many bytes as needed to actually store the length. I
> agree that we should question whether this complexity is needed, but I
> don't agree that it is wrong on principle. There are two areas where
> field-sizing is used: in the directory-info compression (which is very
> similar to what rsync does, but with some extra field-sizing thrown in
> for good measure), and in the transmission protocol itself:
>
> I still have questions about how best to handle the transfer of
> directory info. I'm thinking that it might be better to remove the
> rsync-like downsizing of the data and to use a library like zlib to
> remove the huge redundancies in the dir data during its transmission.
>
> In the protocol itself, there are only two variable-size elements that
> go into each message header. While this increases complexity quite a
> bit over a fixed-length message header, it shouldn't be too hard to
> automate a test that ensures that the various header combinations
> (particularly boundary conditions) encode and decode properly.
> I don't know if this level of message header complexity is actually
> needed (this is one of the things that we can use the test app to
> check out), but if we decide we want it, I believe we can adequately
> test it to ensure that it will not be a sinkhole of latent bugs.
>
> Re: rzync's name cache. I've revamped it to be a very dependable design
> that no longer depends on lock-step synchronization in the expiration of
> old items (just in the creation of new items, which is easy to achieve).
>
> Some comments on your registers:
>
> You mention having something like 16 registers to hold names. I think
> you'll find this to be inadequate, but it does depend on exactly how
> much you plan to cache names outside of the registers, how much
> retransmission of names you consider to be acceptable, and whether you
> plan to have a "move mode" where the source file is deleted.
>
> My first test app had no name-cache whatsoever. It relied on external
> commands to drive it, and it sent the source/destination/basis trio of
> names from side to side before every step of the file's progress. While
> this was simple, the increased bandwidth necessary to retransmit the
> names was not acceptable to me.

I think the better approach is to reduce the bandwidth needed rather
than make multiple stages require side-channel communication.
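To make the bandwidth point concrete, here is a minimal sketch of a
name cache in which a name crosses the wire once and later steps (sig,
delta, patch) refer to it by index. This is not rzync's actual design:
the DEF/REF message shapes, the class names, and the 3-byte overhead
in wire_cost() are all assumptions invented for illustration. The only
thing both sides must agree on is the order in which entries are
created.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical message shapes, invented for this sketch:
#   ("DEF", index, name)  defines a new cache entry and uses it
#   ("REF", index)        refers to an entry both sides already hold
Message = Union[Tuple[str, int, str], Tuple[str, int]]


class NameSender:
    """Assigns the next free index to each name it has not sent before."""

    def __init__(self) -> None:
        self._index_of: Dict[str, int] = {}

    def encode(self, name: str) -> Message:
        index = self._index_of.get(name)
        if index is None:
            # First use: send the full name along with its new index.
            index = len(self._index_of)
            self._index_of[name] = index
            return ("DEF", index, name)
        # Later uses: the index alone is enough.
        return ("REF", index)


class NameReceiver:
    """Mirrors the sender's table by applying DEF messages in order."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def decode(self, message: Message) -> str:
        if message[0] == "DEF":
            _, index, name = message
            if index != len(self._names):
                raise ValueError("DEF arrived out of order")
            self._names.append(name)
            return name
        _, index = message
        return self._names[index]


def wire_cost(message: Message) -> int:
    """Rough byte cost, assuming 1 tag byte + 2 index bytes + the name."""
    name_bytes = len(message[2].encode("utf-8")) if message[0] == "DEF" else 0
    return 3 + name_bytes
```

Encoding the same name three times yields one DEF followed by two
REFs, so only the first step pays for the name itself. Expiring old
entries is deliberately left out of this sketch.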
>
> If we just register the active items that are currently being sent over
> the wire, the name will need to live through the entire sig, delta,
> patch, and (optionally) source-side-delete steps. When the files are
> nearly up-to-date, having only 16 of them will, I believe, be overly
> restrictive. Part of the problem is that the buffered data on the
> sig-generating side delays the source-side-delete messages quite a bit.
> If we had a high-priority delete channel, that would help to alleviate
> things, but I think you'll find that having several hundred active names
> will be a better lower limit in your design thinking.
>
> Another question is whether names are sent fully-qualified or relative
> to some directory. My protocol caches directory names in the name cache
> and allows you to send filenames relative to a cached directory. Just
> having a way to "chdir" each side (even if the chdir is just virtual)
> and send names relative to the current directory should help a lot.

I see no reason (so far) why the concept of a current tree-relative
directory wouldn't be perfectly viable. The stream would contain CD
commands. As such, the only time we might need to pass a complete
pathname would be for link destinations, and a build-as-you-go
directory table could eliminate that.

> An additional source of cached names is in the directory scanning when
> doing a recursive transfer. My protocol has specific commands that
> refer to a name index within a specified directory so that the receiving
> side can request changed files using a small binary value instead of a
> full pathname.
>
> One more area of complexity that you don't mention (and I don't either
> in my new-protocol doc): there are some operations where 2 names need
> to be associated with one operation. This happens when we have both a
> destination file and a basis file.
> My current cache implementation allows both of these names to be
> associated with a single cache element (though I need to improve this
> a bit in rzync) and lets the sig/patch stage snag them both.

If our filepath (CWD) is tree-relative, then we can calculate the basis
and backup file names from their respective tree paths.

--
________________________________________________________________
J.W. Schultz            Pegasystems Technologies
email address:          [EMAIL PROTECTED]

Remember Cernan and Schmitt

--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html