On Mon, Jan 05, 2004 at 05:11:04PM -0800, Wayne Davison wrote: > On Mon, Jan 05, 2004 at 04:18:30PM -0800, jw schultz wrote: > > It puts us back into the business of doing a link lookup for every > > regular file. > > Ah yes, I missed that. Adding a flag would be sufficient to avoid this. > However, your more radical changes look even better. > > I particularly like the idea of moving the inode/dev info out of the > normal file_struct. (Aside: it might also be good to avoid doing a > bunch of small allocs for the extra memory you suggest, perhaps by > allocating a pool of structures like file_struct does, but we can worry > more about that later.)
Use allocation pools and we do save some by freeing them. > One other thought occurred to me that I haven't tried to analyze in > depth yet: what if we had a temporary hash (b-tree?) that we built from > the dev+inode combinations as the names arrived. The data would be an > offset into the file-list array. If we get a hit, we flag ourselves as > an hlink-dup and link ourselves into the top of the chain pointed to by > the first flist_struct (which would have a value that contains an index > of the next hard-link item in our group). At the end of this, the hash > goes away. Our transfer code would work much like what you proposed, > with one additional optimization that I'd suggest: when we finish off > the first hard-link item in a group, link all its hard-link brethren at > that point, not as a separate loop at the end of the transfer. > > This "hash" idea could actually work with the current sort method too. > We'd just sort a data structure that had the dev+inode+flist_index items > in it and translate that into the values in the file_list (before > discarding the extra data). If we continued to do the link searches i'd support changing the sorted list into a hash of lists. Now i'm not so sure. > One final memory-saving hack: turn the rdev item into a union with the > new hard-link-list-index item, and only use it as an index if a flag > were set indicating that we should (since we can't be both hard-linked > and a device). Except that we could hard-link devices and symlinks. The only file type that ought not be linked (manually) are directories. The fact that we don't preserve device and symlink hardlinks is, in my opinion, a flaw. -- ________________________________________________________________ J.W. Schultz Pegasystems Technologies email address: [EMAIL PROTECTED] Remember Cernan and Schmitt -- To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html