On Mon, Jan 05, 2004 at 04:18:30PM -0800, jw schultz wrote:
> It puts us back into the business of doing a link lookup for every
> regular file.
Ah yes, I missed that. Adding a flag would be sufficient to avoid this.

However, your more radical changes look even better. I particularly like the idea of moving the inode/dev info out of the normal file_struct. (Aside: it might also be good to avoid doing a bunch of small allocs for the extra memory you suggest, perhaps by allocating a pool of structures like file_struct does, but we can worry about that later.)

One other thought occurred to me that I haven't tried to analyze in depth yet: what if we built a temporary hash (or b-tree?) from the dev+inode combinations as the names arrive? Each hash entry's data would be an offset into the file-list array. If a lookup gets a hit, we flag ourselves as an hlink-dup and link ourselves into the top of the chain pointed to by the first flist_struct (which would hold the index of the next hard-link item in our group). Once the list is complete, the hash goes away.

Our transfer code would work much as you proposed, with one additional optimization that I'd suggest: when we finish off the first hard-link item in a group, link all its hard-link brethren at that point, rather than in a separate loop at the end of the transfer.

This "hash" idea could actually work with the current sort method too. We'd just sort a temporary array of dev+inode+flist_index items, translate the sorted result into the chain values in the file_list, and then discard the extra data.

One final memory-saving hack: turn the rdev item into a union with the new hard-link-list-index item, and only treat it as an index when a flag is set saying we should (since an item can't be both hard-linked and a device in this scheme).

..wayne..

-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
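A minimal C sketch of the dev+inode hash idea from the message above, written against a hypothetical, trimmed-down entry type. The names file_entry, hlink_table, hlink_add, hlink_group_members and the FLAG_* bits are all made up for illustration; they are not rsync's actual file_struct or hlink.c code. It assumes hlink_add is only called for regular files, so reusing the rdev slot of the union as the chain index is safe, and it picks open addressing with linear probing for the temporary hash (the message leaves hash vs. b-tree open).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical, trimmed-down file-list entry; NOT rsync's real file_struct. */
#define FLAG_HLINKED   0x1u /* u.next_link holds a chain index instead of an rdev */
#define FLAG_HLINK_DUP 0x2u /* entry is not the first item of its hard-link group */
#define NO_LINK        ((int32_t)-1)

struct file_entry {
    unsigned flags;
    union {
        dev_t rdev;        /* meaningful only for device entries */
        int32_t next_link; /* meaningful only when FLAG_HLINKED is set */
    } u;
};

/* Temporary dev+inode -> index-of-first-item table, discarded after the list is built. */
struct hlink_slot { dev_t dev; ino_t ino; int32_t first; };
struct hlink_table { struct hlink_slot *slots; size_t mask, used; };

static int hlink_table_init(struct hlink_table *t, size_t max_entries)
{
    size_t cap = 16;
    while (cap < max_entries * 2) /* keep the load factor at or below 1/2 */
        cap <<= 1;
    t->slots = malloc(cap * sizeof *t->slots);
    if (!t->slots)
        return -1;
    for (size_t i = 0; i < cap; i++)
        t->slots[i].first = NO_LINK; /* NO_LINK marks an empty slot */
    t->mask = cap - 1;
    t->used = 0;
    return 0;
}

static void hlink_table_free(struct hlink_table *t)
{
    free(t->slots);
    t->slots = NULL;
}

/* Mix device and inode numbers into one hash value. */
static size_t hash_devino(dev_t dev, ino_t ino)
{
    uint64_t h = (uint64_t)dev * 0x9E3779B97F4A7C15ULL ^ (uint64_t)ino;
    h ^= h >> 29;
    return (size_t)h;
}

/* Call once per arriving name, for regular files only (so clobbering rdev is safe).
 * Returns 1 if flist[idx] joined an existing group, 0 if it is a new dev+inode. */
static int hlink_add(struct hlink_table *t, struct file_entry *flist,
                     int32_t idx, dev_t dev, ino_t ino)
{
    size_t pos = hash_devino(dev, ino) & t->mask;
    for (; t->slots[pos].first != NO_LINK; pos = (pos + 1) & t->mask) {
        const struct hlink_slot *s = &t->slots[pos];
        if (s->dev != dev || s->ino != ino)
            continue; /* linear probing past a different dev+inode */
        struct file_entry *head = &flist[s->first];
        if (!(head->flags & FLAG_HLINKED)) { /* second member seen: start the chain */
            head->flags |= FLAG_HLINKED;
            head->u.next_link = NO_LINK;
        }
        /* Push idx onto the top of the chain hanging off the group's first item. */
        flist[idx].flags |= FLAG_HLINKED | FLAG_HLINK_DUP;
        flist[idx].u.next_link = head->u.next_link;
        head->u.next_link = idx;
        return 1;
    }
    assert(t->used < t->mask); /* caller sized the table for max_entries */
    t->slots[pos] = (struct hlink_slot){ dev, ino, idx };
    t->used++;
    return 0;
}

/* Transfer side: once the first item of a group is done, collect its brethren
 * so they can be linked right away. Returns how many indices were written. */
static size_t hlink_group_members(const struct file_entry *flist, int32_t first,
                                  int32_t *out, size_t max)
{
    size_t n = 0;
    if ((flist[first].flags & (FLAG_HLINKED | FLAG_HLINK_DUP)) != FLAG_HLINKED)
        return 0; /* not the head of a hard-link group */
    for (int32_t i = flist[first].u.next_link; i != NO_LINK && n < max;
         i = flist[i].u.next_link)
        out[n++] = i;
    return n;
}
```

Once the whole list has arrived, hlink_table_free() drops the table, and only the flag bits and chain indices survive in the entries. When the transfer finishes the first item of a group, hlink_group_members() yields the rest of that group (newest first) so they can be linked right then, instead of in a separate pass at the end. Entries that never joined a group keep their rdev untouched.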