On Sat, Feb 25, 2006 at 08:42:37PM +0200, Shachar Shemesh wrote: > I disagree.
I think you may have meant to say was, "Utterly, totally, and in all meaningful ways false." :-) In my hasty read-through of your patch I was obviously only thinking about powers of 2, so my second critique completely missed the mark. In looking at your revised patch, I was thinking that we could change the way the hash data is layed out and do away with the tag_table array completely. The following patch reorganizes the data to have a hash array that just saves off the indexes into the sum array, and uses a new "chain" value in the sum_buf structure itself to link collisions together. This not only saves memory (even compared to the original), but does away with the extra comparing of tag values (which will always be the same when we're in a particular chain of collisions). http://rsync.samba.org/ftp/unpacked/rsync/patches/dynamic_hash.diff One thing this patch does is to (1) leave the array allocated to its largest size, (2) use realloc() if we need to make it bigger, (3) make the minimum hash-table size 65537 (a prime). Some of these decisions are debatable: The 1st item makes us more efficient in our malloc calls when sending large files, but could waste sender-side memory when transferring a single large file in the middle of a bunch of normal-sized files. The 3rd item might be a bit over the top, but we used to always allocate a tag array of 65536 elements, and since I noticed some hash collisions occurred in small files using a hash-table size of 11 items, I figured it would be an acceptable overhead to make normal-sized files much more likely to have no collisions at all. Comments welcomed. Thanks again for your patch! ..wayne.. -- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html