Wayne Davison wrote: >Thanks for the patch! Here's some comments: > > - You didn't change the size of the "tag" typedef (an unsigned short), > and your patch makes the value potentially overflow. > > Gotcha. I'm sending an amended patch.
> - For smaller hash-table sizes, your algorithm does a lookup in the > table based only on the s1 value (due to the (s2 << 16) value being > too large to have any remainder less than the tablesize). > >So, I think this probably needs to leave gettag() calling gettag2(), and >change gettag2() to factor both s1 and s2 into some kind of an improved >tag-generating computation. > > I disagree. Let's begin with an example. Suppose that we only have 7 hashes (s->count=7). 7/8=0. 0*10=0. 0+11=11. Our hash table size is 11, which is the absolute minimum it will ever get. Now let's suppose all 7 hashes have "1234" as their lower hash value, and have the number 1000 through 1006 as their high value. They will be filed to: 1000:1234 -> 4 1001:1234 -> 2 1002:1234 -> 0 1003:1234 -> 9 1004:1234 -> 7 1005:1234 -> 5 1006:1234 -> 3 Obviously, the higher checksum DID get a chance to affect the cell we land in. For the more general case, our function (s1+s2*65536)%ts (where ts is the table size). Modern algebra dictates that this is the same as saying (s1%ts + (s2%ts) * (65536%ts))%ts. In other words, you can first mod each element individually, and only then do the actual addition and subtraction. It's easy to see that s2 will not get nullified ever, unless 65536%ts is zero. As 65536 is 2^16, and as ts is guarenteed to be odd, this is impossible. Venturing deeper into modern algebra, we know it is theoretically possible that s2 will have some affect on the hash cell chosen, but will not be able to choose any cell at all. This can be seen in the case of (s1+s2*15)%9. If s1 is, say, 3, the different s2 values can select cells 3, 6 and 0. This will happen if and only if the factor (15) and the modulo (9) have a greatest common divisor (gcd - open office calc actually has a function of that name) which is larger than 1 (3, in this case). In jargon, we will say that two number that have a gcd of 1 are coprime. Since ts is always odd (we multiply a number by 10 and then add 11), it will always be coprime to 65536 (which is only divided by even numbers). This means that s2 has as much a chance to select the hash cell we end up in as s1. I don't think it is necessary to change that aspect of the code. I did change the comment in the patch to summarize this point. >..wayne.. > > Shachar
? dynamic_hash.patch Index: match.c =================================================================== RCS file: /cvsroot/rsync/match.c,v retrieving revision 1.78 diff -u -r1.78 match.c --- match.c 24 Feb 2006 16:43:44 -0000 1.78 +++ match.c 25 Feb 2006 18:42:05 -0000 @@ -26,9 +26,8 @@ int updating_basis_file; -typedef unsigned short tag; +typedef unsigned int32 tag; -#define TABLESIZE (1<<16) #define NULL_TAG (-1) static int false_alarms; @@ -49,10 +48,11 @@ static struct target *targets; +static size_t tablesize; static int32 *tag_table; -#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF) -#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16) +#define gettag2(s1,s2) gettag((s1) + ((s2)<<16)) +#define gettag(sum) ((sum)%tablesize) static int compare_targets(struct target *t1,struct target *t2) { @@ -64,8 +64,14 @@ { int32 i; - if (!tag_table) - tag_table = new_array(int32, TABLESIZE); + /* Dynamically calculate the hash table size so that the hash load + * is always about 80%. + * This number must be odd or s2 will not be able to span the entire set + */ + tablesize=(s->count/8)*10+11; + + free(tag_table); + tag_table = new_array(int32, tablesize); targets = new_array(struct target, s->count); if (!tag_table || !targets) @@ -78,7 +84,7 @@ qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets); - for (i = 0; i < TABLESIZE; i++) + for (i = 0; i < tablesize; i++) tag_table[i] = NULL_TAG; for (i = s->count; i-- > 0; )
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html