Hi list, and Wayne in particular,
It was almost a year since we had the discussion (with http://lists.samba.org/archive/rsync/2005-March/011875.html as it's conclusion) regarding chances for hash collisions and large files. As now we have someone asking about synching 5TB files, I decided to actually submit a patch. Attached is a patch that uses a non-predetermined hash table size, so that the hash cell load (alpha) is never more than 80%. As far as my understanding of rsync goes, this requires no change in the rsync protocol. Comments welcome, Shachar
? .match.c.swp ? dynamic_hash.patch Index: match.c =================================================================== RCS file: /cvsroot/rsync/match.c,v retrieving revision 1.78 diff -u -r1.78 match.c --- match.c 24 Feb 2006 16:43:44 -0000 1.78 +++ match.c 25 Feb 2006 11:22:12 -0000 @@ -28,7 +28,6 @@ typedef unsigned short tag; -#define TABLESIZE (1<<16) #define NULL_TAG (-1) static int false_alarms; @@ -49,10 +48,11 @@ static struct target *targets; +static size_t tablesize; static int32 *tag_table; -#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF) -#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16) +#define gettag2(s1,s2) gettag((s1) + ((s2)<<16)) +#define gettag(sum) ((sum)%tablesize) static int compare_targets(struct target *t1,struct target *t2) { @@ -64,8 +64,14 @@ { int32 i; - if (!tag_table) - tag_table = new_array(int32, TABLESIZE); + /* Dynamically calculate the hash table size so that the hash load + * is always about 80%. + * See http://lists.samba.org/archive/rsync/2005-March/011875.html + */ + tablesize=(s->count/8)*10+11; + + free(tag_table); + tag_table = new_array(int32, tablesize); targets = new_array(struct target, s->count); if (!tag_table || !targets) @@ -78,7 +84,7 @@ qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets); - for (i = 0; i < TABLESIZE; i++) + for (i = 0; i < tablesize; i++) tag_table[i] = NULL_TAG; for (i = s->count; i-- > 0; )
-- To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html