Hi list, and Wayne in particular,

It was almost a year since we had the discussion (with
http://lists.samba.org/archive/rsync/2005-March/011875.html as it's
conclusion) regarding chances for hash collisions and large files. As
now we have someone asking about synching 5TB files, I decided to
actually submit a patch.


Attached is a patch that uses a non-predetermined hash table size, so
that the hash cell load (alpha) is never more than 80%. As far as my
understanding of rsync goes, this requires no change in the rsync protocol.


Comments welcome,


Shachar

? .match.c.swp
? dynamic_hash.patch
Index: match.c
===================================================================
RCS file: /cvsroot/rsync/match.c,v
retrieving revision 1.78
diff -u -r1.78 match.c
--- match.c	24 Feb 2006 16:43:44 -0000	1.78
+++ match.c	25 Feb 2006 11:22:12 -0000
@@ -28,7 +28,6 @@
 
 typedef unsigned short tag;
 
-#define TABLESIZE (1<<16)
 #define NULL_TAG (-1)
 
 static int false_alarms;
@@ -49,10 +48,11 @@
 
 static struct target *targets;
 
+static size_t tablesize;
 static int32 *tag_table;
 
-#define gettag2(s1,s2) (((s1) + (s2)) & 0xFFFF)
-#define gettag(sum) gettag2((sum)&0xFFFF,(sum)>>16)
+#define gettag2(s1,s2) gettag((s1) + ((s2)<<16))
+#define gettag(sum) ((sum)%tablesize)
 
 static int compare_targets(struct target *t1,struct target *t2)
 {
@@ -64,8 +64,14 @@
 {
 	int32 i;
 
-	if (!tag_table)
-		tag_table = new_array(int32, TABLESIZE);
+	/* Dynamically calculate the hash table size so that the hash load
+	 * is always about 80%.
+	 * See http://lists.samba.org/archive/rsync/2005-March/011875.html
+	 */
+	tablesize=(s->count/8)*10+11;
+	
+	free(tag_table);
+	tag_table = new_array(int32, tablesize);
 
 	targets = new_array(struct target, s->count);
 	if (!tag_table || !targets)
@@ -78,7 +84,7 @@
 
 	qsort(targets,s->count,sizeof(targets[0]),(int (*)())compare_targets);
 
-	for (i = 0; i < TABLESIZE; i++)
+	for (i = 0; i < tablesize; i++)
 		tag_table[i] = NULL_TAG;
 
 	for (i = s->count; i-- > 0; )
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Reply via email to