I'm using a few utilities to accomplish the same thing in a second pass after rsync runs. The utilities all use a two-layer hash (256 directories of 256 subdirectories each), which with our current backups puts a little over 100 files per directory. Anywhere from hundreds of thousands to tens of millions of files shouldn't waste too many inodes or put an excessive number of files into any one directory. The code that generates the hash directory name is parameterized, though, and could easily generate 1-4 layers.
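The fan-out arithmetic behind those numbers is simple: each hash layer multiplies the directory count by 256 (one two-hex-digit level). A quick sanity check (the file counts below are illustrative, not our exact numbers):

```c
#include <assert.h>

/* Average files per leaf directory given `layers` levels of
 * 256-way (two-hex-digit) fan-out. */
static double files_per_dir(double nfiles, int layers)
{
    double dirs = 1.0;
    while (layers-- > 0)
        dirs *= 256.0;          /* 256 subdirectories per layer */
    return nfiles / dirs;
}
```

With two layers there are 256*256 = 65536 leaf directories, so about 6.5 million files works out to roughly 100 per directory, and even tens of millions stays in the low hundreds.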
The hash-path construction (reformatted for readability; `md` holds the raw MD4 digest):

    static char hex_char[16] = "0123456789abcdef";
    int prefixdirs = 2;
    ...
    int digestlength;
    digestlength = MD4_DIGEST_LENGTH;
    ...
    const char *hashdir ...
    hashdirlen = strlen(hashdir);
    hashpath = malloc(hashdirlen + digestlength*3 + 1);
    strcpy(hashpath, hashdir);
    hashpath[hashdirlen] = '/';
    /* first prefixdirs digest bytes become nested directories */
    for (i = 0; i < prefixdirs; i++) {
        hashpath[hashdirlen + i*3 + 1] = hex_char[md[i] >> 4];
        hashpath[hashdirlen + i*3 + 2] = hex_char[md[i] & 0xf];
        hashpath[hashdirlen + i*3 + 3] = '/';
    }
    /* remaining digest bytes become the filename */
    for (i = prefixdirs; i < digestlength; i++) {
        hashpath[hashdirlen + i*2 + prefixdirs + 1] = hex_char[md[i] >> 4];
        hashpath[hashdirlen + i*2 + prefixdirs + 2] = hex_char[md[i] & 0xf];
    }
    hashpath[hashdirlen + prefixdirs + digestlength*2 + 1] = '\0';

The three utilities (hashimplode, hashdelete, hashpurge) are at
http://www.cmb.usc.edu/people/dld/backuputils/. hashimplode calculates the hashes and hardlinks files. hashdelete removes a snapshot, removing orphaned hash files as it goes, and hashpurge is roughly "find /hashdir -links 1 | xargs rm", for the case where hashdelete hasn't always been used.

All three use rename() to avoid races on files, so they can be interleaved and run in parallel while sharing the same hash directory (which we do to exploit the parallel seek capacity of RAIDs and to back up multiple smaller fileservers at once). hashimplode has an option to skip files that are already hardlinked (essentially the ones rsync already hardlinked to the previous backup; this assumes users aren't hardlinking their own files). Using this patch would save a pass through the inodes and be a big win (30-50% faster? Nearly the gain of upgrading to 10K RPM disks, while keeping the 250G+ capacity of 7200 RPM drives).

The utilities may prove useful for migrating existing backups to this hash structure, recovering lost hash directories, and pruning the hash directory. I'd like to make these utilities use the same hash structure and race avoidance as the rsync hashdir patch uses.
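For anyone wanting to experiment with the layout outside the patch, here is a self-contained version of the same path construction. The function name and signature are my own invention, and the digest is passed in rather than computed, so it builds without linking against an MD4 implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char hex_char[] = "0123456789abcdef";

/* Build "<hashdir>/de/ad/beef..." from a raw digest: the first
 * `prefixdirs` digest bytes become nested two-hex-digit directories,
 * the remaining bytes the filename.  Caller frees the result.
 * Assumes prefixdirs < digestlen, as in the patch. */
char *build_hashpath(const char *hashdir, const unsigned char *md,
                     int digestlen, int prefixdirs)
{
    size_t hashdirlen = strlen(hashdir);
    /* '/' + two hex chars per byte + one '/' per prefix level + NUL */
    char *hashpath = malloc(hashdirlen + 1 + 2 * digestlen + prefixdirs + 1);
    char *p;
    int i;

    if (hashpath == NULL)
        return NULL;
    memcpy(hashpath, hashdir, hashdirlen);
    p = hashpath + hashdirlen;
    *p++ = '/';
    for (i = 0; i < digestlen; i++) {
        *p++ = hex_char[md[i] >> 4];
        *p++ = hex_char[md[i] & 0xf];
        if (i < prefixdirs)
            *p++ = '/';     /* close one directory level */
    }
    *p = '\0';
    return hashpath;
}
```

Called as build_hashpath(hashdir, md, MD4_DIGEST_LENGTH, 2) this yields the same two-layer layout as the patch, and prefixdirs can be anything from 1 to 4 for shallower or deeper trees.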
-Drake <[EMAIL PROTECTED]>