On Saturday, December 12, 2009, Andy Hayward <a...@buteo.org> wrote:
> On Fri, Dec 11, 2009 at 23:24, STeve Andre' <and...@msu.edu> wrote:
>> I am wondering if there is a port or otherwise available
>> code which is good at comparing large numbers of files in
>> an arbitrary number of directories? I always try to avoid
>> wheel re-creation when possible. I'm trying to help someone
>> with large piles of data, most of which is identical
>> across N directories. Most. It's the 'across dirs' part
>> that involves the effort, hence my avoidance of thinking
>> on it if I can help it. ;-)
>
> sysutils/fdupes
>
> -- ach
If you have a database available you can store file hashes and use SQL. I used Postgres for the job and had reasonable performance on a 10-million-file collection. I stored directory paths in one table and filename, size, and SHA-1 in another table. Scripting the table creation was fairly easy... -N
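A minimal sketch of that approach, using Python with SQLite instead of Postgres so it is self-contained. The two-table layout (directory paths in one table; filename, size, and SHA-1 in another) follows the post; the function names and schema details are my own assumptions, not the original author's script.

```python
import hashlib
import os
import sqlite3


def sha1_of(path, bufsize=1 << 16):
    """Return the hex SHA-1 of a file, read in chunks to bound memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def build_index(conn, roots):
    """Walk each root directory and record every file's size and SHA-1."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS dirs  (id INTEGER PRIMARY KEY, path TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS files (dir_id INTEGER REFERENCES dirs(id),
                                          name TEXT, size INTEGER, sha1 TEXT);
    """)
    for root in roots:
        for dirpath, _, names in os.walk(root):
            conn.execute("INSERT OR IGNORE INTO dirs(path) VALUES (?)", (dirpath,))
            (dir_id,) = conn.execute(
                "SELECT id FROM dirs WHERE path = ?", (dirpath,)).fetchone()
            for name in names:
                full = os.path.join(dirpath, name)
                conn.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
                             (dir_id, name, os.path.getsize(full), sha1_of(full)))
    conn.commit()


def duplicates(conn):
    """List files whose (size, sha1) pair occurs more than once."""
    return conn.execute("""
        SELECT d.path, f.name, f.sha1
        FROM files f JOIN dirs d ON d.id = f.dir_id
        WHERE f.sha1 IN (SELECT sha1 FROM files
                         GROUP BY sha1, size HAVING COUNT(*) > 1)
        ORDER BY f.sha1, d.path
    """).fetchall()
```

Grouping on both size and hash cheaply guards against the (unlikely) case of a hash collision between files of different lengths; with an index on `files(sha1, size)` the GROUP BY scales to large collections.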