On Saturday, December 12, 2009, Andy Hayward <a...@buteo.org> wrote:
> On Fri, Dec 11, 2009 at 23:24, STeve Andre' <and...@msu.edu> wrote:
>> I am wondering if there is a port or otherwise available
>> code which is good at comparing large numbers of files in
>> an arbitrary number of directories? I always try to avoid
>> wheel re-creation when possible. I'm trying to help someone
>> with large piles of data, most of which is identical
>> across N directories. Most. It's the 'across dirs' part
>> that involves the effort, hence my avoidance of thinking
>> on it if I can help it. ;-)
>
> sysutils/fdupes
>
> -- ach
>

If you have a database available you can store file hashes and use SQL.
I used Postgres for the job and had reasonable performance on a 10
million file collection. I stored directory paths in one table and
filename, size, and sha1 in another table. Scripting the table
creation was fairly easy...
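The poster used Postgres, but the same two-table idea can be sketched with Python's
standard-library sqlite3 module. Everything below (table names, the `build_index` and
`duplicates` helpers) is my own illustration of the approach, not the poster's actual
schema or script:

```python
# Sketch of the approach described above: one table for directory paths,
# one for (filename, size, sha1), then a GROUP BY query to find duplicates.
# Uses sqlite3 as a stand-in for Postgres; all names here are hypothetical.
import hashlib
import os
import sqlite3


def sha1_of(path, bufsize=1 << 16):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()


def build_index(root, db):
    """Walk `root`, storing directories and per-file hashes in two tables."""
    db.executescript("""
        CREATE TABLE IF NOT EXISTS dirs (
            id   INTEGER PRIMARY KEY,
            path TEXT UNIQUE
        );
        CREATE TABLE IF NOT EXISTS files (
            dir_id INTEGER REFERENCES dirs(id),
            name   TEXT,
            size   INTEGER,
            sha1   TEXT
        );
        CREATE INDEX IF NOT EXISTS files_sha1 ON files(sha1);
    """)
    for dirpath, _, filenames in os.walk(root):
        db.execute("INSERT OR IGNORE INTO dirs(path) VALUES (?)", (dirpath,))
        dir_id = db.execute(
            "SELECT id FROM dirs WHERE path = ?", (dirpath,)
        ).fetchone()[0]
        for name in filenames:
            full = os.path.join(dirpath, name)
            db.execute(
                "INSERT INTO files(dir_id, name, size, sha1) VALUES (?, ?, ?, ?)",
                (dir_id, name, os.path.getsize(full), sha1_of(full)),
            )
    db.commit()


def duplicates(db):
    """Return (sha1, count) for every hash that appears more than once."""
    return db.execute("""
        SELECT sha1, COUNT(*) FROM files
        GROUP BY sha1 HAVING COUNT(*) > 1
    """).fetchall()
```

Joining `files` back to `dirs` on `dir_id` then shows *where* each duplicate lives,
which is the 'across dirs' part of the original question. For a 10-million-file
collection the index on the hash column matters; hashing dominates the runtime either way.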

-N
