Re: Bug#656142: ITP: duff -- Duplicate file finder

Samuel Thibault Tue, 17 Jan 2012 03:04:06 -0800

Lars Wirzenius, on Tue 17 Jan 2012 10:45:20 +0000, wrote:
> > > Personally, I would be wary of using checksums for file comparisons,
> > > since comparing files byte-by-byte isn't slow (you only need to
> > > do it to files that are identical in size, and you need to read
> > > all the files anyway).
> > 
> > In some cases you may have a lot of files with identical size, so at
> > least a simple SSE-friendly checksum like CRC is useful.
> 
> That's a good point. However, the pathological case would need to
> be quite pathological, since you can check around a thousand files
> of the same size at the same time (i.e., the number of open files
> per process), which is fairly rare for most people. But not all
> people, of course.


I'm not sure I understand exactly what you mean. If you have even
just a hundred files of the same size, comparing them pairwise needs on
the order of ten thousand file comparisons (quadratic in the number of
files)! Using a hash reduces that to indexing the hundred file hashes.
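
To make the idea concrete, here is a minimal sketch of the hash-indexing
approach. It is not duff's actual implementation: the function names
(file_digest, find_duplicates) and the choice of SHA-256 are assumptions
made for illustration only. Files are first bucketed by size, and only
buckets holding more than one file are hashed, so each such file is read
once.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose files have the same size and the same digest.

    Hypothetical helper: returns a sorted list of sorted path groups,
    each group containing at least two files.
    """
    # Step 1: bucket by size; files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size, nothing to compare against
        # Step 2: index same-size files by digest (one read per file).
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

Matching digests only make identical content extremely likely, so a
cautious tool could still confirm each reported group with a
byte-by-byte comparison, which then only touches the few candidates
that survived the hash step.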

Samuel


