On Tue, Jan 17, 2012 at 02:05:10PM +0100, Samuel Thibault wrote:
> Roland Mas, on Tue 17 Jan 2012 13:41:23 +0100, wrote:
> > Samuel Thibault, 2012-01-17 12:03:41 +0100 :
> > 
> > [...]
> > 
> > > I'm not sure I understand what you mean exactly. If you have even
> > > just a hundred files of the same size, you will need ten thousand file
> > > comparisons!
> > 
> > I'm sure that can be optimised. Read all 100 files in parallel,
> > comparing blocks of similar offset. You need to perform 99 comparisons
> > on each block for as long as blocks are identical;
> 
> Ah, right. So you'll start writing yet another tool? ;)
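(For illustration, here is a minimal Python sketch of the block-wise grouping
idea quoted above. It is not the dupfiles code; the block size, function names
and command-line wrapper are made up for the example. Files of the same size
are read in lock-step and split into smaller groups whenever a block differs,
so only file contents are compared and no checksums are trusted.)

import os
import sys
from collections import defaultdict

BLOCK_SIZE = 64 * 1024   # made-up block size for the example


def find_duplicates(paths):
    # Only files of identical size can possibly be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    duplicate_groups = []
    for size, group in by_size.items():
        if len(group) > 1:
            duplicate_groups.extend(refine(group, size))
    return duplicate_groups


def refine(group, size):
    # Read every file in the group in lock-step, one block at a time,
    # and split the group whenever blocks differ.  Groups that shrink
    # to a single file are dropped; groups that survive to the end of
    # the files contain identical content.  Blocks are compared as
    # bytes, so the outcome does not rely on any hash function.
    # (Opening every file in the group at once is a simplification;
    # a real tool would have to respect file descriptor limits.)
    handles = {path: open(path, 'rb') for path in group}
    try:
        groups = [list(group)]
        offset = 0
        while offset < size and groups:
            next_groups = []
            for g in groups:
                by_block = defaultdict(list)
                for path in g:
                    by_block[handles[path].read(BLOCK_SIZE)].append(path)
                next_groups.extend(sub for sub in by_block.values()
                                   if len(sub) > 1)
            groups = next_groups
            offset += BLOCK_SIZE
        return groups
    finally:
        for f in handles.values():
            f.close()


if __name__ == '__main__':
    for group in find_duplicates(sys.argv[1:]):
        print(' '.join(group))

(With a hundred same-sized files this does the 99 per-block comparisons Roland
mentions only for as long as the blocks stay identical, and it stops looking at
a group as soon as it shrinks to a single file.)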
I've implemented pretty much that (http://liw.fi/dupfiles), but my duplicate
file finder is not so much better than the existing ones in Debian that I
would inflict it on Debian. But the algorithm works nicely, and it works even
for people who research hash collisions.

-- 
Freedom-based blog/wiki/web hosting: http://www.branchable.com/