On Wed, Feb 25, 2009 at 06:58:48PM +0000, Hendrik Boom wrote:
> There wouldn't happen to be any handy tools for searching a directory
> tree with a few hundred ASCII files and telling me which ones have
> similar content?
>
> Many have been copied, edited, merged, reformatted, split, and I'd like
> to find the differences, decide on what to keep, and delete redundant
> ones.
>
> I know there's such a program for image files.
>
> I know about wdiff, which would be fine after I've paired off the similar
> files (or fragments of files). to resolve differences that remain.

You could write a script that brute-forces every pair of files (yes, I
know that's a lot, but it's only about 125 000 pairs for 500 files), runs
each pair through "wdiff -s", and applies a similarity threshold to the
statistics it prints. That gives you a list of potential matches.

The only trick is setting the threshold... and I have no idea how to help
you there. And if you're looking for fragments of files, that's a whole
different ballgame.

Cheers,

-- 
Eric Gerlach, Network Administrator
Federation of Students
University of Waterloo
p: (519) 888-4567 x36329
e: egerl...@feds.uwaterloo.ca
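
For what it's worth, here is a rough sketch of the brute-force pairing idea in Python. It is not the wdiff-based script itself: as a stand-in for parsing "wdiff -s" output, it scores each pair with the standard library's difflib.SequenceMatcher on word lists. The function names and the 0.5 default threshold are made up for illustration, and the threshold would still need tuning by hand. With n files it compares n*(n-1)/2 pairs, so 500 files means 124 750 comparisons.

```python
import itertools
import os
from difflib import SequenceMatcher


def read_words(path):
    """Return the file's contents as a list of whitespace-separated words."""
    with open(path, encoding="ascii", errors="replace") as handle:
        return handle.read().split()


def word_similarity(words_a, words_b):
    """Score two word lists from 0.0 (nothing shared) to 1.0 (identical)."""
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent items in long sequences.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def similar_pairs(root, threshold=0.5):
    """Compare every pair of files under root (hypothetical helper).

    Returns (score, path_a, path_b) tuples whose score is at or above
    the threshold, best matches first. The 0.5 default is an arbitrary
    starting point, not a recommended value.
    """
    paths = sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    # Read each file once instead of once per pair.
    words = {path: read_words(path) for path in paths}
    matches = []
    for path_a, path_b in itertools.combinations(paths, 2):
        score = word_similarity(words[path_a], words[path_b])
        if score >= threshold:
            matches.append((score, path_a, path_b))
    return sorted(matches, reverse=True)
```

Calling similar_pairs("some/directory", threshold=0.6) and printing the result gives the list of candidate matches to go through with wdiff by hand. Like the wdiff approach, this only compares whole files, so it does nothing for the fragments case.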