Justin Pryzby wrote:
>On Mon, Feb 27, 2006 at 04:47:31PM -0500, pryzbyj wrote:
>
>>On Mon, Feb 27, 2006 at 05:04:30PM +0000, Pádraig Brady wrote:
>>
>>>Hi,
>>>
>>>I've been maintaining FSlint for a few years now
>>>and it has proved quite popular. There have even
>>>been (buggy) third-party debian packages floating around.
>>>In the latest version (2.14) I have created a debian package,
>>>and it would be great if someone could sponsor this
>>>package for inclusion in debian.

Cheers. Hopefully we'll get 2.15 into debian soon. I'm working on your
comments, and I also have a bug fix I'd like to get done.

>This package is really quite neat. I've read through much of the
>code (lots of pretty small bash scripts), and I must say that I'm
>inspired. I especially like this "find duplicates" pipeline (my own
>implementation here):
>
> find . -type f -print0 |
>   xargs -r0 md5sum |
>   sort |
>   sed -re 's/(\S*)\s*(\S*)/\2\t\1/' |
>   uniq -df1 --all-repeated=separate |
>   sed -e 's/\t\S*$//;'

Note that throwing away unique file sizes first is a huge optimization.
I also sort by inode (or path, which is nearly as good), which reduces
disk seeking a lot.
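Roughly, the size pre-filter looks something like the sketch below
(illustrative only, not the exact FSlint code; it assumes GNU
find/awk/xargs and file names without embedded tabs or newlines, and
/tmp/sizes is just a scratch file):

  # record size<TAB>path for every file
  find . -type f -printf '%s\t%p\n' > /tmp/sizes

  # keep only files whose size occurs more than once,
  # then checksum just those candidates
  awk -F'\t' 'NR==FNR { seen[$1]++; next }   # 1st pass: count each size
              seen[$1] > 1 { print $2 }      # 2nd pass: shared sizes only
             ' /tmp/sizes /tmp/sizes |
    sort |                    # reading candidates in path order cuts seeking
    xargs -rd '\n' md5sum |   # checksum only the surviving candidates
    sort                      # then group identical checksums as above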
>Does anyone know a prettier way of switching the md5sum output than
>this sed script? (Has to deal with special pathnames, of course!)

My method is more robust, BTW (try path names with spaces):

  sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'

Note I think uniq will get key support (like sort) at some stage.
Debian also carries a specific patch adding -W, to compare only the
first N fields; however that is not standard, and I understand it has
just been removed.

>Or a way of optimizing the files removed? (Probably to maximize the
>level of directories which have no normal files anywhere within them
>after removal).

Never thought of that. Hmm...

Pádraig.
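P.S. To make the difference concrete, here's what I'd expect each sed
to do with a name containing spaces (the hash is just a sample value):

  $ echo 'd41d8cd98f00b204e9800998ecf8427e  some file name' | sed -e 's/\(^.\{32\}\)..\(.*\)/\2 \1/'
  some file name d41d8cd98f00b204e9800998ecf8427e

  $ echo 'd41d8cd98f00b204e9800998ecf8427e  some file name' | sed -re 's/(\S*)\s*(\S*)/\2\t\1/'
  some	d41d8cd98f00b204e9800998ecf8427e file name

The second form only swaps up to the first blank, so the tab lands in
the middle of the file name.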
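P.P.S. Until uniq gets key support, there's another way around the
field problem: the md5 hash is fixed width and already leftmost in
md5sum's output, so stock GNU uniq's -w/--check-chars can group on it
with no field swapping at all. A rough, untested sketch (it still
mishandles names containing newlines):

  find . -type f -print0 |
    xargs -r0 md5sum |
    sort |                               # identical hashes become adjacent
    uniq -w32 --all-repeated=separate |  # compare only the 32-char hash
    sed -e 's/^.\{34\}//'                # strip "hash  ", leaving the paths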