RE: [SAtalk] Duplicate Emails

2004-01-12 Thread Gary Funck
As an aside,

    formail -D 2 /tmp/dup_id_cache.$$ -s < mbox.txt > mbox_no_dupes.txt
    rm -f /tmp/dup_id_cache.$$

will do a decent job of weeding out duplicates (based upon message ID), where 2 is the size of the ID cache.
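For readers without formail at hand, the same idea (keep the first message seen for each Message-ID, drop later ones) can be sketched with Python's standard mailbox module. This is not what formail does internally, only an illustration: the function name dedupe_mbox is hypothetical, the file names are reused from the command above, and the sketch keeps every ID in memory rather than in a size-limited cache file. Messages with no Message-ID header are always kept, which is an assumption of this sketch.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, keeping the first message per Message-ID.

    Returns a (kept, dropped) tuple. Hypothetical helper, not part of
    formail or SpamAssassin.
    """
    seen = set()          # every Message-ID seen so far (unbounded, unlike formail's cache)
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            msg_id = (msg.get("Message-ID") or "").strip()
            if msg_id and msg_id in seen:
                dropped += 1      # duplicate ID: skip this copy
                continue
            if msg_id:
                seen.add(msg_id)
            dst.add(msg)          # first occurrence, or no ID at all
            kept += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return kept, dropped


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        print(dedupe_mbox(sys.argv[1], sys.argv[2]))
```

Run as a script with the input and output mbox paths as its two arguments, for example mbox.txt and mbox_no_dupes.txt. Like the formail approach, this only catches copies that share a Message-ID; copies of the same mail that arrived with different IDs pass through untouched.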

Re: [SAtalk] Duplicate Emails

2004-01-12 Thread Justin Mason
Robert Menschel writes:
> I'm trying to make sure my corpus is as clean as possible, eliminating
> all duplicates.
>
> I tried to use the masses/corpora/uniq-mailbox program for this, and had
> problems which I've documented in bugzilla report 2920.