RE: [SAtalk] Duplicate Emails

2004-01-12 Thread Gary Funck
As an aside,

    formail -D 2 /tmp/dup_id_cache.$$ -s < mbox.txt > mbox_no_dupes.txt
    rm -f /tmp/dup_id_cache.$$

will do a decent job of weeding out duplicates (based upon message id), where 2 is the size of the id cache.
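For readers without formail at hand, the same idea (keep the first message seen for each Message-ID, drop later ones) can be sketched in Python. This is only an illustration, not what formail does internally: the function name `drop_duplicates` is made up for this example, the set of seen ids is unbounded rather than a fixed-size cache file, and keeping messages that have no Message-ID header is an assumption of this sketch.

```python
import email
from email.message import Message
from typing import Iterable, Iterator


def drop_duplicates(messages: Iterable[Message]) -> Iterator[Message]:
    """Yield each message whose Message-ID has not been seen before.

    Hypothetical helper for illustration only. Messages without a
    Message-ID header are passed through (an assumption of this sketch).
    """
    seen = set()  # unbounded, unlike formail's size-limited id cache
    for msg in messages:
        msg_id = (msg.get("Message-ID") or "").strip()
        if not msg_id:
            # Nothing to compare on, so keep the message.
            yield msg
            continue
        if msg_id in seen:
            # Same id as an earlier message: treat as a duplicate.
            continue
        seen.add(msg_id)
        yield msg
```

To run it over an mbox file, the standard library's `mailbox.mbox("mbox.txt")` can be iterated to supply the messages. Note that matching on Message-ID alone will not catch copies of the same mail that arrived with different ids.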

Re: [SAtalk] Duplicate Emails

2004-01-12 Thread Justin Mason
Robert Menschel writes:
> I'm trying to make sure my corpus is as clean as possible, eliminating
> all duplicates.
>
> I tried to use the masses/corpora/uniq-mailbox program for this, and had
> problems which I've documented in bugzilla report 2920.

[SAtalk] Duplicate Emails

2004-01-11 Thread Robert Menschel
I'm trying to make sure my corpus is as clean as possible, eliminating all duplicates. I tried to use the masses/corpora/uniq-mailbox program for this, and had problems which I've documented in bugzilla report 2920. Fortunately, my email client identifies and can delete duplicates = same message