On Mon, 29 Nov 2004, [EMAIL PROTECTED] moaned:
> Unless the address has never been used by a real person, you should
> manually check each message to see whether it's spam. Personally, I
> never have the endurance to check more than about 500 messages at a
> shot. So I'd just cut it into files of a size I could manually verify
> without bleeding from the eyes, delete any hammy-looking stuff I find in
> each file as I go through it, and then save the verified files and use
> those for bayes training.
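
For what it's worth, the cut-it-into-files step quoted above could be sketched roughly like this with Python's standard `mailbox` module. This assumes the mail sits in a single mbox file, which the post doesn't actually say; `split_mbox`, the `chunk-NNNN.mbox` names and the default of 500 are made up for illustration.

```python
import mailbox
from pathlib import Path

def split_mbox(src, out_dir, chunk_size=500):
    """Copy messages from the mbox at `src` into numbered mbox files
    holding at most `chunk_size` messages each (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    source = mailbox.mbox(src)
    paths = []
    chunk = None
    for i, msg in enumerate(source):
        if i % chunk_size == 0:
            # Start a new chunk file; an existing file of the same
            # name would be appended to, so use a fresh directory.
            if chunk is not None:
                chunk.close()
            path = out_dir / f"chunk-{i // chunk_size:04d}.mbox"
            chunk = mailbox.mbox(str(path))
            paths.append(path)
        chunk.add(msg)
    if chunk is not None:
        chunk.close()  # flushes the last chunk to disk
    source.close()
    return paths
```

Something like `split_mbox("suspect.mbox", "chunks")` would then leave a directory of files small enough to eyeball one at a time; the original mailbox is only read, never modified.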

I've always validated things like that by mass-checking the mailbox and
then manually checking only the messages that score close to the
spam/ham boundary, on the basis that SA is pretty much *never* wrong
about very-high-scoring messages being spam (well, maybe it is for
particularly atrocious newsletters or something, but my users don't get
any such abominations). It still means a good few manual checks, but
checking a hundred-odd mails is a hell of a lot easier than checking
tens of thousands.

-- 
`The sword we forged has turned upon us
 Only now, at the end of all things do we see
 The lamp-bearer dies; only the lamp burns on.'
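
P.S. A minimal sketch of the check-only-the-boundary idea, in Python. It assumes the messages have already been through SpamAssassin and carry an `X-Spam-Status` header with `score=` (or `hits=` from older versions) in it; it does not parse the output of the mass-check script itself. `near_boundary` and the margin of 3 points are arbitrary choices for illustration, and 5.0 is just the stock required score.

```python
import mailbox
import re

# Matches "score=12.3" (SA 3.x) or "hits=12.3" (older versions).
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")

def spam_score(msg):
    """Return the score from X-Spam-Status, or None if there isn't one."""
    m = SCORE_RE.search(str(msg.get("X-Spam-Status", "")))
    return float(m.group(1)) if m else None

def near_boundary(path, threshold=5.0, margin=3.0):
    """List (key, score, subject) for messages worth a manual look:
    anything within `margin` of `threshold`, plus anything unscored."""
    box = mailbox.mbox(path)
    picked = []
    for key, msg in box.iteritems():
        score = spam_score(msg)
        if score is None or abs(score - threshold) <= margin:
            picked.append((key, score, msg.get("Subject", "")))
    box.close()
    return picked
```

Everything the function doesn't return scored well clear of the threshold in one direction or the other, which is the part being taken on trust here.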