Does anyone have any good techniques for capturing a sample of ham that can be used as the ham corpus? I'm in a corporate environment and am not keen on the idea of intercepting non-spam messages. I will if I have to, but was hoping someone had a better idea.
Depending on your MTA/MDA, you might be able to do it on the fly so that an actual copy of the message isn't necessary. For instance, if the messages pass through procmail, run sa-learn --ham on each one just before delivery whenever its X-Spam-Status header isn't set to Yes. Oh, and make sure you pass the --no-sync flag to sa-learn, then schedule the sync itself (sa-learn --sync) for off-peak hours.
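As a rough illustration of the on-the-fly idea, here is a minimal Python sketch of a filter that procmail could pipe a copy of each message into (for example, via a recipe with the c flag so normal delivery continues). The script name, the decision to treat a missing X-Spam-Status header as "not Yes", and the use of a temporary file to hand the message to sa-learn are all assumptions of this sketch, not part of the suggestion above. The sa-learn flags themselves (--ham, --no-sync, and later --sync) are standard.

```python
import email
import email.policy
import os
import subprocess
import sys
import tempfile


def should_learn_as_ham(raw_message: bytes) -> bool:
    """Return True unless the X-Spam-Status header starts with "Yes".

    SpamAssassin writes values such as "Yes, score=7.1 ..." or "No, score=0.3 ...".
    Assumption: a message with no X-Spam-Status header also counts as ham here.
    """
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    status = str(msg.get("X-Spam-Status", ""))
    return not status.strip().lower().startswith("yes")


def build_learn_command(path: str) -> list[str]:
    # --ham trains the message as non-spam; --no-sync skips the slow sync step
    # so it can be run once later with "sa-learn --sync" during off-peak hours.
    return ["sa-learn", "--ham", "--no-sync", path]


def main() -> int:
    raw = sys.stdin.buffer.read()
    if not should_learn_as_ham(raw):
        return 0
    # Hypothetical approach: write the message to a temp file so sa-learn can read it by path.
    with tempfile.NamedTemporaryFile(suffix=".eml", delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        return subprocess.run(build_learn_command(path)).returncode
    finally:
        os.unlink(path)  # nothing is kept on disk once learning finishes


if __name__ == "__main__":
    sys.exit(main())
```

Because the message only lives in a temporary file for the duration of the sa-learn call, no copy of anyone's mail is retained. The deferred sync can then be a single nightly cron job running sa-learn --sync.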