On Tue, 2017-08-15 at 07:55 -0700, Scott wrote:
> I need a way to go from Outlook to train SA if I'm to train at all. For
> most of my users the inbound mail is handed off to a 3rd party Exchange
> server that I don't have access to. So setting up a public IMAP folder on
> the Exchange server type solution is probably not possible. And I presume
> that process messes with the messages too anyway. I can't cc the users'
> mail on my server for later review, there would be too many.
>
> If I'm forwarded spam as an attachment for learning, I would require ham
> from the same method.
>
> My plan wasn't to make this a daily routine. Only to help a few users who
> say they are getting too much spam slipping through all the other checks
> untagged. To help train bayes to assist on those problem users. Old email
> accounts that can't be changed and are on the golden spam lists.
>
> The reason to "reassemble" the extracted attachments was just to make it
> easier for me to access the messages and review them. Too tedious at the
> console. Don't know how to use formail to do it, and won't it add some
> more headers to the mess too?
>
> FWIW, I did try sa-learn on a sample of extracted attachments in their raw
> form. It was happy with them:
> [root@tn3 msg-1502747659-31280-0]# sa-learn --spam *
> Learned tokens from 97 message(s) (97 message(s) examined)
>
> But picking through them to vet them would be too tedious at the console.
> They get random number type filenames as part of the extraction.
>
> My constraints are:
> - messages are sent to 3rd party Exchange server
> - Exchange server access does not exist at this time
> - users use Outlook client at least v2003
> - I use site wide bayes
> - I don't trust the users to feed bayes.
> - I can't cc their Email on my server for later feeding.
> - I want to use this process for corpus building, not daily maintenance.
>
> My plan was:
> - receive spam and ham (separately) "as attachments" from Outlook
> - extract attachments
> - review attachments
> - feed attachments to sa-learn
>
> Open to a better method.
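A minimal sketch of the "extract attachments" and "review attachments" steps in that plan, assuming Outlook delivers each forwarded item as a standard `message/rfc822` MIME part (items that arrive as Outlook `.msg` or TNEF `winmail.dat` attachments are not handled here). It uses Python's standard `email` package and names each extracted message after its Subject instead of a random number, which should make vetting easier. The function names, the `spam`/`ham` labels and the output layout are hypothetical.

```python
"""Hypothetical helper: pull forwarded-as-attachment messages out of
forwarded emails and save them under readable, Subject-based filenames."""
import email
import os
import re
import sys
from email import policy

# Do not refold long headers on output, so the saved copies stay close to
# what was attached (assumption: minimal rewriting is wanted for training).
PARSE_POLICY = policy.default.clone(refold_source="none")


def _attached(part):
    """Yield each message/rfc822 payload, without descending into it."""
    if part.get_content_type() == "message/rfc822":
        # The payload of a message/rfc822 part is a one-element list.
        yield part.get_payload(0)
    elif part.is_multipart():
        for sub in part.get_payload():
            yield from _attached(sub)


def extract_attached_messages(raw):
    """Return the messages attached to one forwarded email (raw bytes)."""
    outer = email.message_from_bytes(raw, policy=PARSE_POLICY)
    return list(_attached(outer))


def save_for_review(messages, outdir, label):
    """Write each message as <label>-NNN-<subject>.eml; return the paths."""
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for n, msg in enumerate(messages, 1):
        subject = str(msg.get("Subject", ""))
        slug = re.sub(r"[^A-Za-z0-9]+", "-", subject).strip("-")[:50] or "no-subject"
        path = os.path.join(outdir, f"{label}-{n:03d}-{slug}.eml")
        with open(path, "wb") as fh:
            fh.write(msg.as_bytes())
        paths.append(path)
    return paths


# Only act as a command-line tool when arguments are supplied, e.g.
# (hypothetical): extract_fwd.py spam review/spam fwd1.eml fwd2.eml
if __name__ == "__main__" and len(sys.argv) > 3:
    label, outdir, *inputs = sys.argv[1:]
    collected = []
    for name in inputs:
        with open(name, "rb") as fh:
            collected.extend(extract_attached_messages(fh.read()))
    # Save all at once so the NNN counter is unique across input files.
    for path in save_for_review(collected, outdir, label):
        print(path)
```

After deleting any misfiled messages from the output directory, the remaining files can be fed to `sa-learn --spam` (or `--ham`) the same way as in the quoted test run.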
An idea for an alternate collection method: run an IMAP server on your sa-learn training box, set up a second email account in Outlook for the users who are training, and have them just drag the ham/spam into training folders. I don't know if it's "better," but I'd prefer it myself to (re)training users to forward as attachment, then piecing things back together.

If that's an option you'll pursue and you can use Dovecot as your IMAP server, check out https://github.com/jnorell/train-spam-scanner as a training script. It's designed for exactly the goals you have in mind, i.e. users supplying training messages which can be moderated and built into a corpus.

-- 
Jesse Norell
Kentec Communications, Inc.
970-522-8107 - www.kci.net
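For the IMAP-folder approach, a rough sketch of the moderation step, assuming Dovecot stores mail in Maildir format and the training folder sits at a path like `~/Maildir/.Train.Spam` (both the storage format and the folder name are hypothetical; this is not how train-spam-scanner works internally). It uses Python's standard `mailbox` module to list each message's From and Subject so the folder can be vetted before training.

```python
import mailbox
import os
import sys


def review_listing(maildir_path):
    """Return (key, From, Subject) for every message in one Maildir folder,
    sorted by Subject so messages with the same Subject sit together."""
    # create=False: raise NoSuchMailboxError instead of creating a missing folder.
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    rows = []
    for key, msg in box.iteritems():
        rows.append((key, str(msg.get("From", "")), str(msg.get("Subject", ""))))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical usage: review_folder.py ~/Maildir/.Train.Spam
if __name__ == "__main__" and len(sys.argv) > 1:
    for key, sender, subject in review_listing(os.path.expanduser(sys.argv[1])):
        print(f"{key}\t{sender}\t{subject}")
```

Messages that fail review can be removed from the folder (for example with `mailbox.Maildir.remove(key)`) before the folder's messages are fed to `sa-learn`.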