On Mon, Feb 20, 2023 at 01:30:15PM -0800, Loren Wilton wrote:
> This is a home system with only a few users. All users have "Spam" and "Ham"
> folders showing up in their email program of choice, and they just drag
> messages they do or don't like into the appropriate folders. There are
> "Oldham" and "Oldspam" mboxes, and the new spam and ham (respectively) get
> merged into these folders after learning, and removed from the current Spam
> and Ham folders.

I had a similar idea but never implemented it because I felt it was too
difficult for users to deal with. I was considering two folders, 'Spam
Training Set' and 'Ham Training Set', which would always represent the set of
messages that SpamAssassin was currently trained with. If you changed the
contents of these mboxes, a cron job would delete the old bayes tokens and
retrain with the current set.

The difference between these folders and the Spam folder (or Junk, or
whatever you call it locally) is that messages older than 30 days get
auto-deleted from the Spam folder, so after 30 days its contents would no
longer represent the training set.

Having two spam folders is confusing and not easy to manage. Neither of these
two extra folders is one where users would go looking for messages, so they
really would have to copy messages into them, which is more work than just
dragging. That, for me, was the main issue, so I abandoned this line of
thinking.

You mentioned harvesting ham and spam from mboxes, as in directly from the
inbox. This got me wondering more about this:

Messages that the user dragged to Spam but that SpamAssassin did not mark as
spam can clearly be used to train as spam. Easy.

Messages that the user left in their mailbox, deleted, or archived could be
used as ham. Could be OK, but I'm less sure.

Lastly, messages that were in Spam (because SpamAssassin marked them as spam)
and that a user moved out of Spam: just look through all their folders
(except Spam) for messages that SpamAssassin marked as spam and retrain on
those as ham. Again, maybe a bad assumption, but it could work.

I was really just curious to know if other people had workable ideas for how
to get bayes trained with the least amount of friction.
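
To make the harvesting rules above concrete, here is a rough sketch of how
they might be written down. It is not something I have run in production. It
assumes each folder is an mbox file, that the spam folder is literally named
"Spam", and that SpamAssassin has left an "X-Spam-Flag: YES" header on the
messages it tagged. The names plan_training, marked_as_spam and
trust_unflagged_ham are made up for illustration. It only decides which
messages would be learned as what; actually feeding them to sa-learn is left
out.

```python
import mailbox
from email.message import Message

SPAM_FOLDER = "Spam"  # assumption: local name of the spam/junk folder


def marked_as_spam(msg: Message) -> bool:
    """True if the message carries an X-Spam-Flag: YES header."""
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def plan_training(folders, trust_unflagged_ham=False):
    """Decide which messages to learn as spam or ham.

    folders maps a folder name to the path of its mbox file.
    Returns {"spam": [...], "ham": [...]} with (folder, message-id) pairs.
    """
    plan = {"spam": [], "ham": []}
    for name, path in folders.items():
        box = mailbox.mbox(path, create=False)
        try:
            for key, msg in box.iteritems():
                ident = (name, msg.get("Message-ID", str(key)))
                flagged = marked_as_spam(msg)
                if name == SPAM_FOLDER:
                    # User dragged it to Spam but SpamAssassin missed it.
                    if not flagged:
                        plan["spam"].append(ident)
                elif flagged:
                    # Tagged as spam, but the user moved it out of Spam.
                    plan["ham"].append(ident)
                elif trust_unflagged_ham:
                    # Weaker assumption: anything the user kept is ham.
                    plan["ham"].append(ident)
        finally:
            box.close()
    return plan
```

The trust_unflagged_ham switch is off by default because treating everything
a user left in a mailbox as ham is the rule I am least sure about.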