Hello,

David Jones [mailto:djo...@ena.com] wrote:
> With non-English email flow, it's more challenging. If no RBLs hit, then you
> really must train your Bayes properly, which requires some way to accurately
> determine the ham and spam. You must keep a copy of the ham and spam corpora
> and be allowed to review suspicious email.

I really appreciate you taking the time to help with this.

Yes, I can confirm that our problems are usually with Hungarian spam; English spam is mostly caught by the default rules.

As far as I understand it now, I need to rebuild the Bayes database from scratch:

1. Turn off auto-learning.
2. Populate the spam database.
   The people behind the http://artinvoice.hu/spams/ site are doing excellent work: they publish the spam they catch in mbox format. I checked, and many of the spam emails we sent for investigation are in their mbox.
3. Populate the ham database.
   That's the tricky part. As I mentioned earlier, I don't really want end users involved in this, and I don't have the resources to do it manually. I assume I can hack something into the mail flow that copies all outgoing emails to a separate mailbox and then, treating every outgoing email as ham, learn those messages. Would that do it?
   End users work in a heavily controlled environment (both technically and legally), and in the last ten years we haven't seen any spam sent from inside. That's why I would blindly trust outgoing emails as ham.

One more question: is there a recommended ham-to-spam ratio? 1:1?

If you think my idea of populating the ham database automatically with outgoing emails is complete nonsense, I could find sysadmin resources to collect 2000 legitimate emails and train them as ham, but I cannot allocate 2 work hours per day for months. (I'm also not sure whether 2000 legitimate emails are enough for training.)

Best regards,
Szabolcs Horvath
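The three retraining steps could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions, not a tested procedure. It only builds sa-learn command lines (--clear, then --spam or --ham with --mbox, then --dump magic) for review, and it counts messages in the mbox files so the ham/spam balance can be checked before training. The file names are hypothetical placeholders. Auto-learning itself would be switched off separately, with `bayes_auto_learn 0` in local.cf. The 200-per-class threshold reflects SpamAssassin's default bayes_min_ham_num / bayes_min_spam_num values. It is the minimum before Bayes is used at all, not a recommended ratio.

```python
"""Sketch: plan a from-scratch SpamAssassin Bayes retrain with sa-learn.

Assumptions (hypothetical, not from the thread): spam comes from mbox files
downloaded from a public spam archive, ham from an mbox filled by copying
outgoing mail. Paths are placeholders; nothing here runs sa-learn itself.
"""
import mailbox
import os
import shlex

# SpamAssassin's defaults for bayes_min_ham_num / bayes_min_spam_num:
# Bayes is not used until at least this many of each class are learned.
MIN_PER_CLASS = 200


def count_messages(mbox_path: str) -> int:
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def retrain_commands(spam_mboxes, ham_mboxes):
    """Build (but do not execute) sa-learn command lines for a clean retrain."""
    cmds = [["sa-learn", "--clear"]]  # wipe the existing Bayes database first
    for path in spam_mboxes:
        cmds.append(["sa-learn", "--spam", "--mbox", path])
    for path in ham_mboxes:
        cmds.append(["sa-learn", "--ham", "--mbox", path])
    cmds.append(["sa-learn", "--dump", "magic"])  # shows learned nspam / nham
    return cmds


def training_report(n_ham: int, n_spam: int) -> dict:
    """Summarise corpus sizes, the ham-per-spam ratio and the minimum check."""
    ratio = n_ham / n_spam if n_spam else float("inf")
    return {
        "ham": n_ham,
        "spam": n_spam,
        "ham_per_spam": ratio,
        "meets_minimum": n_ham >= MIN_PER_CLASS and n_spam >= MIN_PER_CLASS,
    }


if __name__ == "__main__":
    spam = ["spam-archive.mbox"]  # hypothetical: mbox downloaded from the archive
    ham = ["outgoing-ham.mbox"]   # hypothetical: copies of outgoing mail
    for cmd in retrain_commands(spam, ham):
        # Review first, then run by hand or with subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
    # Only count files that actually exist, so the sketch runs anywhere.
    n_spam = sum(count_messages(p) for p in spam if os.path.exists(p))
    n_ham = sum(count_messages(p) for p in ham if os.path.exists(p))
    print(training_report(n_ham, n_spam))
```

Printing the commands instead of running them keeps the destructive --clear step under manual control; running `sa-learn --dump magic` afterwards shows how many spam and ham messages the database actually holds.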