On Tue, 31 Mar 2009, Lucio Chiappetti wrote:

> users MAY forward spam which gets through to an area where a daily crontab picks it up for sa-learn ... we've been happy with the entire arrangement for a couple of years)

Do you retain those messages? If not, you have no way to review how SA has been manually trained.

> What looks suspicious to me is BAYES_00. Most other spam has BAYES_99.

Indeed. That is likely the primary cause of your problem.

> and a slightly variable text in (bad) Italian (with spelling and grammar errors) stating that "80% of the people in your [country|city|region...] is unhappy with their monthly income" and offering a job for internet advertising.

That should be excellent bayes fodder.

> 0.000          0      31125          0  non-token data: nspam
> 0.000          0     239162          0  non-token data: nham
> 0.000          0     310271          0  non-token data: ntokens
>
> I'm not sure how to interpret those numbers.

Your bayes has been trained with a strong bias towards ham: 239162 ham messages (nham) against only 31125 spam (nspam), roughly 7.7 to 1. It should lean the other way, since the raw volume of email is biased towards spam.
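
To put a number on that bias, here is a minimal Python sketch that reads the "non-token data" lines of an `sa-learn --dump magic` listing and reports how many ham messages were learned per spam message. The parse_magic helper and the regular expression are illustrative assumptions written against the three lines quoted above, not part of SpamAssassin.

```python
import re

def parse_magic(dump_text):
    """Collect the 'non-token data' counters from sa-learn --dump magic output.

    Assumes each line has four whitespace-separated columns followed by
    'non-token data: <name>', with the counter in the third column.
    """
    counts = {}
    for line in dump_text.splitlines():
        m = re.match(r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data: (.+)$", line)
        if m:
            counts[m.group(2).strip()] = int(m.group(1))
    return counts

# The three lines quoted in this thread.
sample = """\
0.000          0      31125          0  non-token data: nspam
0.000          0     239162          0  non-token data: nham
0.000          0     310271          0  non-token data: ntokens
"""

counts = parse_magic(sample)
ham_per_spam = counts["nham"] / counts["nspam"]
print(f"ham learned per spam learned: {ham_per_spam:.1f}")  # prints 7.7
```

A ratio well above 1 means the database has seen far more ham than spam.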

> But then what is the best way to force bayes to "change its mind" from 00 to 99 (or at least above 50) on this sort of spam, other than waiting for it to catch up on the few user submissions (I won't be submitting any more myself, since my procmail filter diverts them to /dev/null)?

If you've kept your training corpus, review it for the mislearned messages and relearn them as spam.

I suggest you also consider either disabling autolearn or pushing the learn-as-ham threshold lower.
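
As a rough sketch, those two options might look like this in local.cf. The option names are SpamAssassin's own; the -1.0 value is only an illustrative assumption (the stock learn-as-ham threshold is 0.1), not a figure tuned for your site.

```
# Option 1: turn autolearn off entirely
bayes_auto_learn 0

# Option 2: keep autolearn, but only learn as ham when the score is
# clearly negative (-1.0 is an assumed example value; the default is 0.1)
# bayes_auto_learn 1
# bayes_auto_learn_threshold_nonspam -1.0
```

Use one or the other, then restart spamd if you run it so the change is picked up.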

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  USMC Rules of Gunfighting #6: If you can choose what to bring to a
  gunfight, bring a long gun and a friend with a long gun.
-----------------------------------------------------------------------
 Tomorrow: April Fools' day
