Hi Ralf,

Ralf Hildebrandt wrote:

Although our SA setup works very well in general, one issue that has come up a few times recently is airline E-tickets/reservations. These tend to be ALL CAPS and contain quite a few other trigger words. Our company does business with more than one travel agent, so whitelisting a single sender isn't enough. These mails hit the following rules:

X-Spam-Score: ***** (5.696) BAYES_99,HTML_30_40,HTML_MESSAGE,NO_REAL_NAME,
 SARE_OBFU_TBL_03,UPPERCASE_50_75,autolearn=no

You could feed these to the bayes DB as "ham"

You are right, of course. But Bayes is more of a statistical tool, and given the total number of mails stored in Bayes already, I fear it will take quite a bit of learning to offset the current high scoring.
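
To get a feeling for how much learning that might be, here is a rough sketch. It uses a simplified Graham-style per-token estimate, which is an assumption and not SpamAssassin's exact computation. All message counts are hypothetical, picked only so that the starting probability lands near the token values seen in this mail; the function names are made up for illustration.

```python
def token_prob(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token spam probability (not SpamAssassin's exact formula)."""
    spam_ratio = spam_hits / n_spam
    ham_ratio = ham_hits / n_ham
    if spam_ratio + ham_ratio == 0:
        return 0.5  # token never seen: treat as neutral
    return spam_ratio / (spam_ratio + ham_ratio)


def ham_needed(spam_hits, ham_hits, n_spam, n_ham, target=0.5):
    """Count extra ham mails containing the token needed to reach `target`."""
    extra = 0
    while token_prob(spam_hits, ham_hits + extra, n_spam, n_ham + extra) > target:
        extra += 1
    return extra


# Hypothetical database: 100000 spam and 100000 ham mails learned,
# token seen in 460 spam mails and in 1 ham mail.
start = token_prob(460, 1, 100_000, 100_000)
extra = ham_needed(460, 1, 100_000, 100_000)
print(f"starting probability: {start:.4f}")
print(f"ham mails needed to reach 0.5: {extra}")
```

Under these assumptions, the number of ham mails needed per token is on the same order as the number of spam mails that already contain it, which is why a handful of E-tickets is unlikely to move a large database much.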

Our current Bayes setup is:
Company-wide Bayes database
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam -0.1
score BAYES_99 5.0
score BAYES_95 4.0

Perhaps I should lower my BAYES_99 and BAYES_95 scores a bit, though the current values are based on past experience, where Bayes alone was not able to push clearly spammy mails over the threshold.
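
As a quick check of what lowering the score would do for this particular mail, here is a small sketch. It takes the reported total of 5.696 and the BAYES_99 score of 5.0 from above, and assumes a threshold of 5.0 (SpamAssassin's default required_score; our actual setting may differ). The candidate scores are only examples.

```python
REPORTED_TOTAL = 5.696    # from the X-Spam-Score header of the E-ticket
BAYES_99_CURRENT = 5.0    # from the local configuration
REQUIRED_SCORE = 5.0      # assumption: SpamAssassin's default threshold


def total_with_bayes_score(new_bayes_score):
    """Recompute the mail's total if only the BAYES_99 score changes."""
    other_rules = REPORTED_TOTAL - BAYES_99_CURRENT  # all non-Bayes hits combined
    return round(other_rules + new_bayes_score, 3)


for candidate in (5.0, 4.5, 4.0, 3.5):
    total = total_with_bayes_score(candidate)
    verdict = "spam" if total >= REQUIRED_SCORE else "not spam"
    print(f"BAYES_99 = {candidate}: total {total} -> {verdict}")
```

The non-Bayes rules contribute well under one point here, so under the assumed threshold any BAYES_99 score of about 4.3 or less would let this mail through. The same reduction would of course apply to real spam that relies on BAYES_99.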

These E-tickets just look terribly spammy to Bayes because of the language used, it seems. Some high-scoring tokens for this one are:

bayes token 'visa' => 0.997839158297152
bayes token 'refund' => 0.997646909307943
bayes token 'drinks' => 0.997585038685398
bayes token 'NUMBER' => 0.990398319296953
bayes token 'nights' => 0.98853871069642

Regards, Paul Boven.
