Hi Ralf,
Ralf Hildebrandt wrote:
> > Although our SA setup works very well in general, one issue that has
> > come up a few times recently is airline E-tickets/reservations. These
> > tend to be ALL CAPS and have quite a few other trigger words. Our
> > company seems to do business with more than one travel-agent, so just
> > whitelisting isn't quite enough. These mails hit the following rules:
> > X-Spam-Score: ***** (5.696) BAYES_99,HTML_30_40,HTML_MESSAGE,NO_REAL_NAME,
> >   SARE_OBFU_TBL_03,UPPERCASE_50_75,autolearn=no
> You could feed these to the bayes DB as "ham"
You are right, of course. But Bayes is more of a statistical tool, and
given the total number of mails stored in Bayes already, I fear it will
take quite a bit of learning to offset the current high scoring.
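
To put a rough number on that worry, here is a minimal Python sketch. It assumes a simplified per-token estimate (the token's frequency in spam divided by the sum of its frequencies in spam and ham), which is not SpamAssassin's exact formula, and the corpus counts are made up for illustration rather than taken from our database.

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham):
    """Simplified per-token spam probability (assumption: not SA's exact formula)."""
    spam_freq = spam_hits / nspam
    ham_freq = ham_hits / nham
    return spam_freq / (spam_freq + ham_freq)


def ham_needed(spam_hits, ham_hits, nspam, nham, target=0.5):
    """Count the extra ham mails containing the token needed to reach `target`."""
    extra = 0
    # Each newly learned ham mail adds one ham hit and one ham message.
    while token_spam_prob(spam_hits, ham_hits + extra, nspam, nham + extra) > target:
        extra += 1
    return extra


# Hypothetical corpus: 50,000 spam and 50,000 ham mails already learned;
# the token was seen in 2,000 spam mails but in only 5 ham mails.
NSPAM, NHAM = 50_000, 50_000
print(token_spam_prob(2000, 5, NSPAM, NHAM))
print(ham_needed(2000, 5, NSPAM, NHAM))
```

With these made-up counts the token starts out at roughly 0.9975, much like the tokens in this mail, and it takes on the order of two thousand ham mails containing it before it becomes neutral. The real numbers depend entirely on what is already in the database.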
Our current Bayes setup is:
  Company-wide Bayes database
    bayes_auto_learn 1
    bayes_auto_learn_threshold_nonspam -0.1
    score BAYES_99 5.0
    score BAYES_95 4.0
Perhaps I should lower my BAYES_99 and BAYES_95 a bit, though these
settings are based on past experience where Bayes alone was not able to
put clearly spammy mails over the threshold.
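
As a quick check of what lowering BAYES_99 would do for this particular mail, here is a small Python sketch. It assumes the spam threshold is SpamAssassin's default required_score of 5.0 (ours is not stated here) and that the other rules keep their current combined contribution; the candidate scores are only examples.

```python
REQUIRED_SCORE = 5.0   # assumption: SpamAssassin's default required_score
TOTAL = 5.696          # total from the X-Spam-Score header of the E-ticket
BAYES_99 = 5.0         # current local score for BAYES_99

# Combined contribution of all the non-Bayes rules that hit.
other_rules = TOTAL - BAYES_99


def total_with(bayes_score):
    """Total score of this mail if BAYES_99 were set to `bayes_score`."""
    return round(other_rules + bayes_score, 3)


for candidate in (5.0, 4.5, 4.0, 3.5):
    total = total_with(candidate)
    verdict = "spam" if total >= REQUIRED_SCORE else "ham"
    print(f"BAYES_99={candidate}: total {total} -> {verdict}")
```

Under those assumptions the non-Bayes rules contribute only about 0.7 points, so BAYES_99 would have to drop to roughly 4.3 or lower before this mail passes, which would also weaken it against real spam.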
These E-tickets just look terribly spammy to Bayes because of the
language used, it seems. Some high-scoring tokens for this one are:
bayes token 'visa' => 0.997839158297152
bayes token 'refund' => 0.997646909307943
bayes token 'drinks' => 0.997585038685398
bayes token 'NUMBER' => 0.990398319296953
bayes token 'nights' => 0.98853871069642
Regards, Paul Boven.