On Wed, 9 May 2018, Matthew Broadhead wrote:
[root@ns1 ~]# sudo -H -u amavis bash -c '/usr/bin/sa-learn --dump magic'
0.000 0 3 0 non-token data: bayes db version
0.000 0 32225 0 non-token data: nspam
0.000 0 440420 0 non-token data: nham
So you have a bunch of stuff trained, biased towards ham.
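
To put a rough number on that bias, here is a minimal Python sketch that parses the three `--dump magic` lines quoted above and computes the ham-to-spam ratio. The helper name `parse_dump_magic` is made up for illustration, and it assumes the field layout seen in this output: the count is the third whitespace-separated column and the label follows the last colon.

```python
def parse_dump_magic(text):
    """Return {label: count} for the 'non-token data' lines of sa-learn --dump magic."""
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue  # skip anything that is not a summary line
        fields = line.split()
        label = line.rsplit(":", 1)[1].strip()  # e.g. "nspam", "nham"
        counts[label] = int(fields[2])          # third column holds the count
    return counts


# The output quoted in this thread.
DUMP = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 32225 0 non-token data: nspam
0.000 0 440420 0 non-token data: nham
"""

counts = parse_dump_magic(DUMP)
ratio = counts["nham"] / counts["nspam"]
print(f"nspam={counts['nspam']} nham={counts['nham']} ham:spam = {ratio:.1f}:1")
```

For the figures above this works out to roughly 13.7 ham messages learned for every spam message.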
(3)
your message has
X-Spam-Status: No, score=-18.15 tagged_above=-999 required=6.2
tests=[AM.WBL=-3, BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25,
BAYES_00 - bayes *is* working and *is* seeing the trained data.
Can you provide the X-Spam-Status from an obvious spam that got through,
for comparison?
MAILING_LIST_MULTI=-1, RCVD_IN_DNSWL_HI=-5, SPF_PASS=-0.001,
URIBL_BLOCKED=0.001, USER_IN_DEF_SPF_WL=-7.5]
Side note: you will want to set up a local recursive (NON-FORWARDING!!!)
DNS server for the MTA's use so you avoid the URIBL_BLOCKED issue. That
will help quite a lot.
autolearn=ham autolearn_force=no
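
For comparing this header against one from a spam that got through, a small Python sketch like the following can break the score down per rule. It assumes the amavisd-style format shown here, where `tests=[...]` carries `NAME=score` pairs; the function name `parse_spam_status` is hypothetical and the header text is reassembled from the lines quoted in this message.

```python
import re

# X-Spam-Status value reassembled from the lines quoted in this message.
HEADER = (
    "No, score=-18.15 tagged_above=-999 required=6.2 "
    "tests=[AM.WBL=-3, BAYES_00=-1.9, HEADER_FROM_DIFFERENT_DOMAINS=0.25, "
    "MAILING_LIST_MULTI=-1, RCVD_IN_DNSWL_HI=-5, SPF_PASS=-0.001, "
    "URIBL_BLOCKED=0.001, USER_IN_DEF_SPF_WL=-7.5] "
    "autolearn=ham autolearn_force=no"
)


def parse_spam_status(value):
    """Return (reported_score, {rule: score}) from an amavisd-style header value."""
    score = float(re.search(r"score=(-?[\d.]+)", value).group(1))
    inside = re.search(r"tests=\[(.*?)\]", value, re.S).group(1)
    rules = {}
    for item in inside.split(","):
        name, _, points = item.strip().partition("=")
        rules[name] = float(points)
    return score, rules


score, rules = parse_spam_status(HEADER)
print(f"reported {score}, rules sum to {sum(rules.values()):.2f}")
# List the rules from most negative (most ham-like) to most positive.
for name, points in sorted(rules.items(), key=lambda kv: kv[1]):
    print(f"{points:8.3f}  {name}")
```

On this message the per-rule scores add up to the reported -18.15, and most of that comes from the whitelist and DNSWL rules (USER_IN_DEF_SPF_WL, RCVD_IN_DNSWL_HI, AM.WBL) rather than from BAYES_00.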
(4)
Around 50 users. They all work in the same industry.
OK, that's small enough that manual training should not be an issue.
Speculation:
Autotrain has strongly biased your database towards ham.
I *assume* you didn't collect a manual initial training corpus, that you
just turned on autotrain and let it run from scratch, and that you have no
manual corpus available to evaluate and verify the ham/spam
classification.
Recommendation:
(1) Turn off autotrain and autoexpire
(2) Collect and manually review several hundred ham and spam messages and
do initial retraining from scratch using them
(3) Review Bayes performance
(4) Going forward, train using misses (e.g. a spam that hit a BAYES_nn rule
below BAYES_50, or a ham that hit one above BAYES_50) - add them to your
retained training corpus
You may be able to recruit some clueful, responsible users to help with
the training, but make sure you review what they submit unless you
*really* trust their judgement.
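
The "train using misses" step in recommendation (4) can be sketched in Python as below. This is only an illustration under stated assumptions: the message has already been classified by a human, the rule names come from the message's X-Spam-Status header, and the helper names `bayes_bucket` and `is_training_miss` are invented here. Messages with no BAYES_nn hit at all are not flagged by this sketch, which is a choice you may want to revisit.

```python
import re

BAYES_RULE = re.compile(r"^BAYES_(\d+)$")


def bayes_bucket(rule_names):
    """Return the numeric part of the BAYES_nn rule that hit, or None if none did."""
    for name in rule_names:
        m = BAYES_RULE.match(name)
        if m:
            return int(m.group(1))
    return None


def is_training_miss(rule_names, is_spam):
    """True if a manually classified message landed on the wrong side of BAYES_50."""
    bucket = bayes_bucket(rule_names)
    if bucket is None:
        return False  # Bayes did not score this message; nothing to compare
    return bucket < 50 if is_spam else bucket > 50


# A spam that only scored BAYES_00 is a miss and belongs in the spam corpus.
print(is_training_miss(["BAYES_00", "SPF_PASS"], is_spam=True))
# A ham that scored BAYES_00 is classified correctly, so no retraining is needed.
print(is_training_miss(["BAYES_00", "SPF_PASS"], is_spam=False))
```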
On 08/05/18 21:08, John Hardin wrote:
On Tue, 8 May 2018, Matthew Broadhead wrote:
system setup centos-release-7-4.1708.el7.centos.x86_64,
spamassassin-3.4.0-2.el7.x86_64, amavisd-new-2.11.0-3.el7.noarch
/etc/mail/spamassassin/local.cf:
required_hits 5
report_safe 0
rewrite_header Subject [SPAM]
use_bayes 1
bayes_auto_learn 1
bayes_auto_expire 1
# Store bayesian data in MySQL
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa_bayes:localhost:3306
It is storing the info to the database OK, but it doesn't seem to be
filtering any mail.
(1) What is the output of: /usr/bin/sa-learn --dump magic
(2) What user are you running sa-learn as for training, and what user is
spamd running as?
(3) Are you seeing any BAYES_nn rule hits on messages at all, on either ham
or spam?
(4) How large is your environment (rough # and diversity of users)?
I'm not familiar with SQL Bayes, others may have other
questions/recommendations.
Some general comments:
I don't recommend using auto-learn for initial bayes training at least,
particularly in smaller environments. Manual initial training with careful
review, followed by manual training of misclassifications after review, is
more reliable. Others may offer different advice, particularly for large
installs with a diverse user community (which I don't manage).
Always keep your training corpora so that you can review and fix training
errors, and wipe and retrain from scratch if Bayes goes completely off the
rails for some reason.
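
As a rough illustration of the wipe-and-retrain idea, the Python sketch below builds the sa-learn invocations and prints them without running anything by default. The corpus directories are hypothetical placeholders, the helper names are invented, and it assumes the kept corpus is stored as one message per file in separate ham and spam directories. The commands would need to run as whatever user the scanner uses for its Bayes database.

```python
import subprocess


def retrain_commands(ham_dir, spam_dir, sa_learn="/usr/bin/sa-learn"):
    """Build the sa-learn invocations for a wipe-and-retrain from a kept corpus."""
    return [
        [sa_learn, "--clear"],           # wipe the existing Bayes database
        [sa_learn, "--ham", ham_dir],    # learn the reviewed ham
        [sa_learn, "--spam", spam_dir],  # learn the reviewed spam
        [sa_learn, "--dump", "magic"],   # show the resulting nspam/nham counts
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


# Hypothetical corpus locations; dry run only, nothing is executed here.
run_all(retrain_commands("/srv/corpus/ham", "/srv/corpus/spam"))
```

Keeping the dry run as the default is deliberate, since `--clear` discards everything learned so far.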
If you're not auto-learning, auto-expire is not needed. If you *are*, it's
recommended to expire from a scheduled job rather than take the hit from
spamd.
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79