On 4/17/2013 6:47 PM, Ben Johnson wrote:
> 
> 
> On 4/17/2013 5:05 PM, Kris Deugau wrote:
>> Ben Johnson wrote:
>>> Is there anything else that would cause Bayes tests not be performed? I
>>> ask because other types of tests are disabled automatically under
>>> certain circumstances (e.g., network tests), and I'm wondering if there
>>> is some obscure combination of factors that causes Bayes tests not to be
>>> performed.
>>
>> Do you have bayes_sql_override_username set?  (This forces use of a
>> single Bayes DB for all SA calls that reference this configuration file
>> set.)
>>
>> If not, you may be getting a Bayes DB for each user on your system;
>> IIRC this is supported (sort of) and default with Amavis.
>>
>> -kgd
>>
> 
> Thanks for jumping-in here, Kris.
> 
> Yes, I do have the following in my SA local.cf:
> 
> bayes_sql_override_username amavis
> 
> So, all users are sharing the same Bayes DB. I train Bayes daily and the
> token count, etc., etc. all look good and correct.
> 
> Just a quick update to my previous post.
> 
> The Pyzor and Razor2 score information is indeed coming through for the
> handful of messages that have landed since I made those configuration
> changes. So, all seems to be well on the Pyzor / Razor2 front.
> 
> However, I still don't see any evidence that Bayes testing was performed
> on the messages that are "slipping through".
> 
> It bears mention that *most* messages do indeed show evidence of Bayes
> scoring.
> 
> --- OH, SNAP! I found the root cause. ---
> 
> Well, when I went to confirm the above statement, regarding most
> messages showing evidence of Bayes scoring, I realized that *none* show
> evidence of it since 3/23! No wonder all of this garbage is slipping
> through!
> 
> I recognized the date 3/23 immediately; it was the date on which we
> upgraded ISPConfig from 3.0.4.6 to 3.0.5. (For those who have no
> knowledge of ISPConfig, it is basically a FOSS solution to managing vast
> numbers of websites, domains, mailboxes, etc., as the name implies.)
> 
> We also updated OS packages (security only) on that day.
> 
> After diff-ing all of the relevant service configuration files
> (amavis-new, spamassassin, postfix, dovecot, etc.) I couldn't find any
> discrepancies.
> 
> Then, I tried:
> 
> -----------------------------------------------------
> # spamassassin -D -t < /tmp/msg.txt 2>&1 | egrep '(bayes:|whitelist:|AWL)'
> 
> Apr 17 15:36:08.723 [23302] dbg: bayes: learner_new
> self=Mail::SpamAssassin::Plugin::Bayes=HASH(0x2fbc508),
> bayes_store_module=Mail::SpamAssassin::BayesStore::MySQL
> Apr 17 15:36:08.746 [23302] dbg: bayes: using username: amavis
> Apr 17 15:36:08.746 [23302] dbg: bayes: learner_new: got
> store=Mail::SpamAssassin::BayesStore::MySQL=HASH(0x3305358)
> Apr 17 15:36:08.758 [23302] dbg: bayes: database connection established
> Apr 17 15:36:08.758 [23302] dbg: bayes: found bayes db version 3
> Apr 17 15:36:08.759 [23302] dbg: bayes: Using userid: 1
> Apr 17 15:36:08.914 [23302] dbg: bayes: corpus size: nspam = 6083, nham
> = 2334
> Apr 17 15:36:08.920 [23302] dbg: bayes: tok_get_all: token count: 163
> Apr 17 15:36:08.921 [23302] dbg: bayes: tok_get_all: SQL error: Illegal
> mix of collations for operation ' IN '
> Apr 17 15:36:08.921 [23302] dbg: bayes: cannot use bayes on this
> message; none of the tokens were found in the database
> Apr 17 15:36:08.921 [23302] dbg: bayes: not scoring message, returning undef
> Apr 17 15:36:13.116 [23302] dbg: timing: total 5159 ms - init: 804
> (15.6%), parse: 10 (0.2%), extract_message_metadata: 99 (1.9%),
> poll_dns_idle: 3426 (66.4%), get_uri_detail_list: 0.24 (0.0%),
> tests_pri_-1000: 11 (0.2%), compile_gen: 133 (2.6%), compile_eval: 18
> (0.3%), tests_pri_-950: 5 (0.1%), tests_pri_-900: 5 (0.1%),
> tests_pri_-400: 12 (0.2%), check_bayes: 8 (0.1%), tests_pri_0: 804
> (15.6%), dkim_load_modules: 23 (0.4%), check_dkim_signature: 5 (0.1%),
> check_dkim_adsp: 99 (1.9%), check_spf: 61 (1.2%), check_razor2: 211
> (4.1%), check_pyzor: 138 (2.7%), tests_pri_500: 3387 (65.7%)
> -----------------------------------------------------
> 
> Check-out the message buried half-way down:
> 
> bayes: tok_get_all: SQL error: Illegal mix of collations for operation '
> IN '
> 
> I have run into this unsightly message before, but in that case, I could
> see the entire query, which enabled me to change the collations accordingly.
> 
> In this case, I have no idea what the original query might have been.
> 
> Further, I have no idea what changed that introduced this problems on 3/23.
> 
> Was it a MySQL upgrade? Was it an ISPConfig change?
> 
> Has anybody else run into this?
> 
> Thanks again,
> 
> -Ben
> 

I managed to fix this issue.

The date on which Bayes stopped "working" was relevant only in as much
it was the first date on which MySQL had been restarted in months. The
software updates had nothing to do with the issue. The critical change
was that sometime between the last MySQL start/stop, I had added a
handful of [mysqld] configuration directives to my.cnf:

default_storage_engine=InnoDB
skip-character-set-client-handshake
collation_server=utf8_unicode_ci
character_set_server=utf8

Searching the Web for "bayes: _put_token: Updated an unexpected number
of rows" yielded the following thread:

http://spamassassin.1065346.n5.nabble.com/Migrating-bayes-to-mysql-fails-with-parsing-errors-td10064i20.html

There were several clues here. After backing-up the token DB with
"sa-learn --backup", I dropped and recreated the tables using the schema
from
http://svn.apache.org/repos/asf/spamassassin/tags/spamassassin_current_release_3.3.x/sql/bayes_mysql.sql
.

Then, I tried restoring the data into the new tables:

# sa-learn --restore bayes-backup
bayes: encountered too many errors (20) while parsing token line,
reverting to empty database and exiting
ERROR: Bayes restore returned an error, please re-run with -D for more
information

# sa-learn -D --restore bayes-backup

dbg: bayes: _put_token: Updated an unexpected number of rows.
dbg: bayes: error inserting token for line: t 1 0 1364380202 3f3f1a2a3f
dbg: bayes: _put_token: Updated an unexpected number of rows.
dbg: bayes: error inserting token for line: t 1 0 1365878113 727f3f3f20

Still no joy. So I re-read the above thread, cover-to-cover.

The first post on that page was the key. In particular, adding the
following to each MySQL "CREATE TABLE" statement:

ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

To create the new tables, I performed a find-and-replace-all on
"TYPE=MyISAM;" by "ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;"
on the above .sql file. All the tables were created successfully.

To restore the data, I used "sa-learn --restore", and this time, the
process completed without error.

I was a little nervous that restoring from the backup would create
corrupted token data, and because I retain my spam and ham corpora, it
felt "safer" to empty the tables and re-run sa-learn on the corpora.

In any case, problem solved!

Thanks for the help and pointers in the right direction.

-Ben

Reply via email to