Ross,

You're not the only one to run into this problem. I had converted our database over to UTF8 and didn't think anything of it until about a week later, when the spam started pouring in. Bayes still seemed to learn; it just couldn't select (i.e. compare).
I figured maybe it was the upgraded data, so we dumped the data and started again from scratch. It grew as expected, but once it hit 200/200 we saw the same error.

The calling server (it's a centralized database cluster) is set to default to UTF8. Everything on that backend server is also UTF8. We ended up dropping the database entirely and recreating it with Latin. From there it seemed to work. I'm not sure what the overhead is of converting to Latin on a UTF8-configured MySQL instance, but it seems to work.

This is a bad workaround at best, though. There should be some type of flag in the SA configuration to specify the character set in the connection string (or maybe a separate variable altogether). It seems to be ignoring the /etc/my.cnf variable "default-character-set=utf8" that the MySQL client should default to. So what we could be seeing isn't a SpamAssassin issue but rather a bad implementation of the MySQL client library ignoring its own directives. But I'm by no means an expert on the subject...

Gary Wayne Smith

> -----Original Message-----
> From: Ross Anderson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, July 20, 2006 8:58 PM
> To: users@spamassassin.apache.org
> Subject: bayes sql error.
>
> We've developed a problem recently with our mail server setup. I saw one
> post that talked about mysql needing to have utf-8 character support. We
> tried converting our db, and eventually just cleared out the db as a
> test. When we cleared it out, it seemed to start returning more typical
> responses. Now the error has returned after users have submitted their
> messages for training. Can anyone shed some light on this?
>
> thanks
>
> [30023] dbg: bayes: corpus size: nspam = 698, nham = 219
> [30023] dbg: bayes: tok_get_all: token count: 67
> [30023] dbg: bayes: tok_get_all: SQL error: Illegal mix of collations for operation ' IN '
> [30023] dbg: bayes: cannot use bayes on this message; none of the tokens were found in the database
> [30023] dbg: bayes: not scoring message, returning undef
>
> Ross
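
For anyone hitting the same "Illegal mix of collations" error, a reasonable first step is to see which character sets the connection and the bayes columns are actually using. The sketch below is only an illustration: charset_diagnostics is a hypothetical helper, and the table name bayes_token is assumed from the stock SpamAssassin SQL schema. It just builds standard MySQL SHOW statements to paste into the mysql client; it does not connect to anything.

```python
# Hypothetical helper (not part of SpamAssassin): builds MySQL statements
# for inspecting the character sets and collations in play.
# Assumption: the Bayes token table is named bayes_token, as in the stock
# SpamAssassin SQL schema. Adjust if your schema differs.

def charset_diagnostics(table="bayes_token"):
    """Return SQL statements that show connection and column charsets."""
    # Only allow simple identifiers, since the name is placed in the SQL text.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    return [
        # Character sets the server and this connection are using.
        "SHOW VARIABLES LIKE 'character_set%'",
        # Collations the server and this connection are using.
        "SHOW VARIABLES LIKE 'collation%'",
        # Per-column collation for the token table.
        "SHOW FULL COLUMNS FROM `%s`" % table,
    ]


if __name__ == "__main__":
    for stmt in charset_diagnostics():
        print(stmt + ";")
```

If the connection reports utf8 while the token column reports a Latin collation (or the reverse), that would be consistent with the mismatch described above.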