Hi,
As a test, I increased the length of the token column to binary(32)
and imported a test file containing a single token.
This time it went through. However, as I suspected, the stored token is
not 5 bytes long. Token line from the backup:
t 1 0 1718024618 027121926a
Hex representation of content in database:
MariaDB [spamassassin]> select hex(token) from bayes_token\G
*************************** 1. row ***************************
hex(token): 027121C2926A0000000000000000000000000000000000000000000000000000
1 row in set (0.000 sec)
Compared:
Original 02 71 21 92 6a
Database 02 71 21 C2 92 6A
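C2 92 is the UTF-8 encoding of U+0092, so the byte 0x92 was apparently
treated as a character and the token was written to the database UTF-8
encoded. The trailing 00 bytes are just the padding of binary(32).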
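
To illustrate the comparison, here is a minimal Python sketch. It is not
SpamAssassin code and the helper name utf8_reencode is made up; it only
assumes that each token byte gets interpreted as a Latin-1 code point and
is then encoded as UTF-8, which reproduces the bytes seen in the database:

```python
# Hypothetical model of the observed behaviour, not SpamAssassin's implementation.

def utf8_reencode(raw: bytes) -> bytes:
    """Treat each byte as a code point (Latin-1), then encode the text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

token = bytes.fromhex("027121926a")     # token from the backup line
stored = utf8_reencode(token)           # bytes that appear to reach the database
padded = stored.ljust(32, b"\x00")      # binary(32) right-pads values with 0x00

print("Original", token.hex(" "))
print("Database", stored.hex(" "))
print("Fits into binary(5):", len(stored) <= 5)
print("Padded to binary(32):", padded.hex().upper())
```

Under that assumption, any token containing a byte of 0x80 or above grows
by one byte per such byte, which would explain why binary(5) is too small.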
Running sa-learn with DBI_TRACE=2, the trace suggests that the token
already contains U+0092 as a character at parameter binding time:
Binding parameters: INSERT INTO bayes_token
  (id, token, spam_count, ham_count, atime)
  VALUES ('43','^Bq!<U+0092>j','1','0','1718024618')
  ON DUPLICATE KEY UPDATE spam_count = GREATEST(spam_count + '1', 0),
    ham_count = GREATEST(ham_count + '0', 0),
    atime = GREATEST(atime, '1718024618')
Thus, I would say it's not an issue with the database.
Any idea?
Running spamassassin-3.4.6-5.el9.x86_64 on AlmaLinux 9.4.
Thanks,
Gerald
On 18.06.24 17:09, Gerald Vogt wrote:
Hi!
I am trying to use a MariaDB database as the Bayes store, but it fails to
load tokens. Whenever it tries to insert something into bayes_token, it
fails with this error:
dbg: bayes: _put_token: SQL error: Data too long for column 'token' at row 1
The table was created as described in
https://github.com/apache/spamassassin/blob/trunk/sql/bayes_mysql.sql
but the 5-byte binary column isn't big enough. I have tried sa-learn
--restore as well as learning some spam mails; bayes_token remains empty.
MariaDB [spamassassin]> show create table bayes_token\G
*************************** 1. row ***************************
       Table: bayes_token
Create Table: CREATE TABLE `bayes_token` (
  `id` int(11) NOT NULL DEFAULT 0,
  `token` binary(5) NOT NULL,
  `spam_count` int(11) NOT NULL DEFAULT 0,
  `ham_count` int(11) NOT NULL DEFAULT 0,
  `atime` int(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`,`token`),
  KEY `bayes_token_idx1` (`id`,`atime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
1 row in set (0.000 sec)
Any idea what goes wrong here?
Thanks,
Gerald