I've searched high and low for answers to this problem, but I believe the answers out there don't have regular, predictable keywords to find them by. Setup: SpamAssassin 3.0.1 on Red Hat Fedora Core 2.
In short, when I run sa-learn --dump, I see a slew of binary tokens. I've isolated the problem by creating a test directory, pointing sa-learn at it via --dbpath, and creating a new db. Even after loading only a single spam message, my db dump still shows nothing but binary/useless tokens.

It seems like sa-learn and my Berkeley DB version don't jibe, perhaps? I don't seem to be getting any Bayesian matching out of this in SpamAssassin, so I'm concluding it is a real issue and not just an aesthetic one.

Sample output (mind you, this is after loading only ONE 32-line/304-word spam message; the actual output is 166 lines long, truncated here):

    # sa-learn --dbpath /tmp/bayes-testing/ --dump
    0.000          0          3          0  non-token data: bayes db version
    0.000          0          1          0  non-token data: nspam
    0.000          0          0          0  non-token data: nham
    0.000          0        156          0  non-token data: ntokens
    0.000          0 1098394307          0  non-token data: oldest atime
    0.000          0 1098394307          0  non-token data: newest atime
    0.000          0          0          0  non-token data: last journal sync atime
    0.000          0          0          0  non-token data: last expiry atime
    0.000          0          0          0  non-token data: last expire atime delta
    0.000          0          0          0  non-token data: last expire reduction count
    0.500          1          0 1098394307  146128b352
    0.500          1          0 1098394307  4d8914a48a
    0.500          1          0 1098394307  9b1dba02fa
    0.500          1          0 1098394307  c6e33f2228
    0.500          1          0 1098394307  e565aece1c
    0.500          1          0 1098394307  e8778e7918
    0.500          1          0 1098394307  0c90d22ab4
    0.500          1          0 1098394307  948257a188
    0.500          1          0 1098394307  e53979c58e
    0.500          1          0 1098394307  da0dafd155
    0.500          1          0 1098394307  6152cff59d
    0.500          1          0 1098394307  801ee7924b

Thanks in advance
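P.S. In case it helps anyone reproduce this, here is a rough Python sketch for summarizing a dump like the one above. It assumes the columns are probability, spam count, ham count, atime and token, and that the "non-token data" rows carry their value in the third column, which is how the sample reads. The names parse_dump, SAMPLE and HEX_TOKEN are made up for illustration and are not part of SpamAssassin.

```python
import re

# Assumption: tokens that look "binary" are exactly 10 lowercase hex digits,
# as in the sample output quoted in this post.
HEX_TOKEN = re.compile(r"^[0-9a-f]{10}$")

# Two metadata rows and two token rows copied from the dump in this post.
SAMPLE = """\
0.000 0 1 0 non-token data: nspam
0.000 0 156 0 non-token data: ntokens
0.500 1 0 1098394307 146128b352
0.500 1 0 1098394307 4d8914a48a
"""

def parse_dump(text):
    """Split dump text into a metadata dict and a list of token rows."""
    meta, tokens = {}, []
    prefix = "non-token data:"
    for line in text.splitlines():
        parts = line.split(None, 4)  # at most 5 fields; the last keeps its spaces
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        prob, col2, col3, col4, label = parts
        if label.startswith(prefix):
            # Metadata row: the value sits in the third column.
            meta[label[len(prefix):].strip()] = int(col3)
        else:
            # Token row: probability, spam count, ham count, atime, token.
            tokens.append((float(prob), int(col2), int(col3), int(col4), label))
    return meta, tokens

if __name__ == "__main__":
    meta, tokens = parse_dump(SAMPLE)
    hex_like = sum(1 for row in tokens if HEX_TOKEN.match(row[4]))
    print(meta)
    print(f"{hex_like} of {len(tokens)} tokens look like 10-digit hex strings")
```

Feeding it the full output of sa-learn --dump instead of SAMPLE would show whether every token in the db has this shape or only some of them.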