Re: sa-learn dump showing only binary tokens

Michael Parker 29 Oct 2004 19:58:48 -0000

On Fri, Oct 29, 2004 at 03:36:26PM -0400, [EMAIL PROTECTED] wrote:
> 
> In short, when I run sa-learn --dump, I see a slew of binary tokens.  I've 
> isolated the problem by creating a test directory, pointing sa-dump to it via 
> --dbpath, and creating a new db.  Even after loading only a single spam 
> message, my db dump still shows all binary/useless tokens.  It seems 
> like sa-learn and my Berkeley DB version don't jibe, perhaps?  I don't seem 
> to be getting any bayesian matching out of this in spamassassin, so I'm 
> concluding it is a real issue and not just aesthetic.  Sample output (mind 
> you after loading only ONE 32-line/304-word spam message).
>


It's just aesthetic.  All bayes tokens are stored in binary form now.
For the curious, it's the low-order 40 bits of the token's SHA1 hash.
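
As a rough illustration of that truncation, here is a minimal Python
sketch.  It is not SpamAssassin's own code: the function name is
hypothetical, and it assumes that "low-order 40 bits" means the last
five bytes of the 20-byte SHA1 digest and that the token is hashed as
UTF-8 text.

```python
import hashlib


def short_token_hash(token: str) -> bytes:
    """Return the low-order 40 bits (last 5 bytes) of the token's SHA1 digest.

    Assumption: the digest is treated as a big-endian number, so the
    low-order bits are the trailing bytes.
    """
    digest = hashlib.sha1(token.encode("utf-8")).digest()  # 20 bytes
    return digest[-5:]                                      # keep 5 bytes = 40 bits


if __name__ == "__main__":
    packed = short_token_hash("example")
    # The stored value is raw bytes, so show it as hex to make it readable.
    print(len(packed), "bytes:", packed.hex())
```

The hash is one-way, so the original token text cannot be recovered
from the stored value.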

Because these values are binary, they wouldn't print very well in the
sa-learn --dump output.  So when you see them in the dump, in the
--restore output, or in the bayes_journal file, they are actually
unpacked values of the binary token.  They are then repacked before
going into the database.
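
The unpack/repack round trip described above can be sketched in Python
as follows.  This only illustrates the idea of a lossless hex
representation; the function names are hypothetical and the exact
format SpamAssassin writes is not taken from its source.

```python
def unpack_token(raw: bytes) -> str:
    """Turn a binary token into a printable hex string (2 chars per byte)."""
    return raw.hex()


def pack_token(hex_text: str) -> bytes:
    """Turn the printable hex string back into the original binary token."""
    return bytes.fromhex(hex_text)


if __name__ == "__main__":
    raw = b"\x00\x01\xab\xcd\xef"      # a made-up 5-byte (40-bit) token
    printable = unpack_token(raw)      # safe to print or write to a text file
    print(printable)
    # Repacking yields exactly the bytes that would go into the database.
    print(pack_token(printable) == raw)
```

A 5-byte token always unpacks to 10 hex characters, which is why the
dump looks like a list of short, meaningless strings.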

So yes, you are seeing binary tokens in your database, but no, they are
not necessarily what you are seeing in the dump output (which is
basically a hex representation of the binary token value).  If you need
access to the raw token value, you can write a plugin to dump the values
from the bayes hooks.

Now, I'm not saying you don't have a problem with bayes; you did say
it seems not to be working.  If you run with -D, you will see some
bayes debug output that might show why it isn't working, assuming
it isn't.

Michael
