Hi,
I am Jacopo Fabiani, a Computer Science student in Pisa.
I'm trying to extract the spam/ham tokens stored in my Department's SpamAssassin
database, which should be useful for building a query classifier.
I dumped the database with the command sa-learn --backup, but I am having
trouble decoding the token column.
Below is what I get:
v    3    db_version # this must be the first line!!!
v    142549    num_spam
v    66900    num_nonspam
t    29875    17211    1335967225    2dd27dc5f9
t    1573    2752    1335249870    c0614089c0
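
For reference, the dump lines above can be split into fields with a few lines of Perl. This is only a minimal sketch: parse_backup_line is a hypothetical helper, not part of SpamAssassin, and the field names (spam_count, ham_count, atime, token_hex) are an assumption about the column order of the "t" lines.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line of `sa-learn --backup` output.
# The field names are assumptions about the column order, not taken
# from the SpamAssassin documentation.
sub parse_backup_line {
    my ($line) = @_;
    $line =~ s/\s+#.*$//;            # drop the trailing comment on the db_version line
    my @f = split ' ', $line;        # split on any run of whitespace
    return unless @f;
    if ($f[0] eq 'v' && @f == 3) {
        return { type => 'v', value => $f[1], name => $f[2] };
    }
    if ($f[0] eq 't' && @f == 5) {
        return {
            type       => 't',
            spam_count => $f[1],
            ham_count  => $f[2],
            atime      => $f[3],
            token_hex  => $f[4],
        };
    }
    return;                          # unrecognised line
}

my $rec = parse_backup_line("t    29875    17211    1335967225    2dd27dc5f9");
printf "%s: spam=%d ham=%d atime=%d\n",
    @{$rec}{qw(token_hex spam_count ham_count atime)};
```

Splitting on whitespace rather than on a fixed separator keeps the sketch working whether the dump uses tabs or spaces.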

I think the last column should contain the token. First, I tried to convert
it from hex to ASCII, but that did not give a readable result.

Then I looked at the code of the backup_database() function in
Mail::SpamAssassin::BayesStore::BDB and found that tokens are encoded
with the unpack function:

my $encoded = unpack("H*", $token);

So, after looking at the restore_database() function, I tried to reverse
the process using the pack function, but that does not solve the problem:

$token = pack("H*",$encoded);

print $token;   # prints a nonsense value
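
To see what pack actually returns here, it helps to print the byte values instead of the raw string. The following is a minimal sketch using one hex value from the dump; hex_to_bytes is a hypothetical helper name. It only shows that the hex step round-trips cleanly and that the result is 5 raw bytes, most of them outside the printable ASCII range, so it does not by itself say what those bytes represent.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the hex column back into raw bytes.
sub hex_to_bytes {
    my ($hex) = @_;
    return pack("H*", $hex);
}

my $raw = hex_to_bytes("2dd27dc5f9");

# Show the length and each byte in hex rather than printing raw bytes.
printf "length:     %d bytes\n", length $raw;
printf "bytes:      %s\n",
    join ' ', map { sprintf("0x%02x", ord($_)) } split //, $raw;

# unpack("H*") applied to the packed value gives back the original hex string.
printf "round trip: %s\n", unpack("H*", $raw);
```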

I also tried the sha1 function:

$token = substr(sha1($encoded), -5);
print $token;   # prints a nonsense value
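
One way to test the sha1 idea is to run it in the forward direction. The sketch below rests on an assumption that is not confirmed here: that the stored value is the last 5 bytes of the SHA-1 digest of the original token text. If that holds, the dump cannot be decoded back into words, and the only available check is to hash candidate words and compare. candidate_hex is a hypothetical helper and the word list is made up for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);    # core module; sha1() returns the 20-byte binary digest

# Hypothetical helper: hex form of the last 5 bytes of SHA-1 of a candidate word.
# ASSUMPTION: this mirrors how the stored tokens were produced.
sub candidate_hex {
    my ($word) = @_;
    return unpack("H*", substr(sha1($word), -5));
}

# Hex values taken from the dump.
my %seen = map { $_ => 1 } qw(2dd27dc5f9 c0614089c0);

# Made-up candidate words, for illustration only.
for my $word (qw(viagra meeting free)) {
    my $hex = candidate_hex($word);
    printf "%-10s %s %s\n", $word, $hex, ($seen{$hex} ? "MATCH" : "no match");
}
```

Note that the hash is applied to the candidate word itself, not to the hex string from the dump.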

My question is: where am I going wrong? Is there a way to decode the encoded
tokens that I got from the sa-learn --backup command?

Best Regards,
Jacopo.
