Hi,

I am Jacopo Fabiani, a Computer Science student from Pisa. I am trying to extract the spam/ham tokens stored in my department's SpamAssassin database, which should be useful for building a query classifier. I dumped the database with the command sa-learn --backup, but I am having trouble decoding the token column. Below is an excerpt of what I get:

    v 3 db_version # this must be the first line!!!
    v 142549 num_spam
    v 66900 num_nonspam
    t 29875 17211 1335967225 2dd27dc5f9
    t 1573 2752 1335249870 c0614089c0
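To make the layout concrete, this is a minimal sketch of how I am reading these lines in Perl. The field names spam_count, ham_count and atime are my own guess at the column meanings (the dump does not state them), and parse_line is just a helper name I made up for this example.

```perl
use strict;
use warnings;

# Sample lines copied from the dump excerpt.
my @lines = (
    "v 3 db_version # this must be the first line!!!",
    "v 142549 num_spam",
    "v 66900 num_nonspam",
    "t 29875 17211 1335967225 2dd27dc5f9",
    "t 1573 2752 1335249870 c0614089c0",
);

# Parse one dump line into a hash reference.
# ASSUMPTION: the columns of a "t" line are spam count, ham count,
# last access time (epoch seconds) and the hex-encoded token.
sub parse_line {
    my ($line) = @_;
    my @f = split ' ', $line;          # split on any run of whitespace
    return unless @f;                  # skip empty lines

    if ($f[0] eq 'v') {
        return { type => 'var', value => $f[1], name => $f[2] };
    }
    if ($f[0] eq 't') {
        return {
            type       => 'token',
            spam_count => $f[1],
            ham_count  => $f[2],
            atime      => $f[3],
            hex        => $f[4],
        };
    }
    return;                            # unknown record type
}

my @records = map { parse_line($_) } @lines;

for my $r (grep { $_->{type} eq 'token' } @records) {
    printf "%s spam=%d ham=%d last_seen=%s\n",
        $r->{hex}, $r->{spam_count}, $r->{ham_count},
        scalar gmtime($r->{atime});
}
```

Splitting on whitespace rather than on a fixed separator keeps the sketch working whether the real file uses tabs or spaces between columns.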
I think the last column should contain the token. First, I tried converting it from hex to ASCII, but that did not produce readable text. Then I looked at the backup_database() function in Mail::SpamAssassin::BayesStore::BDB and found that tokens are encoded with the unpack function:

    my $encoded = unpack("H*", $token);

So, after looking at the restore_database() function, I tried to invert the process with the pack function, but that does not solve the problem either:

    $token = pack("H*", $encoded);
    print $token;    # prints a nonsense value

I also tried the sha1 function:

    $token = substr(sha1($encoded), -5);
    print $token;    # prints a nonsense value

My question is: where am I going wrong? Is there a way to decode the encoded tokens that I got with the sa-learn --backup command?

Best regards,
Jacopo
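
P.S. In case it helps to reproduce what I see, here is a minimal, self-contained sketch of the round trip. The pack/unpack part only shows that the last column decodes to 5 raw bytes rather than text. The token_key helper and the candidate words are hypothetical. The helper assumes that the stored value is the last five bytes of the SHA1 digest of the original token. That is my guess from the substr(sha1(...), -5) expression, not something I have confirmed. If the guess is right, a value could only be matched against hashes of candidate words, not decoded.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1);    # core module; sha1() returns 20 raw bytes

my $encoded = '2dd27dc5f9';          # last column of a "t" line in the dump
my $raw     = pack("H*", $encoded);  # hex string -> raw bytes

# 10 hex digits decode to 5 bytes of binary data.
printf "decoded length: %d bytes\n", length $raw;
print "hex round trip ok\n" if unpack("H*", $raw) eq $encoded;

# ASSUMPTION (unconfirmed): stored value = last 5 bytes of SHA1(token).
# token_key is a hypothetical helper name used only in this sketch.
sub token_key {
    my ($word) = @_;
    return unpack("H*", substr(sha1($word), -5));
}

# Hypothetical candidate words, hashed so they can be compared with the dump.
for my $word (qw(hello viagra)) {
    my $key = token_key($word);
    printf "%-8s -> %s%s\n", $word, $key,
        ($key eq $encoded ? "  (matches)" : "");
}
```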