spamassassin tokenization

qian diao Thu, 24 Oct 2013 14:17:54 -0700

Hi,
I am trying to use SpamAssassin's tokenization results as input to other machine 
learning methods, such as SVM. The results from "sa-learn --dump" are token 
frequencies aggregated over all ham or all spam messages, not on a per-message basis. 
The token counts I want would be in the following format:

Tokens    msg0    msg1    ...    msgM
token1    10      6       ...    0
......
tokenN    20      1       ...    2
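
To make the layout concrete, here is a minimal Python sketch that builds such a token-by-message count table. It does not use SpamAssassin at all: the regex tokenizer is only a stand-in assumption for whatever tokenizer is eventually used, and the names tokenize and token_matrix are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Stand-in tokenizer (assumption): lowercase alphanumeric runs.
    # This is NOT SpamAssassin's Bayes tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def token_matrix(messages):
    # One Counter of token frequencies per message.
    counts = [Counter(tokenize(m)) for m in messages]
    # Vocabulary: every token seen in any message, sorted for stable rows.
    vocab = sorted(set().union(*counts))
    # Row i holds the counts of vocab[i] in msg0..msgM (0 if absent).
    rows = [[c[t] for c in counts] for t in vocab]
    return vocab, rows

if __name__ == "__main__":
    msgs = ["buy cheap pills buy now", "meeting at noon", "cheap meeting"]
    vocab, rows = token_matrix(msgs)
    for token, row in zip(vocab, rows):
        print(token, *row, sep="\t")
```

Each row is one token and each column one message, so the rows can be transposed into per-message feature vectors for an SVM or similar model.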

If per-message data is not available in the current design, is there any way 
to use SpamAssassin for the tokenization only, so that I can then apply my own 
statistical model for the classification?
Thanks,
Qian
