What format does sa-learn expect mail to be in?
I tried to feed it a 2 MB standard Unix email file of my spam folder, and it registered it as only one email, but with 4000 tokens...
Anyone? There is nothing in the man page about it.
By default, sa-learn assumes that each file passed to it contains a single email in RFC 822 format, and that any directory passed to it is a maildir.
You're probably passing a Unix mailbox file rather than a single Unix email file (note: a mailbox is not an email; it contains several).
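If you want to check what you actually have before feeding it to sa-learn, here is a minimal sketch using Python's standard mailbox module. It assumes the classic mbox convention that every message starts with a "From " separator line; the helper names are made up for illustration and are not part of SpamAssassin.

```python
import mailbox


def looks_like_mbox(path: str) -> bool:
    """Heuristic: an mbox file begins with a 'From ' separator line,
    while a single RFC 822 message begins with a header such as 'Received:'."""
    with open(path, "rb") as f:
        return f.readline().startswith(b"From ")


def count_mbox_messages(path: str) -> int:
    """Count the messages Python's mbox parser finds (one per 'From ' line)."""
    box = mailbox.mbox(path, create=False)  # fail if the file doesn't exist
    try:
        return len(box)
    finally:
        box.close()
```

If looks_like_mbox() is true and count_mbox_messages() reports hundreds or thousands of messages, that file is a mailbox, and by default sa-learn will treat the whole thing as one huge email.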
If you are passing Unix mailbox files, you need to add --mbox as a parameter to sa-learn.
Some Unix tools, such as UW IMAP, generate a variant mailbox format that sa-learn supports via --mbx. If --mbox causes problems, try --mbx.
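To tie the options together, here is a rough sketch of a hypothetical helper that builds the sa-learn command line for a given path. It only uses the --spam, --ham, --mbox and --mbx flags; it assumes directories are maildirs (sa-learn's default), guesses --mbox from a leading "From " line, and leaves --mbx to the caller, since it makes no attempt to detect the UW IMAP format.

```python
import os


def sa_learn_argv(path: str, kind: str = "spam", force_mbx: bool = False) -> list[str]:
    """Build an sa-learn argument list for `path` (hypothetical helper).

    - directories are passed as-is, since sa-learn treats them as maildirs
    - files starting with an mbox 'From ' line get --mbox
    - UW IMAP mbx files must be flagged by the caller with force_mbx=True
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = ["sa-learn", "--" + kind]
    if force_mbx:
        argv.append("--mbx")
    elif os.path.isfile(path):
        with open(path, "rb") as f:
            if f.readline().startswith(b"From "):
                argv.append("--mbox")
    argv.append(path)
    return argv
```

You could then run it with subprocess.run(sa_learn_argv(path), check=True); for a spam mailbox, that amounts to typing sa-learn --spam --mbox followed by the file name.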
(Note: despite using .mbx file extensions, Mozilla appears to use mbox format, but it's not 100% clear whether that's just a naming choice or whether they really did change formats after 1.0.)