Hi,
First of all, thanks to Justin for patiently helping me install
mass-check and pointing me in the right direction. I will try to run the
algorithms tonight to see what they come up with.
In the meantime, you can find a hit-frequencies report at:
http://www.saphirtech.fr/spam/freqs_2008_06_23.txt
All rules are prefixed with FR_ and are available in the same directory.
I must say I did not double-check for stray spam in my mailbox before
using it as a ham corpus, but it *should* be clean. I'll double-check before
the next run. The spam corpus was 100% French spam, hand-picked over the last
week from the "probably-spam" class (default score values 5-15).
Any feedback on the results (not enough messages in the corpus, bad rules,
good rules, etc.) would be appreciated.
Sincerely,
JG