I'm interested in hearing about people's experiences with filtering the spam that makes it through to misc. Most of it is non-English. I have been using SpamAssassin and training it, yet the Bayes rules at their default weightings are not enough to get the misc spam into my spam box... in fact, many of these messages still autolearn as ham.
Email coming from the list server boosts the ham score. The locale plugin for SA doesn't help at all. I started working on something that measures what percentage of the words in an email appear in /usr/share/dict/words, as a way to detect English-ness. It works well, but has this already been done elsewhere?
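For anyone wanting to try the same idea, here is a minimal standalone sketch of the dictionary-ratio check, assuming a one-word-per-line list at /usr/share/dict/words. The function names and the 0.5 cutoff are hypothetical choices for illustration, not part of SpamAssassin, and wiring the result into SA scoring is not shown.

```python
import re
from typing import Set

# Hypothetical tokenizer: runs of ASCII letters, optionally with one internal
# apostrophe ("don't"). Text in non-Latin scripts produces no tokens at all.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def load_dictionary(path: str = "/usr/share/dict/words") -> Set[str]:
    """Load a one-word-per-line word list into a lowercase set."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return {line.strip().lower() for line in f if line.strip()}


def english_ratio(text: str, dictionary: Set[str]) -> float:
    """Return the fraction of word tokens in text found in dictionary.

    Returns 0.0 when the text has no ASCII word tokens (e.g. Cyrillic or CJK).
    """
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in dictionary)
    return hits / len(tokens)


def looks_english(text: str, dictionary: Set[str], threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: English if the ratio meets the threshold."""
    return english_ratio(text, dictionary) >= threshold


if __name__ == "__main__":
    words = load_dictionary()
    body = "Please find the meeting notes attached."
    print(f"{english_ratio(body, words):.2f}", looks_english(body, words))
```

Loading the dictionary once into a set keeps each lookup constant-time, so scoring a message is linear in its word count. The threshold would need tuning against your own ham and spam.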