Having gone over the FAQ and other doc-sections on the wiki, I haven't been
able to answer my questions. So here's hoping the user community can help!

My company currently uses a home-brew solution for applying naive Bayes
filtering to data. Right now that amounts to spam filtering on email messages
that pass through our system. However, we also need to filter other content,
and that filtering isn't the same as spam-testing. In a nutshell, we use the
"bogofilter" application to classify messages, and invoke it with different
word-list files to represent different filtering requirements. But this isn't
going to scale well for us as written, and I'm the lucky soul tasked with
coming up with a better way.
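
For illustration only, an arrangement like the one described above might look
roughly like the following Python sketch. The directory paths, the category
names, and the build_command/classify helpers are hypothetical. The only
bogofilter details assumed are its -d option for choosing the word-list
directory, reading the message on stdin, and its documented exit codes
(0 = spam, 1 = non-spam, 2 = unsure).

```python
import subprocess

# Hypothetical mapping: one bogofilter word-list directory per filtering need.
WORDLIST_DIRS = {
    "spam": "/var/lib/filters/spam",
    "other": "/var/lib/filters/other",
}

# bogofilter exit codes, renamed because "spam" here just means "matches".
EXIT_MEANING = {0: "match", 1: "no-match", 2: "unsure"}


def build_command(category):
    """Return the argv that classifies one message against a category."""
    return ["bogofilter", "-d", WORDLIST_DIRS[category]]


def classify(category, text):
    """Pipe text to bogofilter on stdin and translate its exit code."""
    result = subprocess.run(build_command(category), input=text.encode("utf-8"))
    if result.returncode not in EXIT_MEANING:
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return EXIT_MEANING[result.returncode]
```

Every call to classify() starts a new process, which is one reason a setup
of this shape may not scale well.
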

I'd like to adapt SpamAssassin (SA) to this, if I can. I've used it in the past
(and my ISP for my personal email is fiercely loyal to it), but only ever for
basic email analysis. What I need in this case is a scalable Bayesian
classifier. I see from the docs that SA would give me a usable client/server
model, which would take care of most of the scaling issues by making it easier
for us to move the classifier to a dedicated machine if needed, or at least to
a less-loaded one. What I *can't* puzzle out from the docs is how to set up
such a daemon to do *only* the Bayes part and not the rest of the typical spam
checking (for one thing, these won't be email messages and thus won't have any
SMTP headers at all). Also, we would need either to have one daemon
dynamically choose the database/word-list to use when judging a message, or to
run multiple instances that each look at a different db/word-list.
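
To make the "one daemon, many word lists" requirement concrete, here is a
minimal pure-Python sketch. It is not SA or bogofilter code and says nothing
about how either tool works internally. The WordList and Classifier names are
hypothetical, and the scoring is plain naive Bayes with add-one smoothing over
whitespace-separated tokens.

```python
import math
from collections import Counter, defaultdict


class WordList:
    """One trainable word list: token counts for matching and non-matching text."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text, matches):
        self.counts[matches].update(text.lower().split())
        self.docs[matches] += 1

    def score(self, text):
        """Return P(match | text) under naive Bayes with add-one smoothing."""
        tokens = text.lower().split()
        # +1 reserves probability mass for tokens never seen in training.
        vocab = len(set(self.counts[True]) | set(self.counts[False])) + 1
        total_docs = self.docs[True] + self.docs[False]
        log_p = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            for tok in tokens:
                lp += math.log((self.counts[label][tok] + 1) / (total + vocab))
            log_p[label] = lp
        # Normalise the two log scores into a probability without underflow.
        top = max(log_p.values())
        p_true = math.exp(log_p[True] - top)
        p_false = math.exp(log_p[False] - top)
        return p_true / (p_true + p_false)


class Classifier:
    """Chooses the word list per request, keyed by a category name."""

    def __init__(self):
        self.wordlists = defaultdict(WordList)

    def train(self, category, text, matches):
        self.wordlists[category].train(text, matches)

    def score(self, category, text):
        return self.wordlists[category].score(text)


clf = Classifier()
clf.train("spam", "buy cheap pills now", True)
clf.train("spam", "meeting agenda for monday", False)
clf.train("listing", "used car for sale", True)
clf.train("listing", "buy cheap pills now", False)
```

With this training data, clf.score("spam", "cheap pills") comes out above 0.5
and clf.score("listing", "cheap pills") below it, so the same text is judged
differently depending on which word list the caller names.
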

Is this doable with SA? I had hoped there would be a more general solution
built around bogofilter, either a client/server application pair or a more
API/library-based interface for calling it for training and evaluation. But
there isn't (not that I can find, anyway). And SA is a system with a long
history and a solid code base, so it seemed worthwhile to at least check
whether this is possible.

Thanks in advance for any help, advice, etc.

Randy
--
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Randy J. Ray Oodle, Inc. http://www.oodle.com
rj...@corp.oodle.com