Having gone over the FAQ and other doc-sections on the wiki, I haven't been able to answer my questions. So here's hoping the user-community can help!

My company currently uses a home-brew solution for applying naive Bayes filtering to data. Right now that amounts to spam filtering on email messages that pass through our system. However, we also need to filter other content, and that filtering isn't the same as spam-testing. In a nutshell, we use the "bogofilter" application to classify messages, invoking it with a different word-list file for each filtering requirement. This isn't going to scale well for us as written, and I'm the lucky soul tasked with coming up with a better way.
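
To make the current setup concrete, it amounts to roughly the following Python sketch. This is illustrative only, not our real code: the function names, the labels, and the example directories are invented. It relies only on bogofilter's -d option (the directory holding the word-list) and its documented exit codes (0 = spam, 1 = non-spam, 2 = unsure).

```python
import subprocess

# bogofilter's documented exit codes: 0 = spam, 1 = non-spam, 2 = unsure.
# The label names on the right are hypothetical, chosen for this sketch.
EXIT_LABELS = {0: "match", 1: "no-match", 2: "unsure"}


def build_command(wordlist_dir):
    """Build the bogofilter command line for one word-list directory."""
    # -d selects the directory that holds the word-list database.
    return ["bogofilter", "-d", wordlist_dir]


def classify(text, wordlist_dir):
    """Feed text to bogofilter on stdin and map its exit code to a label."""
    result = subprocess.run(
        build_command(wordlist_dir),
        input=text.encode("utf-8"),
        stdout=subprocess.DEVNULL,
    )
    # Any other exit code (e.g. 3, an I/O error) is reported as "error".
    return EXIT_LABELS.get(result.returncode, "error")
```

Each filtering requirement gets its own directory, so a call looks like classify(message, "/path/to/filters/jobs") (a made-up path). Every call forks a new process, which is a large part of why this won't scale.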

I'd like to adapt SA to this, if I can. I've used it in the past (and my ISP for my personal email is fiercely loyal to it), but only ever for basic email analysis. What I need, in this case, is a scalable Bayesian classifier. I see from the docs that using SA will get me a usable client/server model, which would take care of most of the scaling issues by making it easier for us to move the classifier to a dedicated machine (or at least a less-loaded one) if needed. What I *can't* puzzle out from the docs is how to set up such a daemon to do *only* the Bayes part and skip the rest of the typical spam checking (for one thing, these won't be email messages and thus won't have any SMTP headers at all). Also, we would need either to have the one daemon dynamically choose the database/word-list to use when judging a message, or to run multiple instances that each look at a different db/word-list.
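
To show the kind of interface I'm after, here is a minimal Python sketch of a classifier that picks its word-list by name on each request. It is not SA's or bogofilter's API, and it uses a plain Laplace-smoothed naive Bayes rather than whatever either tool does internally. All class, method, and word-list names are hypothetical.

```python
import math
import re
from collections import Counter


class WordList:
    """One trained model: token counts for the 'match' and 'other' classes."""

    def __init__(self):
        self.counts = {"match": Counter(), "other": Counter()}
        self.docs = {"match": 0, "other": 0}

    @staticmethod
    def tokens(text):
        # Plain body text only; no email headers are assumed.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.counts[label].update(self.tokens(text))
        self.docs[label] += 1

    def score(self, text):
        """Return P(match | text) under Laplace-smoothed naive Bayes."""
        vocab = set(self.counts["match"]) | set(self.counts["other"])
        total_docs = sum(self.docs.values())
        log_prob = {}
        for label in ("match", "other"):
            # Smoothed class prior.
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.counts[label].values())
            for tok in self.tokens(text):
                # +1 smoothing, with one extra slot for unseen tokens.
                lp += math.log((self.counts[label][tok] + 1) / (n + len(vocab) + 1))
            log_prob[label] = lp
        # Clamp the log-odds so math.exp cannot overflow on long inputs.
        diff = max(min(log_prob["other"] - log_prob["match"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


class Classifier:
    """Holds several word-lists and selects one by name per request."""

    def __init__(self):
        self.wordlists = {}

    def train(self, name, text, label):
        self.wordlists.setdefault(name, WordList()).train(text, label)

    def score(self, name, text):
        return self.wordlists[name].score(text)
```

A request would then carry the word-list name along with the text, e.g. score("jobs", text) versus score("housing", text). Running one instance per word-list would be the other option, with the name fixed at start-up.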

Is this doable with SA? I had hoped to find a more general solution built around bogofilter, either a client/server application pair or a more API/library-based interface for calling it for training and evaluation. But there isn't one (not that I can find, anyway). SA is a system with a long history and a solid code-base, so it seemed worthwhile to at least check whether this is possible.

Thanks in advance for any help, advice, etc.

Randy
--
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Randy J. Ray          Oodle, Inc.                          http://www.oodle.com
rj...@corp.oodle.com
