The following module was proposed for inclusion in the Module List: modid: Mail::Classifier DSLIP: adpOp description: Framework for statistical/bayesian mail sort userid: DAGOLDEN (David Golden) chapterid: 19 (Mail_and_Usenet_News) communities: comp.lang.perl.modules; Mail::Box mailing list
similar: Mail::SpamTest::Bayesian AI:Categorizer rationale: The goal of Mail::Classifier is to facilitate rapid creation, testing, and tuning of mail classification algorithms, such as the newly popularized Naive Bayesian methods. Unlike other modules, this module is both more specific around processing mail and more broad by providing utility routines useful for developers of new/hybrid approaches. Primary functionality of the abstract base class is mail box and message handling (leveraging capabilities of Mail::Box), data persistence, optional on-disk or in-memory data table storage, and statistical validation of classifier performance. Compared to Mail::SpamTest::Bayesian, Mail::Classifier is more general -- it is not restricted to spam identification nor the Bayesian approach. As a framework, it will support derived classes that can handle any number of categories of mail and which implement any of a number of algorithms or hybrids. (Indeed, one could write Mail::Classifier::SpamTest::Bayesian to use Mail::SpamTest::Bayesian as the implementation behind the scenes.) Compared to AI::Categorizer, Mail::Classifier is more specific to the task of processing e-mail, whereas AI::Categorizer is a broader package more "academic" in nature and jargon and more useful, in my opinion, for research than for people wanting to hack around with mail sorting because of its higher learning curve. Mail::Classifier is designed to be much simpler to use and extend. Derived classes need only implement the specific methods that are core to an algorithm (parse(), learn(), score(), and a few simple helper methods to manage algorithm-specific data). Included examples demonstrate both a trivial (near-random) method and the "Paul Graham" Bayesian approach as a jumping off point for people to develop and test their own ideas and methods. enteredby: DAGOLDEN (David Golden) enteredon: Tue Jan 14 19:53:08 2003 GMT The resulting entry would be: Mail:: ::Classifier adpOp Framework for statistical/bayesian mail sort DAGOLDEN Thanks for registering, The Pause Team PS: The following links are only valid for module list maintainers: Registration form with editing capabilities: https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=6a100000_7439758662112850&SUBMIT_pause99_add_mod_preview=1 Immediate (one click) registration: https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=6a100000_7439758662112850&SUBMIT_pause99_add_mod_insertit=1