Module submission Mail::Classifier

Perl Authors Upload Server Tue, 14 Jan 2003 11:54:27 -0800

The following module was proposed for inclusion in the Module List:

  modid:       Mail::Classifier
  DSLIP:       adpOp
  description: Framework for statistical/bayesian mail sort
  userid:      DAGOLDEN (David Golden)
  chapterid:   19 (Mail_and_Usenet_News)
  communities:
    comp.lang.perl.modules; Mail::Box mailing list


  similar:
    Mail::SpamTest::Bayesian AI:Categorizer

  rationale:

    The goal of Mail::Classifier is to facilitate rapid creation,
    testing, and tuning of mail classification algorithms, such as the
    newly popularized Naive Bayesian methods. Unlike other modules, this
    module is both more specific around processing mail and more broad
    by providing utility routines useful for developers of new/hybrid
    approaches. Primary functionality of the abstract base class is mail
    box and message handling (leveraging capabilities of Mail::Box),
    data persistence, optional on-disk or in-memory data table storage,
    and statistical validation of classifier performance.

    Compared to Mail::SpamTest::Bayesian, Mail::Classifier is more
    general -- it is not restricted to spam identification nor the
    Bayesian approach. As a framework, it will support derived classes
    that can handle any number of categories of mail and which implement
    any of a number of algorithms or hybrids. (Indeed, one could write
    Mail::Classifier::SpamTest::Bayesian to use Mail::SpamTest::Bayesian
    as the implementation behind the scenes.)

    Compared to AI::Categorizer, Mail::Classifier is more specific to
    the task of processing e-mail, whereas AI::Categorizer is a broader
    package more "academic" in nature and jargon and more useful, in my
    opinion, for research than for people wanting to hack around with
    mail sorting because of its higher learning curve. Mail::Classifier
    is designed to be much simpler to use and extend. Derived classes
    need only implement the specific methods that are core to an
    algorithm (parse(), learn(), score(), and a few simple helper
    methods to manage algorithm-specific data). Included examples
    demonstrate both a trivial (near-random) method and the "Paul
    Graham" Bayesian approach as a jumping off point for people to
    develop and test their own ideas and methods.

  enteredby:   DAGOLDEN (David Golden)
  enteredon:   Tue Jan 14 19:53:08 2003 GMT

The resulting entry would be:

Mail::
::Classifier      adpOp Framework for statistical/bayesian mail sort DAGOLDEN


Thanks for registering,
The Pause Team

PS: The following links are only valid for module list maintainers:

Registration form with editing capabilities:
  
https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=6a100000_7439758662112850&SUBMIT_pause99_add_mod_preview=1
Immediate (one click) registration:
  
https://pause.perl.org/pause/authenquery?ACTION=add_mod&USERID=6a100000_7439758662112850&SUBMIT_pause99_add_mod_insertit=1

Module submission Mail::Classifier

Reply via email to