On Wed, May 29, 2002 at 08:45:41PM +0200, Tony L. Svanstrom wrote:
| On Wed, 29 May 2002 the voices made Kingsley G. Morse Jr. write:
| > On Wed:11:43, Rob Winters wrote:
| > [...]
| > > SA does not give any credit to the cumulative effect
| > [...]
| >
| > It seems to me that properly weighted scores would
| > avoid this problem. I'd like to think that a good
| > optimization algorithm, such as a genetic algorithm,
| > could do the job.
| 
|  I don't agree with that; having rule A with rule B, C and D isn't
|  the same as having rule A and D... A and D could possibly be a
|  harmless result, while A and anything else could require a
|  different score for rule A.

Do A and D occur together?  How often?  In spam only?  In non-spam
only?  In both?  What about the other rules A is found together with?

You need a good corpus on which to base the GA.  That also means the
scores end up optimized for the type of mail that whoever builds the
corpus receives.

-D

-- 

"...In the UNIX world, people tend to interpret `non-technical user' as
meaning someone who's only ever written one device driver."
    --Daniel Pead
 
GnuPG key : http://dman.ddts.net/~dman/public_key.gpg
