On Wed, May 29, 2002 at 08:45:41PM +0200, Tony L. Svanstrom wrote:
| On Wed, 29 May 2002 the voices made Kingsley G. Morse Jr. write:
| > On Wed:11:43, Rob Winters wrote:
| > [...]
| > > SA does not give any credit to the cumulative effect
| > [...]
| >
| > It seems to me that properly weighted scores would
| > avoid this problem. I'd like to think that a good
| > optimization algorithm, such as a genetic algorithm,
| > could do the job.
|
| I don't agree with that; having rule A with rule B, C and D isn't
| the same as having rule A and D... A and D could possibly be a
| harmless result, while A and anything else could require a
| different score for rule A.
Do A and D occur together?  How often?  In spam only?  In non-spam
only?  In both?  What about the other rules A is found together with?

You need to have a good corpus on which to base the GA.  That also
implies that the scores are optimized for the type of mail that
whoever builds the corpus receives.

-D

-- 
"...In the UNIX world, people tend to interpret `non-technical user' as
meaning someone who's only ever written one device driver."
    --Daniel Pead

GnuPG key : http://dman.ddts.net/~dman/public_key.gpg
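To make the questions above concrete, here is a rough sketch of how one might count rule co-occurrence separately in a spam corpus and a non-spam corpus. Everything in it is an assumption for illustration: the rule names A through D, the toy message lists, and the helper names `pair_counts` and `cooccurrence` are hypothetical and are not part of SpamAssassin.

```python
from collections import Counter
from itertools import combinations


def pair_counts(messages):
    """Count how many messages each unordered pair of rules hits together."""
    counts = Counter()
    for rules in messages:
        # Sort so that ("A", "D") and ("D", "A") land on the same key.
        for pair in combinations(sorted(set(rules)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus: each message is the set of rule names it triggered.
spam = [{"A", "B", "C", "D"}, {"A", "B"}, {"A", "C", "D"}]
ham = [{"A", "D"}, {"A", "D"}, {"D"}]

spam_pairs = pair_counts(spam)
ham_pairs = pair_counts(ham)


def cooccurrence(rule_x, rule_y):
    """Return (spam_hits, ham_hits) for messages that triggered both rules."""
    pair = tuple(sorted((rule_x, rule_y)))
    return spam_pairs[pair], ham_pairs[pair]


print(cooccurrence("A", "D"))
print(cooccurrence("A", "B"))
```

In this made-up data, A together with D shows up in both spam and non-spam, while A together with B shows up only in spam. The numbers mean nothing beyond the toy corpus, which is the point: the answer depends entirely on the mail you collected.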