On Wed, 10 Dec 2003, Gary Funck wrote:
> Soundex might be a practical solution. Perhaps a manageable approach
> is to first apply a spelling check using both a regular dictionary
> and augmenting it with a set of spammer mis-spellings. Then, send the
> output of that step into Soundex. The Soundex
Soundex might be a practical solution. Perhaps a manageable approach
is to first apply a spelling check using both a regular dictionary
and augmenting it with a set of spammer mis-spellings. Then, send the
output of that step into Soundex. The Soundex is a heuristic for catching
the creative alter
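
The two-stage idea above (spelling normalization first, then Soundex) can be sketched roughly as follows. This is only an illustration, not anything from SpamAssassin: the `SPAM_SPELLINGS` table, `normalize`, and `phonetic_key` are hypothetical names, the regular-dictionary spell check is left out, and `soundex` is the standard American Soundex rule set.

```python
# Hypothetical table of spammer misspellings (an assumption for illustration).
SPAM_SPELLINGS = {"\\/iagra": "viagra"}

# Standard American Soundex consonant groups.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def normalize(word):
    """Spelling step: map a known misspelling to its canonical form."""
    lowered = word.lower()
    return SPAM_SPELLINGS.get(lowered, lowered)


def phonetic_key(word):
    """Soundex applied to the output of the spelling step."""
    return soundex(normalize(word))
```

With this sketch, `soundex("vyagra")` and `soundex("viagra")` agree without any dictionary help, while `"\/iagra"` only lands on the same key through `phonetic_key`, because Soundex alone drops the non-letters and keys the word on its first remaining letter.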
On Wed, 10 Dec 2003, Gary Funck wrote:
> > It might be convenient to view each of these transformations as
> > operating on the output of the previous. I think you were.
> > By doing so, it avoids replicating the description of the
> > previous phase.
>
> I meant to add the following suggested additio
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]] On Behalf Of Gary
> Funck
> Sent: Wednesday, December 10, 2003 1:09 PM
> To: [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject
> rendering streams
>
> -----Original Message-----
> From: SpamTalk
> Sent: Wednesday, December 10, 2003 12:49 PM
>
> It would seem to me that, for purposes of rule simplification, the
> subject and body of messages to be scanned should be available in
> pre-processed flavors, some of which are currently availabl
At 03:48 PM 12/10/2003, SpamTalk wrote:
> FOLDED set all lowercase
> Remove HTML
> punctuation to be underscore,
Why on earth do you want to "set all lowercase"? Every regex in the ruleset
can be set to case-sensitive or case-insensitive on its own, so this adjustment
only m
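
To make the two positions concrete, here is a small sketch contrasting a per-rule case-insensitive match on the raw text with a match against a pre-folded copy. The `fold` helper is a hypothetical reading of the FOLDED rendering quoted above (lowercase, HTML removed, punctuation turned into underscores); it is not SpamAssassin code, and the tag stripping is deliberately crude.

```python
import re


def fold(text):
    """Hypothetical FOLDED rendering: strip tags, lowercase, punctuation -> '_'."""
    no_html = re.sub(r"<[^>]+>", "", text)  # crude tag removal, not a real HTML parser
    return re.sub(r"[^\w\s]", "_", no_html.lower())


def matches_raw(pattern, text):
    """Per-rule approach: the rule itself asks for case-insensitive matching."""
    return re.search(pattern, text, re.IGNORECASE) is not None


def matches_folded(pattern, text):
    """Pre-processed approach: a plain lowercase rule run against folded text."""
    return re.search(pattern, fold(text)) is not None
```

For a subject such as `<b>FREE!!!</b> Offer`, both `matches_raw(r"free", ...)` and `matches_folded(r"free", ...)` succeed, so lowercasing by itself adds nothing a regex flag cannot do. The folded copy only differs where the other steps matter, for example a rule like `free_+ offer` that relies on the punctuation having been normalized.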