On 8 Dec 2003, Scott A Crosby wrote:

> On Mon, 08 Dec 2003 16:43:15 -0500, Matt Kettler <[EMAIL PROTECTED]> writes:
>
> > Or *, to catch more than one obfuscating character..
> >
> > ie: V...i..a.gr..a
> >
> > As I suggested in my email, there's lots of combinations that spammers
> > can do to avoid the original rule. There's also lots of ways to
> > construct the rule to get a broader hit-base, at the expense of
> > greater processing time.
>
> In theory, this isn't that much additional matching time, especially
> with an automata. In practice though, these sorts of rules will kill
> performance because Perl cannot apply the literal optimization,
> especially if they're applied widely. (There's more than just Vxxxxx
> -- most of the phrase rules need this sort of treatment.)
>
> Scott

Scott,
If the wildcard is bounded (".{0,6}") rather than unbounded (".*"),
is the performance hit smaller (i.e., is it a reasonable thing to do)?

Are there any reasonably simple ways to do this without killing performance?

(E.g. .? == OK, .* == BAD, .{0,n} == acceptable for small values of 'n'?)
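
To make the bounded and unbounded cases concrete, here is a minimal sketch. It is in Python rather than Perl: the quantifier syntax is the same in both, but nothing here measures or says anything about Perl's engine. The helper name gapped_pattern and the gap size of 3 are illustrative assumptions, not anything taken from the SpamAssassin rules.

```python
import re

def gapped_pattern(word, max_gap=3):
    """Hypothetical helper: build a regex matching `word` with at most
    `max_gap` arbitrary characters between consecutive letters,
    i.e. the bounded ".{0,n}" form."""
    gap = ".{0,%d}" % max_gap
    return re.compile(gap.join(re.escape(ch) for ch in word), re.IGNORECASE)

# Bounded form: v.{0,3}i.{0,3}a.{0,3}g.{0,3}r.{0,3}a
bounded = gapped_pattern("viagra", max_gap=3)

# Unbounded form: v.*i.*a.*g.*r.*a
unbounded = re.compile(".*".join("viagra"), re.IGNORECASE)

obfuscated = "V...i..a.gr..a"                      # example from the thread
far_apart = "Very nice: I am glad you ran again"   # letters spread across a sentence

print(bool(bounded.search(obfuscated)))    # bounded form still catches the obfuscation
print(bool(bounded.search(far_apart)))     # but not letters that are far apart
print(bool(unbounded.search(far_apart)))   # the unbounded form does match here
```

Apart from any speed difference, the bounded form limits how far each gap can stretch, so it is also less likely to hit on ordinary text where the letters merely happen to occur in order.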

Are there any studies of the Perl matching engine for efficiency
and rules-of-thumb?

Dave

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


