David Landgren writes:
> Bowie Bailey wrote:
> > [EMAIL PROTECTED] wrote:
> >> While I doubt it'd have quite the performance gains that A-C can
> >> offer, Regexp::Assemble certainly sounds like something worth
> >> trying... 
> >> the coderef trick, in particular, is very nifty.
> 
> I forgot to mention in the other thread I just replied to: if you've 
> downloaded the package, look at eg/ircwatcher for a slightly mindless 
> demo of tracked mode. If you have a copy of O'Reilly's _Perl Hacks_, 
> there's a much more fleshed-out demo in there.
> 
> > It can work well.  After reading about it here, I tried it on one of
> > my programs that compares about 1600 words and phrases against a
> > document.  My scan time dropped by 30%.  This doesn't count the time
> > taken to assemble the regex (about .27 seconds), but since this
> > program runs as a daemon and only has to do the assembly once, it
> > wasn't relevant to me.
> 
> Here's some background that people may find interesting.
> 
> I have a Postfix access map that is currently an assembly of 4145 
> patterns corresponding to residential broadband DNS names.
> 
> It contains patterns like
> 
>       \d+-\d+-\d+-\d+\.netabc\.com\.br
>       a\d+[abc]\d+\.neo\.lrun\.com
>       \d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk
> 
> to match DNS names like
> 
>       host217-34-41-132.in-addr.btopenworld.com
>       dsl-200-67-157-162.prodigy.net.mx
>       host80-39.pool212171.interbusiness.it
>       bgp01069788bgs.vnburn01.mi.comcast.net
>       cpe-68-112-253-235.ma.charter.com
>       adsl-68-73-64-222.dsl.klmzmi.ameritech.net
> 
> At first this was to catch spam; now I'm happy to find an unexpected 
> side effect: it discards connections during virus storms. I never even 
> accept the DATA, let alone overload my AV scanner.
> 
> Anyway, when I started out, I noticed the performance of the Postfix 
> server dropping through the floor, so I wound up writing 
> Regexp::Assemble. Now, instead of trying each pattern in a list in 
> turn, the server matches against a single assembled pattern. (I had to 
> recompile pcre and raise the LINK_SIZE #define so that pcre could 
> compile a pattern that large.)
> 
> A test on a small sample of 1000 host names speaks eloquently:
> 
> % perl5.9.4 racmp host.1k
> assembled 4145 patterns in 3.83324813842773 seconds
> R::A: good = 971, bad = 29 in 0.0148990154266357 seconds
> list: good = 971, bad = 29 in 5.72843599319458 seconds
>   A_C: good = 971, bad = 29 in 8.56000709533691 seconds
> RA  len: 87491
> A_C len: 174644
> 
> That is, the assembled approach runs in roughly 1/385th of the time of 
> looping through the list of REs. A_C is even slower, but given that its 
> pattern is nearly twice as long and chock full of metacharacters, I 
> suppose this shouldn't come as a surprise, though it does seem odd. I'll 
> check back with Yves and see if my methodology is sane there.
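
Why does one assembled pattern beat a loop over thousands of patterns? The minimal Python sketch below illustrates the general idea: it factors shared prefixes into a trie and renders that trie as a single regex. It is an illustration only, not Regexp::Assemble's actual algorithm or API. It handles literal strings only, whereas R::A works on real patterns containing metacharacters such as \d+. The names build_trie, trie_to_regex, assemble and match_by_loop are hypothetical.

```python
import re

def build_trie(words):
    """Insert each literal string into a nested-dict trie.
    The empty-string key "" marks that a word may end at that node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker
    return root

def trie_to_regex(node):
    """Render a trie as one regex string, testing each shared prefix once.
    Sketch assumption: inputs are literal strings, so each char is escaped."""
    branches = [re.escape(ch) + trie_to_regex(node[ch])
                for ch in sorted(node) if ch != ""]
    can_end_here = "" in node
    if not branches:
        return ""  # leaf: nothing more to match
    if len(branches) == 1 and not can_end_here:
        return branches[0]  # single path, no grouping needed
    group = "(?:" + "|".join(branches) + ")"
    return group + "?" if can_end_here else group

def assemble(words):
    """Compile one pattern that matches exactly the strings in `words`."""
    return re.compile(trie_to_regex(build_trie(words)))

def match_by_loop(words, text):
    """The slow approach: try each pattern in turn until one matches."""
    return any(re.fullmatch(re.escape(w), text) for w in words)

if __name__ == "__main__":
    words = ["cat", "car", "cart", "dog"]
    print(trie_to_regex(build_trie(words)))  # prints (?:ca(?:r(?:t)?|t)|dog)
```

For a list this small, the assembled pattern is actually longer than a plain cat|car|cart|dog alternation. The payoff comes from testing the shared "ca" once instead of once per alternative, which matters when thousands of patterns share prefixes. Python's re engine also differs from PCRE and Perl's, so the sketch says nothing about the timings above.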

"perl5.9.4" -- that's using bleadperl, right?

Can you post the output of "use re qw(debug); /....regexp..../"?  I'd
like to see what structure the R::A regexp compiles to; a
several-hundred-fold speedup is stunning.

--j.
