David Landgren writes:
> Bowie Bailey wrote:
> > [EMAIL PROTECTED] wrote:
> >> While I doubt it'd have quite the performance gains that A-C can
> >> offer, Regexp::Assemble certainly sounds like something worth
> >> trying...
> >> the coderef trick, in particular, is very nifty.
>
> Forgot to mention in the other thread I just replied to, if you've
> downloaded the package, look at eg/ircwatcher for a slightly mindless
> demo of the tracked mode. If you have a copy of O'Reilly's _Perl Hacks_,
> a much more fleshed-out demo appears in there.
>
> > It can work well. After reading about it here, I tried it on one of
> > my programs that compares about 1600 words and phrases against a
> > document. My scan time dropped by 30%. This doesn't count the time
> > taken to assemble the regex (about .27 seconds), but since this
> > program runs as a daemon and only has to do the assembly once, it
> > wasn't relevant to me.
>
> Here's some background that people may find interesting.
>
> I have a Postfix access map that is an assembly of currently 4145
> patterns, that correspond to residential broadband DNS names.
>
> Patterns like
>
>     \d+-\d+-\d+-\d+\.netabc\.com\.br
>     a\d+[abc]\d+\.neo\.lrun\.com
>     \d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk
>
> to match DNS names like
>
>     host217-34-41-132.in-addr.btopenworld.com
>     dsl-200-67-157-162.prodigy.net.mx
>     host80-39.pool212171.interbusiness.it
>     bgp01069788bgs.vnburn01.mi.comcast.net
>     cpe-68-112-253-235.ma.charter.com
>     adsl-68-73-64-222.dsl.klmzmi.ameritech.net
>
> At first this was to catch spam, now I'm happy that an unexpected
> side-effect is that it discards connections during virus storms. I never
> even accept the DATA, much less overload my AV scanner.
>
> Anyway, when I started out, I noticed the performance of the Postfix
> server dropping through the floor. So I wound up writing
> Regexp::Assemble. Now, instead of going through a list of patterns, it
> goes through one. (I had to recompile pcre and up the LINK_TYPE #define
> so that pcre could compile the pattern).
>
> Running a test on a small (1000) sample of host names speaks eloquently:
>
>     % perl5.9.4 racmp host.1k
>     assembled 4145 patterns in 3.83324813842773 seconds
>     R::A: good = 971, bad = 29 in 0.0148990154266357 seconds
>     list: good = 971, bad = 29 in 5.72843599319458 seconds
>     A_C: good = 971, bad = 29 in 8.56000709533691 seconds
>     RA len: 87491
>     A_C len: 174644
>
> That is, the assembled approach runs in under 1/500th of the time of
> looping through the list of REs. A_C is even worse, but given that the
> pattern is over twice as long, and chock full of metacharacters I
> suppose this shouldn't come as a surprise but it does seem odd. I'll
> check back with Yves and see if my methodology is sane there.

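For anyone following along without the module installed, the difference
between "going through a list of patterns" and "going through one" can
be sketched roughly as below. This is Python, not Perl, and it is only a
naive alternation: Regexp::Assemble goes further and merges the patterns
into one shared structure, which this sketch does not attempt, so don't
expect it to reproduce the timings quoted above. The three patterns come
from the message; the host names in HOSTS and the function names are
made up for illustration.

```python
import re

# The three sample patterns quoted in the message.
PATTERNS = [
    r"\d+-\d+-\d+-\d+\.netabc\.com\.br",
    r"a\d+[abc]\d+\.neo\.lrun\.com",
    r"\d+\.\d+\.\d+\.\d+\.adsl\.abc\.tiscali\.dk",
]

# Hypothetical host names, invented so that some match and one does not.
HOSTS = [
    "200-1-2-3.netabc.com.br",
    "a12b34.neo.lrun.com",
    "1.2.3.4.adsl.abc.tiscali.dk",
    "mail.example.org",
]


def match_by_list(host, patterns=PATTERNS):
    """Try each pattern in turn; work grows with the number of patterns."""
    return any(re.search(p, host) for p in patterns)


# One combined pattern: each original is wrapped in a non-capturing group
# and joined with '|'. This is plain alternation, not R::A's assembly.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))


def match_combined(host):
    """Run a single compiled pattern against the host name."""
    return COMBINED.search(host) is not None


if __name__ == "__main__":
    for host in HOSTS:
        print(host, match_by_list(host), match_combined(host))
```

Both functions should agree on every host; only the number of regex
invocations per host differs.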
"perl5.9.4" -- that's using bleadperl, right? Can you post the output of "use re qw(debug); /....regexp..../"? I'd like to see what the structure looks like from the R::A regexp -- 500 times is a stunning speedup. --j.