Re: Regexp compilation in mod_perl?

Len Walter Fri, 01 Jun 2001 17:58:45 -0700
Curtis,

thanks for your response. I didn't realise that the recompilation
problem didn't apply unless I used /o.

I am actually trying to split some multiline data into single lines, so
the caret is intentional. The data is typed into a TEXTAREA, one url or
string to a line. The url of the script when called looks like this:
http://fw/cgi-bin/searchweb.pl?PAGES=http%3A%2F%2Fwww.colossalrecords.com.au%2Fnewrelease-page.htm%0D%0Ahttp%3A%2F%2Fwww.bonzairecords.com%2Fcatalogue.htm&STRINGS=Rush%0D%0AHold+It&Search=Search
(it's for a DJ friend of mine).

Since the data is "string\r\nstring" (%0D%0A is a carriage return
followed by a line feed), I figured I could use split /^/ to separate out
the individual strings. There's probably an easier way to do it, though...
The split does work correctly, and both @strings and %content get filled
apparently OK.
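The split does separate the lines. However, each piece keeps its own line terminator, and chomp removes only the "\n" (the default $/) and not the "\r" before it. The minimal sketch below shows what the loop actually receives. It assumes the STRINGS value from the URL above, decoded by hand to "Rush\r\nHold It":

```perl
use strict;
use warnings;

# STRINGS=Rush%0D%0AHold+It decodes to "Rush", CR, LF, "Hold It"
my $textstr = "Rush\r\nHold It";

# split /^/ breaks the string at the start of each line,
# so every piece except the last still ends in "\r\n"
my @pieces = split /^/, $textstr;

# chomp strips only the trailing "\n" (the default $/), leaving the "\r"
my @chomped = @pieces;
chomp @chomped;

for my $s (@chomped) {
    # print carriage returns as a visible \r
    (my $shown = $s) =~ s/\r/\\r/g;
    print "[$shown]\n";
}
# prints:
# [Rush\r]
# [Hold It]
```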

The strange thing is that it returns some results but not others that it
should have. For example, if I run it with PAGES=http://216.167.127.43/sall.htm
(appropriately escaped) and STRINGS=Spirit%0D%0ANitro, it returns a hit on
Nitro (as it should) but not on Spirit, which is also on the page (warning:
that url is a 6MB HTML file). In fact there are about 20 strings in STRINGS,
and although half a dozen of them are on that page, only the last string in
the list returns a match. You can see why I thought it might have been a
problem with the regex being cached in some way.
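A trailing "\r" would explain this symptom. Every string except the last carries an invisible carriage return into /\Q$strings[$i]/. \Q escapes that character but does not remove it, so the pattern requires a CR directly after the word. The last line has no terminator, so it is the only string that can match. The sketch below reproduces the failure and tries one possible fix: splitting on the line break itself with /\r?\n/. That fix is a suggestion, not part of either script in this thread. The $page value is a made-up stand-in for the downloaded HTML:

```perl
use strict;
use warnings;

# Hypothetical page content standing in for the fetched HTML
my $page    = "<td>Spirit</td><td>Nitro</td>";
my $textstr = "Spirit\r\nNitro";

# Original approach: split /^/ plus chomp leaves "Spirit\r"
my @broken = split /^/, $textstr;
chomp @broken;
my @broken_hits = grep { $page =~ /\Q$_/ } @broken;

# Suggested approach: split on CRLF or a bare LF, so no "\r" survives,
# and drop empty lines (e.g. a trailing newline in the TEXTAREA)
my @fixed      = grep { length } split /\r?\n/, $textstr;
my @fixed_hits = grep { $page =~ /\Q$_/ } @fixed;

print "broken: @broken_hits\n";   # broken: Nitro
print "fixed:  @fixed_hits\n";    # fixed:  Spirit Nitro
```

The PAGES parameter comes from the same kind of TEXTAREA, so the same split would presumably be needed there too. Otherwise every URL but the last ends in "\r".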

I've replaced the old script with the new one (less the escapes for the
carets) but it's still behaving the same way.

I also tried changing /$pattern/ to /$pattern/g, but it had no effect...
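Adding /g is not expected to help, because the problem lies in the pattern's contents, not in how often it is applied. For a yes/no test it can even hurt. In scalar context, m//g records where it stopped in that particular string (its pos). A later search of the same $content{$page} for a different string therefore starts after the previous hit. A small sketch with made-up data:

```perl
use strict;
use warnings;

my $page = "Nitro ... Spirit";    # hypothetical page text
my (@without_g, @with_g);

for my $s ("Spirit", "Nitro") {
    # plain match: always searches the whole string
    push @without_g, $s if $page =~ /\Q$s/;
}

for my $s ("Spirit", "Nitro") {
    # scalar-context /g: after "Spirit" matches at the end,
    # pos($page) stays there, so the search for "Nitro" starts past it
    push @with_g, $s if $page =~ /\Q$s/g;
}

print "without /g: @without_g\n";   # without /g: Spirit Nitro
print "with /g:    @with_g\n";      # with /g:    Spirit
```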

BTW, you mentioned that if I don't use strict, the %content will get
leaked. Does that mean that use strict makes "my" vars pass out of scope
after execution to be garbage collected?
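Roughly, no: use strict frees nothing by itself. It makes Perl reject undeclared variables, which pushes you toward declaring them with my. Without that declaration, %content is a package (global) variable. Under Apache::Registry the compiled script stays loaded in a long-lived Apache child, so a global keeps its contents from one request to the next. A my variable declared in the code that runs for each request is created fresh every time. It is released when it goes out of scope, since Perl uses reference counting. A minimal sketch, where handle_request is a hypothetical sub standing in for one request served by the same process:

```perl
use strict;
use warnings;

our %global_content;    # package variable: survives between calls

# Hypothetical stand-in for one request handled by a persistent process
sub handle_request {
    my ($url) = @_;
    my %content;                  # lexical: a new, empty hash on every call
    $content{$url}        = 1;
    $global_content{$url} = 1;
    return (scalar(keys %content), scalar(keys %global_content));
}

my @first  = handle_request("http://example.com/a");
my @second = handle_request("http://example.com/b");

print "first:  @first\n";    # first:  1 1
print "second: @second\n";   # second: 1 2
```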

Thanks for your help,
Len

Curtis Poe wrote:
> 
> In the "mod_perl_traps" page, when it refers to regexes only being compiled once, it is
> specifically referring to regexes that use the /o modifier:
> 
>     $x =~ /$somevar/o;
> 
> In regular Perl (and mod_perl), that causes the pattern to only be compiled once.  If the value of
> $somevar changes, the regular expression will still try to match against the old pattern.  This is
> a common problem.  In the mod_perl environment, the problem is that when using the /o modifier,
> the regex is still only being compiled once and subsequent requests to your script will still use
> the first regex pattern encountered, regardless of what you specify.  The mod_perl_traps page
> offers strategies to avoid this.  Since you are not using the /o modifier, this shouldn't apply to
> you.
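The /o effect can be reproduced outside mod_perl: the first value interpolated into the pattern stays fixed for the life of the process. The sketch below uses qr// to build a fresh compiled pattern on each pass, or /o could simply be dropped. qr// is offered here as one option; the workarounds listed on the mod_perl_traps page may differ. The strings "foo" and "bar" are made up:

```perl
use strict;
use warnings;

my $text = "bar";
my (@with_o, @with_qr);

for my $pat ("foo", "bar") {
    # /o: the pattern is built from $pat once ("foo") and never rebuilt
    push @with_o, ($text =~ /$pat/o) ? 1 : 0;
}

for my $pat ("foo", "bar") {
    # qr//: a compiled pattern for the current value of $pat
    my $re = qr/\Q$pat\E/;
    push @with_qr, ($text =~ $re) ? 1 : 0;
}

print "/o:   @with_o\n";    # /o:   0 0
print "qr//: @with_qr\n";   # qr//: 0 1
```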
> 
> I ran your script from the command line and it works fine.  However, I did notice that you weren't
> using strict and this may be a source of some problems.  I am guessing that since you didn't use
> it, your %content hash has old data hanging around in subsequent invocations of the script.
> However, while this would be a memory leak, it shouldn't cause a problem.
> 
> My suspicion is that your "splits" may be an issue:
> 
>     foreach my $line (split /^/, $textstr) {
> 
> Since the caret "^" in the first position of a regex is an anchor to the beginning of the string,
> you are attempting to split on the beginning of the string.  If you must use the caret as a
> delimiter, try escaping it in the regex:
> 
>     foreach my $line (split /\^/, $textstr) {
> 
> Here's an example of the problem (sorry, I'm on a Win32 system so my command line perl looks
> funky):
> 
> C:\>perl -e "$x=q/a^b^c/;@x=split/^/,$x;print $x[0];"
> a^b^c
> 
> Notice that it wasn't split.  By escaping the caret, the split works fine:
> 
> C:\>perl -e "$x=q/a^b^c/;@x=split/\^/,$x;print $x[0];"
> a
> 
> If you have a caret in your params, this will cause your script to fail.
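A caveat about this example: "a^b^c" contains no newlines, which is why split /^/ leaves it whole. perlfunc documents that a split pattern of /^/ is treated as /^/m. On multi-line input like the TEXTAREA data, it therefore splits at the start of every line. By contrast, /\^/ splits only on a literal caret and returns CRLF-separated input as a single element. A sketch comparing the two, using made-up sample strings:

```perl
use strict;
use warnings;

my $carets = "a^b^c";
my $lines  = "a\r\nb\r\nc";

my @caret_on_carets = split /\^/, $carets;   # literal caret delimiter
my @anchor_on_lines = split /^/,  $lines;    # start of each line (/^/m)
my @caret_on_lines  = split /\^/, $lines;    # no carets, so one element

print scalar(@caret_on_carets), "\n";   # 3
print scalar(@anchor_on_lines), "\n";   # 3
print scalar(@caret_on_lines),  "\n";   # 1
```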
> 
> Hope this helps!
> 
> Cheers,
> Curtis Poe
> 
> PS:  Here's a corrected version of your script with "strict" added.
> 
> #!/usr/bin/perl -w
> # search for each of a number of strings in a number of web pages
> use strict;
> use CGI;
> require LWP::UserAgent;
> 
> my $q = new CGI;
> 
> my $textstr = $q->param('STRINGS');
> my $pages = $q->param('PAGES');
> my @strings;
> my $i = 0;
> my $ua = new LWP::UserAgent;
> my %content;
> 
> print $q->header(-expires=>'-1d');
> print <<EOH;
> <html>
> <title>Search results</title>
> <body bgcolor=ffffff>
> <h1>Search results</h1>
> EOH
> 
> foreach my $line (split /\^/, $textstr) {
>     chomp $line;
>     $strings[$i] = $line;
>     $i++;
> }
> 
> foreach my $line (split /\^/, $pages) {
>     chomp $line;
>     my $request = new HTTP::Request(GET => $line);
>     print "Loading $line<br>";
>     my $response = $ua->request($request);
>     if ($response->is_success) {
>                 $content{$line} = $response->content;
>     } else {
>                 print "<b>Error: $line".$response->status_line."</b><br>";
>     }
> }
> 
> print "<br>Searching<br>";
> 
> foreach my $page (keys %content) {
>     print $page."<br>";
>     for ($i=0; $i <= $#strings; $i++) {
>                 # \Q deals with () in pattern
>                 if ($content{$page} =~ /\Q$strings[$i]/) {
>                     print '<blockquote>'.$strings[$i].' found</blockquote>';
>                 }
>     }
> }
> 
> print <<EOF;
> </body>
> </html>
> EOF
> 
> =====
> Senior Programmer
> Onsite! Technology (http://www.onsitetech.com/)
> "Ovid" on http://www.perlmonks.org/