Hi all,

I'm trying to write a script to retrieve a list of URLs, search them for a 
list of words and print the results. I wrote it first as a standalone script 
and got it working, but when I converted it to a web page it no longer works 
properly.

The data is typed into two TEXTAREAs, one URL or string per line. The URL of 
the script when called looks like this:
http://fw/cgi-bin/searchweb.pl?PAGES=http%3A%2F%2Fwww.colossalrecords.com.au%2Fnewrelease-page.htm%0D%0Ahttp%3A%2F%2Fwww.bonzairecords.com%2Fcatalogue.htm&STRINGS=Rush%0D%0AHold+It&Search=Search

Since the data arrives as "string\r\nstring" (the %0D%0A in the URL), I 
figured I could use split /^/ to separate out the individual strings. There's 
probably an easier way to do it, though. The split seems to work correctly, 
and both @strings and %content appear to get filled OK.
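
One thing that may be worth checking with this approach is what chomp leaves 
behind. The sketch below is not taken from the script; it uses a hypothetical 
two-line input and a small helper (visible) invented here to make control 
characters printable. It assumes the TEXTAREA data is joined with CR LF, as 
the %0D%0A in the URL suggests, and compares split /^/ plus chomp against 
splitting on the line ending itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input, mimicking what a TEXTAREA submits: lines joined by CR LF.
my $textstr = "Rush\r\nHold It";

# Approach from the script: split at line starts, then chomp.
# chomp removes only the trailing "\n" (the default $/), so "\r" survives.
my @chomped;
foreach my $line (split /^/, $textstr) {
    chomp $line;
    push @chomped, $line;
}

# Alternative: split on the line ending itself, accepting CR LF or bare LF.
my @clean = split /\r?\n/, $textstr;

# Helper (made up for this sketch): show CR and LF as visible escapes.
sub visible {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

print "chomped: ", join(' | ', map { visible($_) } @chomped), "\n";
print "clean:   ", join(' | ', map { visible($_) } @clean), "\n";
```

With this input the first line prints "Rush\r | Hold It" and the second 
"Rush | Hold It", so under these assumptions every string except the last 
would carry an invisible trailing carriage return into the search.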

The problem is that it only returns some of the results that it should. If a 
page contains only one of the strings, it reports none. If the page contains 
several of the strings, it reports only one. I have a feeling that it's 
something really obvious, but I can't see what it is.

I've tried different parameters on the regex and adding a reset after each 
string to no effect.
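
For reference, here is a small sketch of how the /g modifier behaves when a 
match is used as a condition. The content and search strings are hypothetical 
and chosen only for illustration. In scalar context, a successful /g match 
records a position on the target string, and the next /g match against the 
same variable resumes from there, whatever the pattern is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page content and search strings, for illustration only.
my $content = "Hold It ... Rush";
my @strings = ('Hold It', 'Rush', 'Hold It');

# With /g in scalar context, a successful match records pos($content),
# and the next /g match on the same variable resumes from that position.
my @with_g;
foreach my $string (@strings) {
    push @with_g, ($content =~ /\Q$string\E/g) ? 1 : 0;
}

# Without /g every match starts at the beginning of the string.
my @without_g;
foreach my $string (@strings) {
    push @without_g, ($content =~ /\Q$string\E/) ? 1 : 0;
}

print "with /g:    @with_g\n";
print "without /g: @without_g\n";
```

This prints "1 1 0" for the /g loop and "1 1 1" for the plain loop: the 
second search for 'Hold It' starts after the 'Rush' match and so finds 
nothing. Note also that reset with no argument only re-arms one-shot ?...? 
matches; it does not touch the position recorded by /g.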

Can anyone see what could be causing the problem?

Thanks for your help,
Len

Here's the script. Thanks to Curtis for adding use strict and my vars.

#!/usr/bin/perl -w
# search for each of a number of strings in a number of web pages
use strict;
use CGI;
require LWP::UserAgent;
 
my $q = CGI->new;
 
my $textstr = $q->param('STRINGS');
my $pages = $q->param('PAGES');
my @strings;
my $i = 0;
my $ua = LWP::UserAgent->new;
my %content;
 
print $q->header(-expires=>'-1d');
print <<EOH;
<html>
<head><title>Search results</title></head>
<body bgcolor="#ffffff">
<h1>Search results</h1>
EOH
 
foreach my $line (split /^/, $textstr) {
    chomp $line;
    $strings[$i] = $line;
    $i++;
}
 
foreach my $line (split /^/, $pages) {
    chomp $line;
    my $request = HTTP::Request->new(GET => $line);
    print "Loading $line<br>";
    my $response = $ua->request($request);
    if ($response->is_success) {
        $content{$line} = $response->content;
    } else {
        print "<b>Error: $line " . $response->status_line . "</b><br>";
    }
}

print "<br>Searching<br>";
 
foreach my $page (keys %content) {
    print $page."<br>";
    foreach my $string (@strings) {
        # \Q quotes regex metacharacters such as () in the search string
        if ($content{$page} =~ /\Q$string/g) {
            print '<blockquote>'.$string.' found</blockquote>';
        }
        # worth a try
        reset;
    }
}
 
print <<EOF;
</body>
</html>
EOF
