Hi all,
I'm trying to write a script to retrieve a list of URLs, search them for a
list of words and print the results. I wrote it first as a standalone script
and got it working, but when I converted it to a web page it no longer works
properly.
The data is typed into two TEXTAREAs, one URL or string per line. The URL of
the script when called looks like this:
http://fw/cgi-bin/searchweb.pl?PAGES=http%3A%2F%2Fwww.colossalrecords.com.au%2Fnewrelease-page.htm%0D%0Ahttp%3A%2F%2Fwww.bonzairecords.com%2Fcatalogue.htm&STRINGS=Rush%0D%0AHold+It&Search=Search
Since the data is "string\r\nstring" (the %0D%0A in the URL above), I figured
I could use split /^/ to separate out the individual strings. There's probably
an easier way to do it, though. The split seems to work correctly, and both
@strings and %content appear to be filled OK.
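For reference, here is a minimal sketch of what split /^/ followed by chomp
does to CRLF-separated input. It assumes the decoded STRINGS value is exactly
"Rush\r\nHold It", as in the example URL; the names @chomped and @clean are
only for illustration and are not in the script. The split on /\r?\n/ is one
possible alternative, not necessarily the best one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input: the decoded STRINGS value from the example URL,
# where %0D%0A decodes to "\r\n".
my $textstr = "Rush\r\nHold It";

# The approach used in the script: split at line starts, then chomp.
# chomp only removes the trailing "\n" (the default $/), so any "\r"
# in front of it stays attached to the string.
my @chomped;
foreach my $line (split /^/, $textstr) {
    chomp $line;
    push @chomped, $line;
}

# A possible alternative: split on the line ending itself,
# which discards both "\r\n" and bare "\n".
my @clean = split /\r?\n/, $textstr;

# Print each string with carriage returns made visible.
for my $s (@chomped, @clean) {
    (my $visible = $s) =~ s/\r/\\r/g;
    print "[$visible]\n";
}
```

If the first line printed shows a trailing \r, the search strings may not be
quite what they look like when printed to a browser.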
The problem is that it only returns some of the results that it should. If a
page only contains one of the strings, it will return none. If the page
contains many of the strings, it will only return one. I have a
feeling that it's something really obvious but I can't see what it is.
I've tried different modifiers on the regex and adding a reset after each
string, to no effect.
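One thing that may be relevant to the regex: with /g in scalar context, Perl
remembers the match position (pos) on the variable being searched, and the
next /g match on that same variable starts from there, even with a different
pattern. As far as I can tell from perlfunc, reset with no argument only
re-arms m?pattern? one-shot matches and does not touch pos. A minimal sketch,
using a made-up page text and search strings rather than the real pages:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical page text and search strings, for illustration only.
my $text    = "Hold It is listed before Rush on this page";
my @strings = ("Rush", "Hold It");

# With /g in scalar context, a successful match leaves pos($text) set,
# so the next /g match on $text starts searching after "Rush" and
# never sees the earlier "Hold It". A failed match resets pos.
my @found_with_g;
foreach my $string (@strings) {
    push @found_with_g, $string if $text =~ /\Q$string\E/g;
}

# Without /g, every match starts at the beginning of the string.
my @found_plain;
foreach my $string (@strings) {
    push @found_plain, $string if $text =~ /\Q$string\E/;
}

print "with /g:    @found_with_g\n";
print "without /g: @found_plain\n";
```

In this sketch the /g loop finds only "Rush", while the plain loop finds both
strings, so whether a string is reported depends on the order of the matches
in the page.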
Can anyone see what could be causing the problem?
Thanks for your help,
Len
Here's the script. Thanks to Curtis for adding use strict and my vars.
#!/usr/bin/perl -w
# search for each of a number of strings in a number of web pages
use strict;
use CGI;
require LWP::UserAgent;
my $q = new CGI;
my $textstr = $q->param('STRINGS');
my $pages = $q->param('PAGES');
my @strings;
my $i = 0;
my $ua = new LWP::UserAgent;
my %content;
print $q->header(-expires=>'-1d');
print <<EOH;
<html>
<title>Search results</title>
<body bgcolor=ffffff>
<h1>Search results</h1>
EOH
foreach my $line (split /^/, $textstr) {
    chomp $line;
    $strings[$i] = $line;
    $i++;
}
foreach my $line (split /^/, $pages) {
    chomp $line;
    my $request = new HTTP::Request(GET => $line);
    print "Loading $line<br>";
    my $response = $ua->request($request);
    if ($response->is_success) {
        # keep the page body, keyed by its URL
        $content{$line} = $response->content;
    } else {
        print "<b>Error: $line: ".$response->status_line."</b><br>";
    }
}
print "<br>Searching<br>";
foreach my $page (keys %content) {
    print $page."<br>";
    foreach my $string (@strings) {
        # \Q deals with () in pattern
        if ($content{$page} =~ /\Q$string/g) {
            print '<blockquote>'.$string.' found</blockquote>';
        }
        # worth a try
        reset;
    }
}
print <<EOF;
</body>
</html>
EOF