> As I only want to know if the page is available for browsing,
> what are the result codes I should look for?
1) Explanation of response codes here:
http://kbs.cs.tu-berlin.de/~jutta/ht/responses.html
2) The problem is that you need to set the user agent in your code so that it
masquerades as a browser that the site recognizes. This should get rid of the
error code that http://encarta.msn.es/ is returning to you.
I have snipped the code below from a program of mine to show you how this
works. Note that I also set up a cookie file so that I don't get rejected by
sites that need cookies.
3) To return the text of the page, look at the "content", i.e.
print $response->content;
(NOTE: I'm a Perl newbie, so I'm sure there are more elegant ways to do
this!)
# Normal strictures to make me do a *little* better
use strict;
use warnings;
use diagnostics;
# LibWWW, used to actually get to the pages and check response and/or
# content
use LWP::UserAgent;
# To handle sites that need cookies
use HTTP::Cookies;
# All da variables
my ($ua, $i, @site, $request, $response);
# Unbuffer output or else it gets real boring waiting for the whole
# program to finish before seeing any results
$|++;
# Create a new user agent that masquerades as a cookie eating MSIE 4
$ua = LWP::UserAgent->new;
$ua->cookie_jar(HTTP::Cookies->new(file     => "lwpcookies.txt",
                                   autosave => 1));
$ua->agent('Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)');
$ua->timeout(20);
# Sites I am checking
@site = qw
(
http://www.grn.es/
http://encarta.msn.es/
);
# For each site, check the response and print either the status message
# (on success) or the full error status line
foreach $i (@site) {
    $request  = HTTP::Request->new('GET' => $i);
    $response = $ua->request($request);
    if ($response->is_success) {
        print $i, " ", $response->message, "\n";
    } else {
        print "Error: " . $response->status_line . "\n";
    }
}
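
Since the question is only whether a page is available, a HEAD request may be enough, and the result code can be judged by its class (2xx means success, which is the same range that is_success checks). Here is a minimal sketch along those lines. The names classify_status and check_page are hypothetical helpers made up for this example, not part of LWP, and the script assumes the URLs are passed on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: map a numeric HTTP status code to a coarse verdict.
sub classify_status {
    my ($code) = @_;
    return 'available'    if $code >= 200 && $code < 300;   # 2xx: success
    return 'redirect'     if $code >= 300 && $code < 400;   # 3xx: moved elsewhere
    return 'client error' if $code >= 400 && $code < 500;   # 4xx: e.g. 403, 404
    return 'server error' if $code >= 500 && $code < 600;   # 5xx: e.g. 500, 503
    return 'other';                                         # 1xx or out of range
}

# Hypothetical helper: send a HEAD request (headers only, no page body)
# and return the verdict together with the full status line.
sub check_page {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    return (classify_status($response->code), $response->status_line);
}

# Only talk to the network when URLs were given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new(timeout => 20);
    # Same browser-like user agent string as in the program above
    $ua->agent('Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)');
    for my $url (@ARGV) {
        my ($verdict, $status) = check_page($ua, $url);
        print "$url: $verdict ($status)\n";
    }
}
```

If this is saved under a name of your choosing, say check.pl, it could be run as: perl check.pl http://www.grn.es/ http://encarta.msn.es/
Some servers answer HEAD differently from GET, so if the results look wrong, fall back to a GET loop like the one above.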