Dan Anderson wrote:
>
> I am trying to create a spider to grab my books off of Safari
> for a batch printing job so I don't need to go through each chapter
> myself and hit the Print button. So I used this script to try and log
> myself in to the safari site:
>
> # BEGIN CODE
> #! /usr/bin/perl
>
> use strict;
> use warnings;
> use LWP;
> use LWP::UserAgent;
Use one or the other, but not both. LWP.pm itself just does a 'require'
of LWP::UserAgent, so loading either module is enough.
> # variables
> my $cookie_jar_file = "./cookies.txt";
> my @headers = (
> 'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
> 'Accept' => 'image/gif, image/x-bitmap, image/jpeg,
> image/pjpeg, image/png, */*',
> 'Accept-Charset' => 'iso-8859-1,*',
> 'Accept-Language' => 'en-US',
> "catid" => "",
> "s" => "1",
> "o" => "1",
> "b" => "1",
> "t" => "1",
> "f" => "1",
> "c" => "1",
> "u" => "1",
> "r" => "",
> "l" => "1",
> "g" => "",
> "usr" => "myemail",
> "pwd" => "mypassword",
> "savepwd" => "1",
> );
> # end variables
>
> my $user_agent = LWP::UserAgent->new;
> $user_agent->cookie_jar({file => $cookie_jar_file});
> my $response = $user_agent->post(
> 'http://safari.oreilly.com/JVXSL.asp',
> @headers,
> );
> # END CODE
>
> Now I know that this is the form I should post to because
> I stripped the following forms out of the web page (and there is
> no Javascript to modify the forms):
>
> <form action="JVXSL.asp" method="post">
> <input type="hidden" name="catid" value="">
> <input type="hidden" name="s" value="1">
> <input type="hidden" name="o" value="1">
> <input type="hidden" name="b" value="1">
> <input type="hidden" name="t" value="1">
> <input type="hidden" name="f" value="1">
> <input type="hidden" name="c" value="1">
> <input type="hidden" name="u" value="1">
> <input type="hidden" name="r" value="">
> <input type="hidden" name="l" value="1">
> <input type="hidden" name="g" value="">
> <input name="usr" type="text" value="" size="12">
> <input name="pwd" type="password" value="" size="12">
> <input type="checkbox" name="savepwd" value="1">
> <input type="image" name="Login" src="images/btn_login.gif" width="40" height="16"
> border="0" align="absmiddle">
> </form>
>
> When I pull up this web page there's nothing in
> $response->content. I know that safari.oreilly.com will return a
> blank page if it doesn't like the user agent, and upon signing in
> it'll return to the safari.oreilly.com page with a very large number
> of get variables. Does anyone know what I might be doing wrong?
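You can't put form input into header fields! In $user_agent->post, a flat
list of key/value pairs after the URL is sent as request headers; form
fields have to be passed as an array or hash reference instead. Rather
than retyping the fields by hand, use LWP to fetch the Safari home page
and HTML::Form to parse the form and fill in the field values. None of
the 'Accept' headers are necessary. Take a look at this: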
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Form;

my $ua = LWP::UserAgent->new(agent => 'Mozilla/4.76 [en] (Win98; U)');
$ua->cookie_jar({});    # in-memory cookie jar, keeps the session cookies

my $resp = $ua->get('http://safari.oreilly.com/');
die $resp->status_line unless $resp->is_success;

# There are two forms on the page. Find the one with an input named 'Login'.
#
my $login;
foreach (HTML::Form->parse($resp)) {
    if ($_->find_input('Login')) {
        $login = $_;
        last;
    }
}
die "No login form found\n" unless $login;

# The hidden fields keep the values from the page; only fill in the rest.
$login->param('usr', '[EMAIL PROTECTED]');
$login->param('pwd', 'secret');

# click() builds the HTTP::Request that pressing the Login image would send.
$resp = $ua->request($login->click);
die $resp->status_line unless $resp->is_success;
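
If you want to see why the original script sent an empty form, here is a
small sketch that builds both kinds of request without touching the
network. It uses POST from HTTP::Request::Common (part of the HTTP-Message
distribution that LWP depends on), which as far as I know takes its
arguments the same way $ua->post does. The variable names and the
two-field form are only for illustration:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

my $url = 'http://safari.oreilly.com/JVXSL.asp';

# Flat key/value list after the URL: every pair becomes a request header,
# and the request body stays empty.
my $as_headers = POST($url, usr => 'myemail', pwd => 'mypassword');

# Array reference after the URL: the pairs are URL-encoded into the body.
my $as_form = POST($url, [ usr => 'myemail', pwd => 'mypassword' ]);

print "flat list, 'usr' header: ", $as_headers->header('usr'), "\n";
print "flat list, body:         '", $as_headers->content, "'\n";
print "array ref, body:         '", $as_form->content, "'\n";
print "array ref, content type: ", $as_form->header('Content-Type'), "\n";
```

The first request is what your script was sending: the login fields travel
as odd-looking headers and the server sees a POST with no form data.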
HTH,
Rob