RE: stripping web pages

Beau E. Cox Sat, 26 Oct 2002 14:57:00 -0700

Hi -

I think you want something like this:


#!/usr/bin/perl
#
#       getwebpage - simple web page get.
#
#   Beau E. Cox
#   October 26, 2002
#   <[EMAIL PROTECTED]><http://beaucox.com>
#

use strict;
use warnings;
use LWP::UserAgent;
use HTML::TokeParser;

        my $url = "http://cpan.org";

        my $agent = new LWP::UserAgent ();

#   put your proxy here if necessary
#       $agent->proxy (['http'] => 'whatever');

        my $request = new HTTP::Request ('GET' => $url);
        my $response = $agent->request ($request);

        if ($response->is_success ()) {

                my $document = $response->content ();

#   web page is now in document

                print "$document\n";

#   if you need something more fancy - look into:

#               my $scontent;
#               my $page = new HTML::TokeParser (\$document);
#               while (my $token = $page->get_token ()) {
#                       my $type = shift (@{$token});
#                       $_ = shift (@{$token});
#                       if ($type eq "T") {
#                               $scontent .= $_;
#                               }
#                       }

                }
        else {
                print "error ", $response->code(),
                        " getting $url\n", $response->message(), "\n";
                }

Refer to the documentation for LWP::UserAgent, LWP, and,
if you want fancy, HTML::TokeParser. Have fun!

Aloha => Beau.

PS: I tested the above - OK.
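
For the "more fancy" route, the commented-out HTML::TokeParser loop in the
script can be pulled out into a small helper that keeps only the text tokens.
This is a minimal sketch, not a definitive implementation: the helper name
extract_text and the sample HTML string are made up for illustration, and the
whitespace cleanup is an extra step that is not in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

#   extract_text is a hypothetical helper name (not in the original script).
#   It returns the text tokens of an HTML string with the tags dropped.
sub extract_text {
        my ($html) = @_;
        my $page = HTML::TokeParser->new (\$html)
                or die "cannot parse document\n";
        my $text = '';
        while (my $token = $page->get_token ()) {
#   text tokens look like ["T", $text, $is_data]; skip everything else
                next unless $token->[0] eq 'T';
                $text .= $token->[1];
                }
#   assumption: collapse whitespace runs and trim both ends
        $text =~ s/\s+/ /g;
        $text =~ s/^ | $//g;
        return $text;
        }

#   made-up sample input; in the script above this would be $document
my $sample = '<html><body><h1>CPAN</h1><p>Perl <b>modules</b> here.</p></body></html>';
print extract_text ($sample), "\n";
```

Note that text inside script and style elements also arrives as text tokens,
and entities such as &amp; are not decoded, so real pages may need more
filtering. The returned string can then be handed straight to DBI (prepare and
execute with placeholders) without writing a text file in between.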

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:Steven_Massey@;notes.amdahl.com]
Sent: Saturday, October 26, 2002 8:14 AM
To: [EMAIL PROTECTED]
Subject: stripping web pages


Hi

I have Perl scripts that I have built over the last 6 months (with MUCH
help from this list). Basically, I save a web page as a text file,
process it, and dump the result into MySQL.

Is it possible to process the web page directly, straight into MySQL,
without the need to dump to text and re-read?

I have looked around CPAN/Google but don't really know what I'm after.

Any ideas? Modules or methods I can investigate?

Here's hoping

thanks
steve




-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



