Hi,
Please check my reply below.

On Fri, May 3, 2013 at 12:59 PM, Edward and Erica Heim <
edh...@bigpond.net.au> wrote:

> Hi all,
>
> I'm using LWP::UserAgent to access a website. One of the methods returns
> HTML data e.g.
>
> my $data = $response->content;
>
> I.e. $data contains the HTML content. I want to be able to parse it line
> by line e.g.
>
> foreach (split /pattern/, $data) {
>     my $line = $_;
> .....
>
> If I print $data, I can see the individual lines of the HTML data but I'm
> not clear on the "pattern" that I should use in split or if there is a
> better way to do this.
>

What exactly are you splitting, and which pattern are you using?


> I understand that there are packages to parse HTML code but this is also a
> learning exercise for me.
>

Please don't parse HTML with regular expressions. It's not that it can't be
done or hasn't been done, but it is an exercise in futility. Instead, learn
modules such as HTML::TreeBuilder and the others on CPAN that can do what
you want.

Secondly, parse the document first, before doing any "splitting".

For example, to fetch http://www.perl.org and print the trimmed text of that
web page, one could do something like this:

[CODE]

#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;
use HTML::TreeBuilder 5 -weak;

## url to get
my $url = 'http://www.perl.org';

## get the file
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
die "Cannot fetch $url: ", $resp->status_line, "\n" unless $resp->is_success;

## parse the HTML file
my $tree = HTML::TreeBuilder->new;
$tree->parse( $resp->decoded_content );
$tree->eof;    ## signal end of input so the tree is complete
print $tree->as_trimmed_text, "\n";

[/CODE]
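
If you still want to walk through some text line by line, here is a minimal
sketch of the split itself. It assumes the text is already in a scalar; the
sample string in $data is made up for illustration and is not fetched from
any site. The pattern splits on a newline, with an optional carriage return
in front of it so that Windows-style line endings are handled too:

```perl
#!/usr/bin/perl
use warnings;
use strict;

## hypothetical sample data standing in for the fetched text
my $data = "first line\r\nsecond line\nthird line\n";

## split on LF, allowing an optional CR before it
my @lines = split /\r?\n/, $data;

## process each line in turn
my $count = 0;
foreach my $line (@lines) {
    $count++;
    print "$count: $line\n";
}
```

This is only meant for plain text. For the HTML itself, a parser such as
HTML::TreeBuilder remains the safer route.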
Hope this helps somehow.

>
> Thanks in advance, Edward


-- 
Tim
