Hi,

Please check my reply below.

On Fri, May 3, 2013 at 12:59 PM, Edward and Erica Heim <edh...@bigpond.net.au> wrote:
> Hi all,
>
> I'm using LWP::UserAgent to access a website. One of the methods returns
> HTML data e.g.
>
> my $data = $response->content;
>
> I.e. $data contains the HTML content. I want to be able to parse it line
> by line e.g.
>
> foreach (split /pattern/, $data) {
>     my $line = $_;
>     .....
>
> If I print $data, I can see the individual lines of the HTML data but I'm
> not clear on the "pattern" that I should use in split or if there is a
> better way to do this.
>

What exactly are you splitting? And what exactly is the pattern you are using?

> I understand that there are packages to parse HTML code but this is also a
> learning exercise for me.
>

Please don't parse HTML files with regexps. It's not that it can't be done or hasn't been done, but it is a labor in futility. Rather, learn modules like HTML::TreeBuilder and the rest on CPAN that can help you do what you want.

Secondly, parse the file first, before "splitting".

If I may: say you want to fetch http://www.perl.org and print out the trimmed text of that web page. You can do it like so:

[CODE]
#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;
use HTML::TreeBuilder 5 -weak;

## url to get
my $url = 'http://www.perl.org';

## get the file
my $ua   = LWP::UserAgent->new;
my $resp = $ua->request( HTTP::Request->new( GET => $url ) );
die "Could not fetch $url: ", $resp->status_line, "\n"
    unless $resp->is_success;

## parse the HTML file
my $tree = HTML::TreeBuilder->new;
$tree->parse( $resp->decoded_content );
$tree->eof;    ## tell the parser there is no more input

print $tree->as_trimmed_text, "\n";
[/CODE]

Hope this helps somehow.

>
> Thanks in advance, Edward
>

--
Tim
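
P.S. On the split "pattern" itself, since this is a learning exercise: here is a minimal sketch, assuming the content uses ordinary newline line endings (possibly CRLF). The $data string below is a made-up sample standing in for what $response->decoded_content might return, not real output from any site. Note that this only breaks the text into lines; it knows nothing about the HTML structure, so a tag that spans several lines is still cut apart, which is why a real parser is the better tool.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for $response->decoded_content.
my $data = "<html>\r\n<body>\n<p>Hello</p>\n</body>\n</html>\n";

# Split on LF, allowing an optional CR before it (CRLF line endings).
# split drops trailing empty fields, so the final "\n" adds no extra line.
my @lines = split /\r?\n/, $data;

my $n = 0;
foreach my $line (@lines) {
    $n++;
    print "$n: $line\n";
}
```

If you only need to walk the lines once, the same loop also works directly as foreach my $line (split /\r?\n/, $data) { ... }.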