On May 3, 2013, at 4:59 AM, Edward and Erica Heim wrote:

> Hi all,
>
> I'm using LWP::UserAgent to access a website. One of the methods returns
> HTML data e.g.
>
>     my $data = $response->content;
>
> I.e. $data contains the HTML content. I want to be able to parse it line by
> line e.g.
>
>     foreach (split /pattern/, $data) {
>         my $line = $_;
>         ......
>
> If I print $data, I can see the individual lines of the HTML data but I'm not
> clear on the "pattern" that I should use in split or if there is a better way
> to do this.

If the lines are separated by newlines ("\n"), then the pattern is /\n/:

    for my $line ( split(/\n/, $data) ) {
        …

The lines could also end in carriage return + line feed, which is /\r\n/ (in that order, not /\n\r/). The pattern /[\r\n]+/ will handle both, but it will also gobble up blank lines, because any run of successive line-ending characters (or pairs of characters) is matched as a single separator.

> I understand that there are packages to parse HTML code but this is also a
> learning exercise for me.

I am currently using HTML::TokeParser to parse HTML files. It is pretty easy to use:

    use HTML::TokeParser;
    …
    # assuming $data contains the HTML text to be parsed
    my $parser = HTML::TokeParser->new(\$data);
    while( my $token = $parser->get_token() ) {
        my $type = $token->[0];    # first element is the token type
        if( $type eq 'S' ) {
            my $tag = $token->[1];
            print "Start of tag $tag\n";
        }elsif( $type eq 'E' ) {
            print "End of tag $token->[1]\n";
        }elsif( $type eq 'T' ) {
            my $text = $token->[1];
            print "Text: $text\n";
        }elsif( $type eq 'C' ) {
            print "Comment: $token->[1]\n";
        }elsif( $type eq 'D' ) {
            print "Declaration: $token->[1]\n";
        }else{
            print "Unknown type $type!!!\n"
        }
    }

See 'perldoc HTML::TokeParser' for details.

There are lots of other parsers out there. Some have special uses, like HTML::LinkExtor for extracting links, and HTML::TableExtract for extracting information from HTML tables. Some modules, like HTML::TreeBuilder, build an in-memory model of the HTML page that you can traverse or search for information.

Good luck.
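
P.S. Here is a rough sketch comparing the line-ending patterns discussed above. The sample string is made up for illustration (it stands in for $response->content), and /\r?\n/ is an alternative pattern not mentioned earlier: it matches either "\n" or "\r\n" one line ending at a time, so blank lines are kept.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for $response->content. It mixes
# LF and CRLF line endings and contains one blank line.
my $data = "<html>\r\n<body>\n\n<p>Hello</p>\r\n</body>\n</html>\n";

# /\r?\n/ matches a single LF or CRLF, so the blank line survives
# as an empty string in the result.
my @lines = split /\r?\n/, $data;

# /[\r\n]+/ treats a whole run of line-ending characters as one
# separator, so the blank line disappears.
my @squeezed = split /[\r\n]+/, $data;

print scalar(@lines),    " lines using \\r?\\n\n";
print scalar(@squeezed), " lines using [\\r\\n]+\n";

for my $line (@lines) {
    print "[$line]\n";
}
```

Note that split drops trailing empty fields by default, so the final newline in $data does not produce an extra empty line at the end of either list.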