Re: Perl Regular Expressions

Briac Pilpré Mon, 18 Feb 2002 15:10:08 -0800

On Mon, 18 Feb 2002 at 22:44 GMT, Karl Kittler wrote:
> I'm also trying to figure out how to collect both the URL and the link
> name in one line of code. From what I've read, it looks like it can be
> done.


 Forget regexes and use a proper HTML parser.
 HTML::TokeParser is nice for this task: it hands you both the href
 and the link text for each <a> tag.

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

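# Parse the HTML that follows __DATA__ at the end of this script.
my $p = HTML::TokeParser->new(\*DATA) or die "Cannot read DATA: $!";

# Map each URL to its link text; a repeated URL keeps only the last text.
my %links;

# get_tag('a') returns [$tag, \%attr, \@attrseq, $text] for each <a> start tag.
while ( my $token = $p->get_tag('a') ){
        my $url = $token->[1]{href};
        next unless defined $url;       # skip anchors without an href
        my $text = $p->get_trimmed_text("/a");
        $links{$url} = $text;
}

# each() returns the pairs in no particular order.
while ( my ($url, $text) = each %links ){
        print "$url => $text\n";
}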

__DATA__
<html>
<head><title>Test TokeParser</title>
</head>
<body>
<a
  href="hello.html">Hello World!</a>
 <p>Some text <a href=goodbye.html>Goodbye cruel world</a> </p>
<a style="font-size:18pt" href='foo.html'>Foo!</a><p>
 </body>
</html>
__END__
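
 If the same URL can appear more than once, or if you care about the
 order of the links, a hash is not the best container. Here is a
 minimal sketch of a variant that returns [url, text] pairs in document
 order. The helper name extract_links and the sample HTML string are
 made up for illustration; only new(), get_tag() and
 get_trimmed_text() come from HTML::TokeParser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# extract_links is a hypothetical helper, not part of HTML::TokeParser.
# It takes an HTML string and returns a list of [url, text] pairs
# in the order the links appear in the document.
sub extract_links {
    my ($html) = @_;

    # A reference to a scalar makes TokeParser parse the string itself.
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML string";

    my @links;
    while ( my $token = $p->get_tag('a') ) {
        my $url = $token->[1]{href};
        next unless defined $url;    # skip anchors such as <a name="top">
        push @links, [ $url, $p->get_trimmed_text('/a') ];
    }
    return @links;
}

# Made-up sample input: one URL appears twice, one anchor has no href.
my $html = '<a href="hello.html">Hello World!</a> <a name="top">Top</a> '
         . '<a href="hello.html">Hello again</a>';

print "$_->[0] => $_->[1]\n" for extract_links($html);
```

 Both hello.html entries are kept here, where the %links hash would
 have kept only one of them.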
-- 
briac
 << dynamic .sig on strike, we apologize for the inconvenience >>

