On Mon, 18 Feb 2002 at 22:44 GMT, Karl Kittler wrote:
> I'm also trying to figure out how to collect both the URL and the link
> name in one line of code. From what I've read, it looks like it can be
> done.

Forget regexes; use a proper HTML parser. HTML::TokeParser is nice for
this task:

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

# Parse the sample document that follows __DATA__
my $p = HTML::TokeParser->new(\*DATA) or die "Cannot read DATA: $!";

my %links;    # url => link text (a repeated URL keeps only its last text)
while ( my $token = $p->get_tag('a') ) {
    my $url = $token->[1]{href};
    next unless defined $url;    # skip anchors without href, e.g. <a name=...>
    my $text = $p->get_trimmed_text("/a");
    $links{$url} = $text;
}

# Hash order is arbitrary, so the links are not printed in document order
while ( my ($url, $text) = each %links ) {
    print "$url => $text\n";
}

__DATA__
<html>
<head><title>Test TokeParser</title>
</head>
<body>
<a href="hello.html">Hello World!</a>
<p>Some text <a href=goodbye.html>Goodbye cruel world</a>
</p>
<a style="font-size:18pt" href='foo.html'>Foo!</a><p>
</body>
</html>
__END__

-- 
briac
  << dynamic .sig on strike, we apologize for the inconvenience >>