Dan Muey wrote:
> whatever is in between the <a tags.
>
> I wonder if it's possible to do something like this:
>
> if($token->[0] eq 'a'){
>     print $token->[1]{href} || "what?","\n";
>     my $link_guts = $tok->get_trimmed_text("/a");
>
> and then somehow grab the 'src' and 'alt' attributes from each img tag in
> $link_guts if it's an image and the regular text if it's not and probably
> all three if it has img's and text
>

That's why parsing HTML is tricky, and why XML is on its way to the rescue. If you use get_token() instead of get_tag(), it might be easier. get_token() returns every token, and it is the programmer's responsibility to decide what to do with each one. get_tag() eats up the tokens you don't want, so it's tricky:

    #!/usr/bin/perl -w
    use strict;
    use HTML::TokeParser;

    # parse whatever HTML is placed after __END__
    my $tok = HTML::TokeParser->new(\*DATA) or die $!;

    while(1){
        my $token = $tok->get_token();
        last unless($token);

        if($token->[0] eq 'T'){
            # text token: skip whitespace-only runs
            print "Text: $token->[1]\n" if($token->[1] =~ /\S/);
        }elsif($token->[0] eq 'S' && $token->[1] eq 'img'){
            # start tag: attributes are in the hash at index 2
            print "IMG $token->[2]{src}\n";
        }elsif($token->[0] eq 'S' && $token->[1] eq 'a'){
            print "LINK $token->[2]{href}\n";
        }
    }

    __END__

All tokens are returned to you no matter where they are, so <img> within <a>, <a> within <img>, <a> within <a>, etc. will all be returned to you. If you add a little more logic, it's easy to work out which tags are nested inside which.

david
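
The "little bit more logic" mentioned above could look something like the following. This is only a sketch, not code from the original post: the helper name collect_links and the sample HTML are made up for illustration. It remembers whether the parser is currently inside an <a> element and, while it is, collects the link's text and the 'src' and 'alt' attributes of any <img> tags it sees.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): returns one hash ref
# per <a> element holding its href, the text inside it, and its <img> tags.
sub collect_links {
    my ($html) = @_;

    # A reference to a scalar is parsed as the literal document.
    my $tok = HTML::TokeParser->new(\$html) or die "cannot create parser: $!";
    my (@links, $current);

    while (my $token = $tok->get_token) {
        my ($type, $tag) = @$token;

        if ($type eq 'S' && $tag eq 'a') {
            # Start of a link: open a new record.
            $current = { href => $token->[2]{href}, text => '', images => [] };
        }
        elsif ($type eq 'E' && $tag eq 'a') {
            # End of the link: trim the collected text and store the record.
            if ($current) {
                $current->{text} =~ s/^\s+|\s+$//g;
                push @links, $current;
            }
            undef $current;
        }
        elsif ($current && $type eq 'S' && $tag eq 'img') {
            # Image inside the open link: keep its src and alt attributes.
            push @{ $current->{images} },
                { src => $token->[2]{src}, alt => $token->[2]{alt} };
        }
        elsif ($current && $type eq 'T') {
            # Text inside the open link (for 'T' tokens index 1 is the text).
            $current->{text} .= $token->[1];
        }
    }
    return @links;
}

# Made-up sample input, for illustration only.
my $html = '<a href="/home"><img src="logo.gif" alt="Logo"> Home</a> '
         . '<a href="/faq">FAQ</a>';

for my $link (collect_links($html)) {
    print "LINK $link->{href}\n";
    print "  Text: $link->{text}\n" if $link->{text} =~ /\S/;
    print "  IMG $_->{src} ALT $_->{alt}\n" for @{ $link->{images} };
}
```

A single "current link" variable is enough here because well-formed HTML does not nest <a> inside <a>. If the input might do that, a stack of open records (push on 'S', pop on 'E') would be the safer choice. Note also that the text is kept exactly as get_token() hands it over, so entities such as &amp; are not decoded.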