RE: HTML::TokeParser

Dan Muey Tue, 11 Feb 2003 14:23:45 -0800


> Dan Muey wrote:
> 
> > 
> >> 
> >> I am trying to use HTML::TokeParser
> >> From the cpan page for this I used this example :
> >>  
> >>                 while (my $token = $p->get_tag("a")) {
> >>                         my $url = $token->[1]{href} || "-";
> >>                         my $text = $p->get_trimmed_text("/a");
> >>                         print "$url\t$text \n";
> >>                 }
> >>  
> >> Worked great ::
> >> So I tried to do something similar with the img tag ::
> >> 
> >>                 while (my $token = $p->get_tag("img")) {
> >>                         my $src = $token->[1]{src} || "-";
> >>                         my $alt = $token->[1]{alt} || "-";
> >>                         print "$src\t$alt\n";
> >>                 }
> >>  
> >> and I get nothing, even though I know there are lots of image tags
> > 
> > I tried commenting out the 'a' version and the 'img' version worked!
> > So both chunks of code work; they just don't work if you run them
> > back to back. I tried undef $token; I tried using different names
> > for the tokens ($token and $tokenq respectively); I tried removing
> > 'my' from before $token. Basically it seems that you can only get
> > results from get_tag once.
> > 
> > Is there any way to reset this so that I can do both chunks of code
> > above, one after the other, i.e. call $p->get_tag() more than once?
> > 
> 
> What happens is that HTML::TokeParser reads the HTML file as it
> parses, returning each token in turn, or undef once EOF is reached.
> Once you have reached EOF, any attempt to read another token will
> fail. That's why your second while(...) loop never returns anything:
> your first while(...) loop has already consumed everything in the
> file, and the file pointer is at EOF. The module does not have any
> way to reset to the beginning of the file, so you must catch all
> tokens (or tags, in your example) during the first pass:
> 
> #!/usr/bin/perl -w
> use strict;
> 
> use HTML::TokeParser;
> 
> my $tok = new HTML::TokeParser("index.html") || die $!;
> while(1){
>         my $token = $tok->get_tag("a","img");
>         last unless($token);
>         if($token->[0] eq 'a'){
>                 print $token->[1]{href} || "what?","\n";
>         }else{
>                 print $token->[1]{src} || "again?","\n";
>         }
> }
> 
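A side note on the single-pass behaviour described above: a parser object cannot be rewound, but nothing stops you from building a second parser over the same document. The sketch below is only an illustration. It assumes the HTML is held in a string (the sample markup and the @links and @images arrays are made up for the example) and passes a reference to that string to the constructor, which HTML::TokeParser treats as the literal document.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical sample document, standing in for index.html.
my $html = <<'HTML';
<html><body>
<a href="one.html">First</a>
<img src="pic.png" alt="A picture">
<a href="two.html">Second</a>
</body></html>
HTML

# Pass 1: this parser is used up by the link loop.
my @links;
my $p = HTML::TokeParser->new(\$html) or die "Can't create parser: $!";
while (my $token = $p->get_tag("a")) {
    push @links, $token->[1]{href} || "-";
}

# Pass 2: a fresh parser object starts again at the top of the document.
my @images;
$p = HTML::TokeParser->new(\$html) or die "Can't create parser: $!";
while (my $token = $p->get_tag("img")) {
    push @images, $token->[1]{src} || "-";
}

print "link:  $_\n" for @links;
print "image: $_\n" for @images;
```

Collecting everything in one pass, as in the quoted code, reads the document only once; the two-parser version trades that for keeping the two original loops unchanged.
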
That's exactly it, thank you very much!
One more tiny little problem:


I have it grabbing the title, links and img tags perfectly except for one minor snafu.

It won't grab/parse img tags that are between <a> tags, i.e. an image that is a link.
I tried having it parse <img>s first and then <a>s, but that didn't work.
Any thoughts?
Thanks for all your help!

Dan
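
One possible explanation, assuming the link branch still calls get_trimmed_text("/a") as in the first example: that call reads ahead to the closing </a> and skips over any tags on the way, so an <img> inside the link has already been consumed by the time the loop asks for the next tag. A rough sketch of one way around it is to ask get_tag for "a", "/a" and "img" together and remember which link is currently open. The sample markup and the $current_href, @links and @images names are invented for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical sample document: one image link, one text link, one bare image.
my $html = <<'HTML';
<a href="home.html"><img src="logo.png" alt="Home"></a>
<a href="about.html">About</a>
<img src="plain.png" alt="Plain">
HTML

my $p = HTML::TokeParser->new(\$html) or die "Can't create parser: $!";

my (@links, @images);
my $current_href;    # href of the <a> we are inside, undef when outside a link

while (my $token = $p->get_tag("a", "/a", "img")) {
    my $tag = $token->[0];
    if ($tag eq 'a') {
        $current_href = $token->[1]{href} || "-";
        push @links, $current_href;
    }
    elsif ($tag eq '/a') {
        undef $current_href;    # left the link
    }
    else {
        # An img tag; record its src and the enclosing link, if any.
        push @images, [ $token->[1]{src} || "-", $current_href ];
    }
}

print "link:  $_\n" for @links;
for my $img (@images) {
    my ($src, $href) = @$img;
    print "image: $src\t", (defined $href ? "links to $href" : "not a link"), "\n";
}
```

This version does not collect the link text. If the text is needed as well, it would have to be gathered token by token rather than with get_trimmed_text("/a"), since that call is what swallows the nested tag.
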

> __END__
> 
> david
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
