RE: HTML::TokeParser

david Tue, 11 Feb 2003 11:58:57 -0800

Dan Muey wrote:

> 
>> 
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>  
>>                 while (my $token = $p->get_tag("a")) {
>>                         my $url = $token->[1]{href} || "-";
>>                         my $text = $p->get_trimmed_text("/a");
>>                         print "$url\t$text \n";
>>                 }
>>  
>> Worked great ::
>> So I tried to do something similar with the img tag ::
>> 
>>                 while (my $token = $p->get_tag("img")) {
>>                         my $src = $token->[1]{src} || "-";
>>                         my $alt = $token->[1]{alt} || "-";
>>                         print "$src\t$alt\n";
>>                 }
>>  
>> and I get nothing, even though I know there are lots of image tags
> 
> I tried commenting out the 'a' version and the 'img' version worked!
> So both chunks of code work; they just don't work if you run them back to
> back. I tried undef $token;
> I tried using different names for the tokens ($token and $tokenq
> respectively). I tried removing 'my' from before $token, and basically it
> seems that you can only get results from get_tag once.
> 
> Is there any way to reset this so that I can do both chunks of code above,
> one after the other, IE call $p->get_tag() more than once?
>


What happens is that HTML::TokeParser reads through the HTML file as you 
ask for tokens, handing back one token per call and undef once EOF is 
reached. Once you have hit EOF, any further attempt to read a token fails. 
That is why your second while(...) loop never returns anything: your first 
while(...) loop has already eaten everything in the file, and the file 
pointer is sitting at EOF. The module does not have any way to reset to the 
beginning of the file, so you must catch all the tokens (or tags, in your 
example) during the first parse:

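#!/usr/bin/perl -w
use strict;

use HTML::TokeParser;

# low-precedence "or" so that die runs when new() returns undef
my $tok = HTML::TokeParser->new("index.html") or die "index.html: $!";

# single pass: get_tag() stops at whichever of the two tags comes next
while(1){
        my $token = $tok->get_tag("a","img");
        last unless($token);    # undef means EOF
        if($token->[0] eq 'a'){
                print $token->[1]{href} || "what?","\n";
        }else{
                print $token->[1]{src} || "again?","\n";
        }
}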

__END__
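
If you still want the links printed first and the images second, as in your two original loops, one option is to collect both kinds of tag during that single pass and print them afterwards. This is only a rough sketch: the sample HTML and the collect_tags() helper are made up for illustration, and it parses from a string (by passing a scalar reference to new()) so that it runs without an index.html on disk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical sample document; the thread itself parses index.html.
my $html = <<'HTML';
<a href="one.html">First</a>
<img src="pic1.png" alt="Picture one">
<a href="two.html">Second</a>
<img src="pic2.png">
HTML

# collect_tags() is a helper name invented for this sketch.
# It walks the document once and returns two array references:
# [href, text] pairs for links and [src, alt] pairs for images.
sub collect_tags {
    my ($doc) = @_;

    # A reference to a scalar makes the parser read from that string.
    my $p = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser: $!";

    my (@links, @images);
    while (my $token = $p->get_tag("a", "img")) {
        my ($tag, $attr) = @$token;   # start tags carry an attribute hash
        if ($tag eq 'a') {
            my $text = $p->get_trimmed_text("/a");
            push @links, [ $attr->{href} || "-", $text ];
        } else {
            push @images, [ $attr->{src} || "-", $attr->{alt} || "-" ];
        }
    }
    return (\@links, \@images);
}

my ($links, $images) = collect_tags($html);

# Print in any order you like now that the parse is finished.
print "$_->[0]\t$_->[1]\n" for @$links;
print "$_->[0]\t$_->[1]\n" for @$images;
```

One caveat, as far as I understand get_trimmed_text(): it skips over any tags it meets before the closing </a>, so an <img> nested inside a link would not show up in the image list with this approach. If your pages have those, drop the get_trimmed_text() call or handle the link text differently.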

david
