Re: HTML::TokeParser

Rob Dixon Tue, 11 Feb 2003 12:40:23 -0800

Dan Muey wrote:
>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>>                 while (my $token = $p->get_tag("a")) {
>>                         my $url = $token->[1]{href} || "-";
>>                         my $text = $p->get_trimmed_text("/a");
>>                         print "$url\t$text \n";
>>                 }
>>
>> Worked great ::
>> So I tried to do something similar with the img tag ::
>>
>>                 while (my $token = $p->get_tag("img")) {
>>                         my $src = $token->[1]{src} || "-";
>>                         my $alt = $token->[1]{alt} || "-";
>>                         print "$src\t$alt\n";
>>                 }
>>
>> and I get nothing, even though I know there are lots of image tags
>
> I tried commenting out the 'a' version and the 'img' version worked!
> So both chunks of code work, they just don't work if you run them back
> to back.
> I tried undef $token;
> I tried using different names for the tokens ($token and $tokenq
> respectively).
> I tried removing 'my' from before $token and basically it seems that
> you can only get results from get_tag once.
>
> Is there any way to reset this so that I can do both chunks of code
> above,
> one after the other, i.e. call $p->get_tag() more than once?


Hi Dan.

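A TokeParser object reads its document once, front to back. By the
time your first loop finishes, get_tag has consumed everything up to
the end of the file, so a second loop on the same parser has nothing
left to find. The HTML::TokeParser constructor will take a filehandle
as its parameter, so you can rewind the file and start a fresh parser
for the second pass: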

    use strict;
    use warnings;
    use HTML::TokeParser;
    use Fcntl qw(:seek);    # to import the SEEK_SET constant

    open my $html, '<', 'sample.htm' or die $!;

    # First pass: links
    my $p = HTML::TokeParser->new($html);
    while (my $token = $p->get_tag("a")) {
        my $url = $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
        print "$url\t$text \n";
    }

    # Second pass: images, using a fresh parser on the rewound file
    seek $html, 0, SEEK_SET or die $!;    # rewind to start of file
    $p = HTML::TokeParser->new($html);
    while (my $token = $p->get_tag("img")) {
        my $src = $token->[1]{src} || "-";
        my $alt = $token->[1]{alt} || "-";
        print "$src\t$alt\n";
    }

    close $html;
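
If the document is already in memory, the same idea works without
seek: the constructor also accepts a reference to a scalar holding the
HTML, and each new parser starts again from the beginning of that
string. Below is a minimal sketch of that variant. The helper name
collect_attrs and the sample markup are made up for illustration; they
are not part of HTML::TokeParser or of your page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of HTML::TokeParser): make one pass over
# the document and return one row of attribute values per matching tag.
sub collect_attrs {
    my ($doc_ref, $tag, @attrs) = @_;

    # A fresh parser always starts at the beginning of the document.
    my $p = HTML::TokeParser->new($doc_ref)
        or die "Cannot create parser: $!";

    my @rows;
    while (my $token = $p->get_tag($tag)) {
        my $attr = $token->[1];    # hash ref of the tag's attributes
        push @rows, [ map { $attr->{$_} || "-" } @attrs ];
    }
    return @rows;
}

# Made-up sample markup standing in for the contents of sample.htm.
my $html = <<'HTML';
<p><a href="one.html">One</a> <img src="a.png" alt="A"></p>
<p><a href="two.html">Two</a> <img src="b.png"></p>
HTML

# Two independent passes over the same string, one parser each.
print join("\t", @$_), "\n" for collect_attrs(\$html, 'a',   'href');
print join("\t", @$_), "\n" for collect_attrs(\$html, 'img', 'src', 'alt');
```

Either way the point is the same: one parser object per pass over the
document.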


HTH,

Rob



