Dan Muey wrote:

>> I am trying to use HTML::TokeParser
>> From the cpan page for this I used this example :
>>
>> while (my $token = $p->get_tag("a")) {
>>     my $url = $token->[1]{href} || "-";
>>     my $text = $p->get_trimmed_text("/a");
>>     print "$url\t$text \n";
>> }
>>
>> Worked great ::
>> So I tried to do something similar with the img tag ::
>>
>> while (my $token = $p->get_tag("img")) {
>>     my $src = $token->[1]{src} || "-";
>>     my $alt = $token->[1]{alt} || "-";
>>     print "$src\t$alt\n";
>> }
>>
>> and I get nothing, even though I know there are lots of image tags
>
> I tried commenting out the 'a' version and the 'img' version worked!
> So both chunks of code work, they just don't work if you run them back to
> back. I tried undef $token;
> I tried using different names for the tokens ( $token and $tokenq
> respectively ). I tried removing 'my' from before $token and basically it
> seems that you can only get results from get_tag once.
>
> Is there any way to reset this so that I can do both chunks of code above,
> one after the other, i.e. call $p->get_tag() more than once?
>
What happens is that HTML::TokeParser reads through the file as it parses,
returning one token per call and undef once EOF is reached. Once you have
reached EOF, any further attempt to read a token fails. That is why your
second while (...) loop never returns anything: the first while (...) loop
has already consumed everything in the file, and the file pointer is left
at EOF. The module has no way to reset the parser to the beginning of the
file, so you must catch all the tokens (or tags, in your example) that you
want during the first pass:

    #!/usr/bin/perl -w
    use strict;
    use HTML::TokeParser;

    # "or" rather than "||", so that die actually fires when new() fails
    my $tok = HTML::TokeParser->new("index.html") or die $!;

    while (1) {
        # next <a> or <img> start tag, or undef at EOF
        my $token = $tok->get_tag("a", "img");
        last unless $token;

        if ($token->[0] eq 'a') {
            print $token->[1]{href} || "what?", "\n";
        } else {
            print $token->[1]{src} || "again?", "\n";
        }
    }

    __END__

david
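P.S. If you really do want two separate loops, one possible workaround is to build a fresh parser object for each pass, since it is the parser object that has been used up, not the document itself. The sketch below assumes the document is already held in a string and passes a reference to that string to new(). The sample HTML and the helper names collect_links and collect_images are made up for illustration and are not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document; in practice this would be the file's contents.
my $html = <<'HTML';
<html><body>
<a href="one.html">First link</a>
<img src="pic1.png" alt="Picture one">
<a href="two.html">Second link</a>
<img src="pic2.png">
</body></html>
HTML

# First pass: a brand-new parser that only looks at <a> tags.
sub collect_links {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot create parser";
    my @links;
    while (my $token = $p->get_tag("a")) {
        my $url  = $token->[1]{href} || "-";
        my $text = $p->get_trimmed_text("/a");
        push @links, [$url, $text];
    }
    return @links;
}

# Second pass: another new parser over the same string, this time for <img>.
sub collect_images {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot create parser";
    my @images;
    while (my $token = $p->get_tag("img")) {
        my $src = $token->[1]{src} || "-";
        my $alt = $token->[1]{alt} || "-";
        push @images, [$src, $alt];
    }
    return @images;
}

print join("\t", @$_), "\n" for collect_links($html);
print join("\t", @$_), "\n" for collect_images($html);
```

This parses the document twice, so the single loop with get_tag("a", "img") is still the cheaper choice for large files. The two-pass form is mainly useful when you want all of the links listed before all of the images.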