> Dan Muey wrote:
>
> >> I am trying to use HTML::TokeParser.
> >> From the CPAN page for it I used this example:
> >>
> >>   while (my $token = $p->get_tag("a")) {
> >>       my $url  = $token->[1]{href} || "-";
> >>       my $text = $p->get_trimmed_text("/a");
> >>       print "$url\t$text \n";
> >>   }
> >>
> >> Worked great.
> >> So I tried to do something similar with the img tag:
> >>
> >>   while (my $token = $p->get_tag("img")) {
> >>       my $src = $token->[1]{src} || "-";
> >>       my $alt = $token->[1]{alt} || "-";
> >>       print "$src\t$alt\n";
> >>   }
> >>
> >> and I get nothing, even though I know there are lots of image tags.
> >
> > I tried commenting out the 'a' version and the 'img' version worked!
> > So both chunks of code work; they just don't work if you run them back
> > to back. I tried undef $token; I tried using different names for the
> > tokens ($token and $tokenq respectively). I tried removing 'my' from
> > before $token, and basically it seems that you can only get results
> > from get_tag once.
> >
> > Is there any way to reset this so that I can do both chunks of code
> > above, one after the other, i.e. call $p->get_tag() more than once?
>
> What happens is that when HTML::TokeParser parses the HTML file, it reads
> the file and returns tokens one at a time, or undef once EOF is reached.
> Once you have reached EOF, any attempt to read a token again will fail.
> That is why your second while(...) loop never returns anything: your first
> while(...) loop has already consumed everything in the file, and the file
> pointer is at EOF.
> The module does not have any way to reset to the beginning of the file,
> so you must catch all tokens (or tags, in your example) during the first
> parse:
>
> #!/usr/bin/perl -w
> use strict;
>
> use HTML::TokeParser;
>
> my $tok = HTML::TokeParser->new("index.html") || die $!;
>
> # Ask for both tag types in a single pass; get_tag returns undef at EOF.
> while (1) {
>     my $token = $tok->get_tag("a", "img");
>     last unless $token;
>     if ($token->[0] eq 'a') {
>         print $token->[1]{href} || "what?", "\n";
>     } else {
>         print $token->[1]{src} || "again?", "\n";
>     }
> }
>

That's exactly it, thank you very much! One more tiny little problem,
I have it grabbing the title, links, and img tags perfectly except for one
minor snafu: it won't grab/parse img tags that are between <a> tags, i.e. an
image that is a link. I tried having it parse <img>s first, then <a>s, but
that didn't work.

Any thoughts?

Thanks for all your help!

Dan

> __END__
>
> david

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
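
A possible explanation for the missing linked images, offered as a guess since the combined script is not shown in the thread: if the 'a' branch still calls $p->get_trimmed_text("/a"), that call reads everything up to the closing </a>. By default HTML::TokeParser treats an <img> found there as text (its alt attribute, or "[IMG]"), so the img token is consumed before get_tag("a", "img") can return it. The sketch below avoids the text call inside the loop and collects both tag types in one pass. The helper name links_and_images and the sample HTML are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical helper (not from the thread): collect every link target and
# image source in one pass. It deliberately does NOT call get_text or
# get_trimmed_text, so an <img> nested inside <a>...</a> is still returned
# by get_tag as its own tag.
sub links_and_images {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";

    my (@links, @images);
    while (my $tag = $p->get_tag("a", "img")) {
        if ($tag->[0] eq 'a') {
            push @links, $tag->[1]{href} || "-";
        }
        else {
            push @images, $tag->[1]{src} || "-";
        }
    }
    return (\@links, \@images);
}

# Made-up sample document: one linked image, one plain image, one text link.
my $html = '<a href="/home"><img src="logo.png" alt="Home"></a>'
         . '<img src="plain.png" alt="Plain"> <a href="/about">About</a>';

my ($links, $images) = links_and_images($html);
print "link:  $_\n" for @$links;
print "image: $_\n" for @$images;
```

If the link text is needed as well, one option under the same assumption is to walk the document with get_token and assemble the text by hand, rather than letting get_trimmed_text skip over the nested tags.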