Funny, here's the script since I modified it perhaps I jacked somehting up

Also I had it checking meta tags ::

my $name = $token->[1]{name} || "-";
my $http = $token->[1]{http-equiv} || "-";
my $cont = $token->[1]{content} || "-";

That would grab content for all of them,
The name if it was a name and if it was an http-equiv it would make come out as -

Not sure what I'm doing wrong but any way here's my script ::
Thanks

#!/usr/bin/perl

use LWP::Simple;
use HTML::TokeParser;

$url = $ARGV[0];

$content = get($url);

                $p = HTML::TokeParser->new(\$content);
my %text_links_con;
my %img_con;
my %meta_con;

while(1) {
        my $token = $p->get_tag("a","img","title","meta","rel");
        last unless($token);

        if($token->[0] eq 'img') {
                $x++; # I'm doing this so I can get a count of how many it's finding
                my $src = $token->[1]{src} || "-";
                my $alt = $token->[1]{alt} || "-";
                $img_con{$x} = "SRC : $src, ALT :$alt";;
        }
        elsif($token->[0] eq 'a') {
                $i++;
                my $url = $token->[1]{href} || "-";
                my $text = $p->get_trimmed_text("/a");
                $text_links_con{$i} = "Text : $text, URL : $url";
        }
        elsif($token->[0] eq 'title') {
                $title_con = $p->get_trimmed_text;
        } 

}

print "TITLE : $title_con \n";
foreach $q(keys %text_links_con) { print "LINK : $q - $text_links_con{$q} \n"; }
foreach $x(keys %img_con) { print "IMG  : $x - $img_con{$x} \n"; }

> 
> what do you mean? the following seems to be working:
> 
> #!/usr/bin/perl -w
> use strict;
>  
> use HTML::TokeParser;
> 
> my $tok = new HTML::TokeParser(*DATA) || die $!;
> while(1){
>         my $token = $tok->get_tag("a","img");
>         last unless($token);
>         if($token->[0] eq 'a'){
>                 print $token->[1]{href} || "what?","\n";
>         }else{
>                 print $token->[1]{src} || "again?","\n";
>         }
> }
> 
> __DATA__
> <html>
> <body>
> <a href=link1>link1</a>
> <img src=img1>img1</img>
> <a href=link2><img src=img_inside_a></img></a>
> </body>
> </html>
> 
> prints:
> 
> link1
> img1
> link2
> img_inside_a
> 
> so img_inside_a does show up. am i missing something?
> 
> david
> 
> -- 
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to