--- Adriano Allora <[EMAIL PROTECTED]> wrote:

> I didn't understand how to use the module HTML, but I need to count how
> many tags of several types appear in a web page and so I wrote this
> script.
> 
> Someone can tell me why this one doesn't work? 
> 
> %tags = ("paragraph" => "p",
>       "list_o" => "ol",
>       "list_no" => "ul",
>       "title" => "h1",
>       "ltl_title" => "h2|3|4|5",
>       "link" => "href");
> 

First, I would suggest that you're trying to count two different
things: tags and attributes ("href" is an attribute of the <a> tag, not
a tag in its own right).  You may wish to separate them.  The following
code will do what you want.  It uses the HTML::TokeParser::Simple module
to make this relatively easy to read.  Whether or not these data
structures are the best way to handle this is another story.

  #!/usr/bin/perl

  use strict;
  use warnings;
  use HTML::TokeParser::Simple 3.13;

  my $parser = HTML::TokeParser::Simple->new( handle => \*DATA );

  my %tag_for = (
      "paragraph" => { name => "p",            count => 0 },
      "list_o"    => { name => "ol",           count => 0 },
      "list_no"   => { name => "ul",           count => 0 },
      "title"     => { name => "h1",           count => 0 },
      # anchored so that only h2 through h5 match, nothing longer
      "ltl_title" => { name => qr/^h[2345]$/,  count => 0 },
  );

  my %attribute_for = ( "link" => { name => "href", count => 0 } );

  while ( my $token = $parser->get_tag ) {
      foreach my $tag ( keys %tag_for ) {
          if ( $token->is_start_tag( $tag_for{$tag}{name} ) ) {
              $tag_for{$tag}{count}++;
              last;
          }
      }
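      foreach my $attribute ( keys %attribute_for ) {
          # defined() so that empty values such as href="" still count;
          # no "last" here, since one tag can carry several attributes
          if ( defined $token->get_attr( $attribute_for{$attribute}{name} ) ) {
              $attribute_for{$attribute}{count}++;
          }
      }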
  } 
    
  foreach my $type ( keys %tag_for ) {
      printf "%10s  %3d\n", $type, $tag_for{$type}{count};
  }
  print "\n";
  foreach my $type ( keys %attribute_for ) {
      printf "%10s  %3d\n", $type, $attribute_for{$type}{count};
  }
  __DATA__ 
  <head></head>
  <body>    
    <h1>title</h1>
    <p>One P tag</p>
    <ul>
      <li>item</li>
    </ul>   
    <h2>Little title 1</h2>
    <h2>Little title 2</h2>
    <h3>Little title 3</h3>
    <a href="foo.html">asdf</a>
  </body>

And the output (hash keys come back in no particular order, so your
rows may be ordered differently):

      list_o    0
     list_no    1
       title    1
   ltl_title    3
   paragraph    1

        link    1
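
A note on the "ltl_title" entry, since the pattern changed from
"h2|3|4|5" to a character class.  In a Perl regex, alternation binds
more loosely than concatenation, so "h2|3|4|5" reads as "h2" or "3" or
"4" or "5" rather than "h followed by one of 2, 3, 4, 5".  The sketch
below only illustrates that difference; the variable names and the
made-up name "p3" are mine, and I am not claiming this is what broke
your script, since I have only seen the hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Alternation binds loosely: this means "h2" OR "3" OR "4" OR "5".
my $loose = qr/h2|3|4|5/;

# A character class keeps the "h" attached to every digit, and the
# anchors reject anything longer than the tag name itself.
my $strict = qr/^h[2345]$/;

# "p3" is not a real tag; it is here only to expose the false positive.
for my $name (qw(h2 h3 h5 h1 h6 p3)) {
    printf "%-3s  loose: %-3s  strict: %s\n",
        $name,
        ( $name =~ $loose  ? 'yes' : 'no' ),
        ( $name =~ $strict ? 'yes' : 'no' );
}
```

The loose pattern accepts "p3" because the name contains a "3"; the
character class does not.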

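On the data structure question: one alternative, offered as a rough
sketch rather than a recommendation, is to count every start tag and
every attribute by name in two flat hashes and only group them into
your categories when printing.  The hash names (%tag_count,
%attr_count, %category) and the sample markup are hypothetical; the
parser calls are the same ones used above, plus the "string" source
type and get_attr with no argument, which returns a hash reference of
all attributes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple 3.13;

# Hypothetical sample markup, only for this sketch.
my $html = <<'END_HTML';
<h1>title</h1>
<p>One P tag</p>
<ul><li>item</li></ul>
<h2>Little title 1</h2>
<h3>Little title 2</h3>
<a href="foo.html">asdf</a>
END_HTML

my $parser = HTML::TokeParser::Simple->new( string => $html );

# Pass 1: count start tags and attributes by their own names.
my ( %tag_count, %attr_count );
while ( my $token = $parser->get_tag ) {
    next unless $token->is_start_tag;
    $tag_count{ $token->get_tag }++;
    $attr_count{$_}++ for keys %{ $token->get_attr };
}

# Pass 2: roll the raw counts up into the categories you care about.
my %category = (
    paragraph => qr/^p$/,
    list_o    => qr/^ol$/,
    list_no   => qr/^ul$/,
    title     => qr/^h1$/,
    ltl_title => qr/^h[2345]$/,
);

for my $label ( sort keys %category ) {
    my $total = 0;
    $total += $tag_count{$_}
        for grep { $_ =~ $category{$label} } keys %tag_count;
    printf "%10s  %3d\n", $label, $total;
}
print "\n";
printf "%10s  %3d\n", 'link', $attr_count{href} || 0;
```

The trade-off is that the parsing loop no longer needs to know about
your categories, at the cost of a second pass over the counts.
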
Cheers,
Ovid

-- 
If this message is a response to a question on a mailing list, please send
follow up questions to the list.

Web Programming with Perl -- http://users.easystreet.com/ovid/cgi_course/
