On Jun 15, Joshua Poling-Goldenne said:

>Hello.  I have just started learning perl and was wondering about some
>of the ways I could search/parse/change a string containing html
>content.  I would like to be able to search through the string and pick
>out certain tags to change.  For instance, seeking out all relative
>addresses and making them absolute, etc.

There are HTML parsers out there (such as HTML::Parser and my YAPE::HTML
module).  Using my module, you can do something like

  use YAPE::HTML;

  my ($parser, $HTML);

  # read in the HTML file named on the command line, or from STDIN
  { local $/; $parser = YAPE::HTML->new(<>); }

  # modify the HREF or SRC attributes
  # you can craft this however you'd like

  while (my $chunk = $parser->next) {
    if ($chunk->type eq 'tag') {
      if ($chunk->tag eq 'a') {
        my $URL = $chunk->get_attr('href');
        # skip anchors that have no HREF, such as <a name="top">
        if (defined $URL) {
          $URL = "http://www.specific.com/$URL" if $URL !~ m!^http://!;
          $chunk->set_attr(href => $URL);
        }
      }
      elsif ($chunk->tag eq 'img') {
        my $URL = $chunk->get_attr('src');
        # skip the tag if it has no SRC attribute
        if (defined $URL) {
          $URL = "http://www.specific.com/$URL" if $URL !~ m!^http://!;
          $chunk->set_attr(src => $URL);
        }
      }
    }

    $HTML .= $chunk->fullstring;
  }

  print $HTML;

And that's that.  The API seems pretty easy to use (if you ask me).  But
you might prefer HTML::Parser.  It's up to you.
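
One caveat about the m!^http://! test in the loop above: it is only a
rough check.  It would also rewrite https: and mailto: links, and a
root-relative address such as /pics/logo.gif would come out with a doubled
slash.  Below is a minimal sketch of a more careful check, using core Perl
only.  The name absolutize() and the base URLs are hypothetical, made up
for illustration, and the helper is deliberately simplified: it does not
resolve "../" segments or query-only references.  In real code I would
probably reach for the URI module's new_abs() method rather than this.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of YAPE::HTML or HTML::Parser):
# make $url absolute relative to $base.  Simplified on purpose.
sub absolutize {
    my ($url, $base) = @_;

    # nothing to do for a missing or empty attribute value
    return $url unless defined $url && length $url;

    # already has a scheme (http:, https:, mailto:, ...): leave it alone
    return $url if $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*:};

    # fragment-only links point into the current page: leave them alone
    return $url if $url =~ m{^#};

    # split the base into scheme, host and (optional) path
    my ($scheme, $host, $path) =
        $base =~ m{^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]+)(/[^?#]*)?}
        or die "unsupported base URL: $base\n";
    $path = '/' unless defined $path;

    # protocol-relative: //host/path borrows the base's scheme
    return "${scheme}:$url" if $url =~ m{^//};

    # root-relative: /path replaces the base's whole path
    return "${scheme}://$host$url" if $url =~ m{^/};

    # plain relative: resolve against the base's directory
    (my $dir = $path) =~ s{[^/]*\z}{};
    return "${scheme}://$host$dir$url";
}

my $base = 'http://www.specific.com/docs/index.html';
print absolutize('next.html',      $base), "\n";
print absolutize('/pics/logo.gif', $base), "\n";
```

Inside the loop, each of the two "$URL = ... if ..." lines would then
become something like $URL = absolutize($URL, $base), with $base set to the
address the page was fetched from.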

I strongly suggest you NOT try to roll your own HTML parser.  It's not a
simple feat.

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk?  http://www.perlmonks.com/     http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc.     http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter.         Brother #734
**      Manning Publications, Co, is publishing my Perl Regex book      **
