On Jun 15, Joshua Poling-Goldenne said:
>Hello. I have just started learning perl and was wondering about some
>of the ways I could search/parse/change a string containing html
>content. I would like to be able to search through the string and pick
>out certain tags to change. For instance, seeking out all relative
>addresses and making them absolute, etc.
There are HTML parsers out there (such as HTML::Parser and my YAPE::HTML
module). Using my module, you can do something like this:

  use strict;
  use YAPE::HTML;

  my ($parser, $HTML);
  $HTML = '';

  # read in the HTML file from the commandline or STDIN
  { local $/; $parser = YAPE::HTML->new(<>); }

  # modify the HREF or SRC attributes
  # you can craft this however you'd like
  while (my $chunk = $parser->next) {
    if ($chunk->type eq 'tag') {
      if ($chunk->tag eq 'a') {
        my $URL = $chunk->get_attr('href');
        # leave tags without an HREF (e.g. <a name="...">) alone
        if (defined $URL and $URL !~ m!^http://!) {
          $chunk->set_attr(href => "http://www.specific.com/$URL");
        }
      }
      elsif ($chunk->tag eq 'img') {
        my $URL = $chunk->get_attr('src');
        # same check for images: only rewrite a SRC that is present
        if (defined $URL and $URL !~ m!^http://!) {
          $chunk->set_attr(src => "http://www.specific.com/$URL");
        }
      }
    }
    # collect every chunk, changed or not, to rebuild the document
    $HTML .= $chunk->fullstring;
  }

  print $HTML;

And that's that. The API seems pretty easy to use (if you ask me). But
you might prefer HTML::Parser. It's up to you.
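
The prefix test in the loop above treats anything that does not start with "http://" as relative, so https:, mailto: and "#fragment" links would be rewritten too. Here is a minimal sketch of a stricter check in plain Perl. The absolutize() helper is hypothetical (it is not part of YAPE::HTML or HTML::Parser), and it only prepends a base; it does not resolve "../" or paths relative to the current document.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: return $url unchanged if it
# already has a scheme (http:, https:, mailto:, ...) or is a same-page
# fragment; otherwise prepend $base.
sub absolutize {
    my ($base, $url) = @_;
    return $url unless defined $url and length $url;       # nothing to fix
    return $url if $url =~ m!^[A-Za-z][A-Za-z0-9+.\-]*:!;  # has a scheme
    return $url if $url =~ m!^#!;                          # fragment only
    (my $root = $base) =~ s!/+$!!;   # drop trailing slashes from the base
    $url =~ s!^/+!!;                 # and leading slashes from the path
    return "$root/$url";
}

print absolutize('http://www.specific.com/', 'images/logo.gif'), "\n";
print absolutize('http://www.specific.com/', 'https://example.org/'), "\n";
```

Inside the loop you would call it as $chunk->set_attr(href => absolutize($base, $URL)). If you need real relative-URL resolution, the URI module's new_abs() method does that properly.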
I strongly suggest you NOT try to roll your own HTML parser. It's not a
simple feat.
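
To give one small illustration of why (the input string here is made up): a '>' is legal inside a quoted attribute value, and the obvious "match up to the next '>'" regex cuts the tag short when it meets one.

```perl
use strict;
use warnings;

# Made-up input: the title attribute legally contains a '>' character.
my $html = '<a href="page.html" title="a > b">link</a>';

# Naive pattern: '<a', whitespace, then anything up to the first '>'.
my ($tag) = $html =~ m!(<a\s[^>]*>)!;

# The match stops inside the title attribute instead of at the tag's end.
print "naive match: $tag\n";   # naive match: <a href="page.html" title="a >
```

Comments, unquoted attribute values and script blocks each add cases of their own, and handling them all is what the existing parsers are for.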
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk? http://www.perlmonks.com/ http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
** Manning Publications, Co, is publishing my Perl Regex book **