At Wednesday, 13 February 2002, "Brett W. McCoy" <bmccoy@chapelperilous.net> wrote:
>
>Don't use regex to pull apart HTML, it'll be more trouble than it's worth.
Are you sure about this, or am I still going about it wrong? I
haven't tried this yet and haven't even gotten to the articles. What
used to be a really simple regex to extract the date has turned into this:
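use HTML::TokeParser;
use Data::Dumper;

# new() takes a file name, a file handle, or a reference to the HTML text.
# $p is declared outside the if so it is still in scope for the loop below.
my $p = HTML::TokeParser->new( $html );
if ( ! defined $p )
{
    localError( "Unable to parse $html : $!" );
}
# get_token() returns an array reference, so elements are read as $token->[N].
while ( my $token = $p->get_token())
{
    if ( $token->[0] eq 'C' && $token->[1] =~ m#<!-- begin header date --># )
    {
        while ( my $token = $p->get_token())
        {
            if ( $token->[0] eq "T" )
            {
                # Text token: ["T", $text, $is_data]
                $date .= $token->[1];
            }
            elsif ( $token->[0] eq "S" )
            {
                # Start tag: ["S", $tag, $attr, $attrseq, $text]
                $date .= $token->[4];
            }
            elsif ( $token->[0] eq "E" )
            {
                # End tag: ["E", $tag, $text]
                $date .= $token->[2];
            }
            elsif ( $token->[0] eq "C" && $token->[1] =~ m#<!-- end header date --># )
            {
                last;
            }
            else
            {
                localError( "$token->[0] : unrecognized HTML Token Type : <PRE>" . Dumper( $token ) . "</PRE>" );
            }
        }
    }
}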
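
For comparison, the regex approach mentioned above might look roughly like the following. This is only a sketch under assumptions: $content and the sample markup are made up for illustration, and it relies on the two marker comments appearing exactly once and exactly as written.

```perl
use strict;
use warnings;

# Hypothetical sample page; the real markup may differ.
my $content = <<'HTML';
<html><body>
<!-- begin header date -->
<b>Wednesday</b>, 13 February 2002
<!-- end header date -->
</body></html>
HTML

# Capture everything between the two marker comments.
# The /s modifier lets . match newlines; .*? keeps the match non-greedy.
my ($date) = $content =~ m{<!-- begin header date -->(.*?)<!-- end header date -->}s;
die "date markers not found\n" unless defined $date;

# Trim surrounding whitespace; any tags inside the span are kept as-is.
$date =~ s/^\s+|\s+$//g;

print "$date\n";
```

Like the token loop, this keeps whatever markup sits between the markers. It breaks as soon as the comment spacing changes or the markers are nested or repeated.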