Re: Scan data for XML invalid characters and parse articles

John Wed, 13 Feb 2002 09:29:08 -0800

On Wednesday, 13 February 2002, "Brett W. McCoy" <bmccoy@chapelperilous.net> wrote:
>
>Don't use a regex to pull apart HTML; it'll be more trouble than it's worth.


Is this what you mean, or am I still going about this the wrong way?  I
haven't tried it yet; I haven't even gotten to the articles.  What had
been a really simple regex to extract the date now looks like this:

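use Data::Dumper;
use HTML::TokeParser;

# $html is the name of the file to parse (pass \$string instead to
# parse HTML held in a scalar).  localError() is assumed to die or exit.
# Declaring $p outside the if() keeps it in scope for the loop below.
my $p = HTML::TokeParser->new( $html )
    or localError( "Unable to parse $html : $!" );

my $date = '';

# Each token is an array reference, so its fields are $token->[0],
# $token->[1], ... and not $token[0], which is an element of @token.
TOKEN:
while ( my $token = $p->get_token() )
{
    # Skip everything until the opening marker comment ( 'C', $text ).
    if ( $token->[0] eq 'C' && $token->[1] =~ m#<!-- begin header date --># )
    {
        while ( my $inner = $p->get_token() )
        {
            if ( $inner->[0] eq "T" )       # text: [ 'T', $text, $is_data ]
            {
                $date .= $inner->[1];
            }
            elsif ( $inner->[0] eq "S" )    # start tag: [ 'S', $tag, $attr, $attrseq, $text ]
            {
                $date .= $inner->[4];
            }
            elsif ( $inner->[0] eq "E" )    # end tag: [ 'E', $tag, $text ]
            {
                $date .= $inner->[2];
            }
            elsif ( $inner->[0] eq "C" && $inner->[1] =~ m#<!-- end header date --># )
            {
                last TOKEN;    # closing marker found; stop scanning the document
            }
            else
            {
                localError( "$inner->[0] : unrecognized HTML Token Type : <PRE>"
                    . Dumper( $inner ) . "</PRE>" );
            }
        }
    }
}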
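If only the visible date text is wanted, and not the tags around it, the same parser can be wrapped in a small sub. This is a sketch, not tested against the real pages: `extract_header_date` is a hypothetical name, it assumes the HTML is already in a string (hence `\$html`, which HTML::TokeParser accepts as literal document text), and it matches the marker comments loosely so extra whitespace inside them does not matter.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the text between the
# <!-- begin header date --> and <!-- end header date --> comments,
# with any tags inside that region dropped.  Returns undef if the
# opening marker is never found.
sub extract_header_date {
    my ($html) = @_;    # HTML source as a string, not a filename
    my $p = HTML::TokeParser->new(\$html)
        or die "Unable to create parser: $!";

    # Skip ahead to the opening marker comment.
    while (my $token = $p->get_token) {
        next unless $token->[0] eq 'C'
            && $token->[1] =~ /begin header date/;

        my $date = '';
        # Collect only text tokens until the closing marker comment.
        while (my $inner = $p->get_token) {
            last if $inner->[0] eq 'C' && $inner->[1] =~ /end header date/;
            $date .= $inner->[1] if $inner->[0] eq 'T';
        }
        return $date;
    }
    return;    # no opening marker in the document
}
```

Dropping the tags is a design choice: it gives a plain string ready for a date parser, whereas the loop above keeps the original markup.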

