On Wed, 13 Feb 2002, John wrote:
> I have a scalar variable containing HTML that needs to be converted
> to XML. It's not the best HTML so it has invalid characters (like
> smart quotes, 1/2 character, etc.). I need to determine if these
> characters exist in the data and throw an error if they do. What
> is the best way to do this? I can't use an XML parser because it's
> not really XML.
But you can use an HTML Parser, such as HTML::Parser. There are some
useful subclasses of this like HTML::LinkExtor and HTML::TokeParser.
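
As a rough sketch of the first part (not a finished solution), something
like the following could walk the text segments and die on anything
outside 7-bit ASCII. The helper name check_html_text and the "ASCII only"
rule are my assumptions; adjust the character class to whatever your XML
target accepts. Note that it only looks at text, not at attribute values.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: dies if any text segment of $html contains a
# character outside 7-bit ASCII (smart quotes, the 1/2 sign, and so on).
sub check_html_text {
    my ($html) = @_;
    my @bad;
    my $p = HTML::Parser->new(
        api_version => 3,
        # The "text" argspec passes each text segment as it appears in
        # the source, without decoding entities.
        text_h => [ sub { push @bad, $_[0] =~ /([^\x00-\x7F])/g }, 'text' ],
    );
    $p->parse($html);
    $p->eof;
    if (@bad) {
        die 'invalid characters: '
          . join(' ', map { sprintf '0x%02X', ord($_) } @bad) . "\n";
    }
    return 1;
}

# \x93 and \x94 are the Windows-1252 smart quotes.
my $html = "<p>He said \x93hello\x94</p>";
eval { check_html_text($html); 1 } or print "Error: $@";
```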
> Also, if I have a block of text like this:
>
> <!-- begin article1 title -->title1<!-- end article1 -->
> <!-- begin article1 body -->body1<!-- end article1 body -->
> ...
> <!-- begin articleN title -->titleN<!-- end articleN title>
> <!-- begin articleN body -->bodyN<!-- end articleN body -->
>
> Where the ... means there could be some number of articles (less
> than 5), can anyone think of a relatively simple regex (I mean I
> don't want to have article1, article2, etc. hard-coded in the regex)
Don't use a regex to pull apart HTML; it'll be more trouble than it's worth.
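
For the comment markers in your sample, a token-based approach might look
roughly like this. It is only a sketch: extract_articles is a made-up name,
and I am assuming the markers always read "begin NAME FIELD" and that
articles do not nest. The small regex here only reads the label inside a
comment the parser has already found; it does not try to parse the HTML.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns a hash ref such as
#   { article1 => { title => '...', body => '...' }, ... }
sub extract_articles {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse input\n";
    my (%articles, $name, $field);
    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'C') {
            # Comment token: ["C", $text]
            if ($t->[1] =~ /\bbegin\s+(\w+)\s+(\w+)/) {
                ($name, $field) = ($1, $2);
                $articles{$name}{$field} = '';
            }
            elsif ($t->[1] =~ /\bend\b/) {
                ($name, $field) = ();    # outside any article again
            }
        }
        elsif (defined $name) {
            # Raw source text is element 1 for text tokens and the last
            # element for start tags, end tags and the other token types.
            my $raw = $type eq 'T' ? $t->[1] : $t->[-1];
            $articles{$name}{$field} .= $raw;
        }
    }
    return \%articles;
}

my $sample = <<'END';
<!-- begin article1 title -->title1<!-- end article1 title -->
<!-- begin article1 body -->body1<!-- end article1 body -->
END

my $articles = extract_articles($sample);
for my $name (sort keys %$articles) {
    print "$name: $articles->{$name}{title}\n";
}
```

Because the article name comes out of the marker itself, nothing like
article1 or article2 has to be hard-coded.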
-- Brett
http://www.chapelperilous.net/
------------------------------------------------------------------------
/* And you'll never guess what the dog had */
/* in its mouth... */
-- Larry Wall in stab.c from the perl source code