Scan data for XML invalid characters and parse articles

John Wed, 13 Feb 2002 08:29:58 -0800

I have a scalar variable containing HTML that needs to be converted 
to XML.  It's not the best HTML, so it contains invalid characters 
(smart quotes, the 1/2 character, etc.).  I need to determine whether 
any of these characters exist in the data and throw an error if they 
do.  What is the best way to do this?  I can't use an XML parser 
because the data is not really XML.
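
One way to approach the check, sketched in Python purely for 
illustration (the post does not name a language).  It assumes 
"invalid" means either anything outside plain ASCII (the default 
here, which catches smart quotes and the 1/2 character) or, more 
loosely, anything outside the XML 1.0 "Char" range.  The names 
find_bad_chars and check_or_raise are hypothetical.

```python
import re

# Characters NOT permitted by the XML 1.0 "Char" production.
_XML_ILLEGAL = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# Stricter policy (an assumption): only tab, LF, CR and printable ASCII pass.
_NON_ASCII = re.compile(r"[^\x09\x0A\x0D\x20-\x7E]")


def find_bad_chars(text, ascii_only=True):
    """Return a list of (offset, character) for every offending character."""
    pattern = _NON_ASCII if ascii_only else _XML_ILLEGAL
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


def check_or_raise(text, ascii_only=True):
    """Raise ValueError describing the first offending character, if any."""
    bad = find_bad_chars(text, ascii_only)
    if bad:
        offset, ch = bad[0]
        raise ValueError(
            f"invalid character U+{ord(ch):04X} at offset {offset} "
            f"({len(bad)} total)"
        )
```

Calling check_or_raise(html) before the conversion would stop on the 
first bad character; find_bad_chars(html) returns all of them if a 
full report is more useful.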


Also, if I have a block of text like this:

<!-- begin article1 title -->title1<!-- end article1 title -->
<!-- begin article1 body -->body1<!-- end article1 body -->
...
<!-- begin articleN title -->titleN<!-- end articleN title -->
<!-- begin articleN body -->bodyN<!-- end articleN body -->

Where the ... means there could be some number of articles (fewer 
than 5), can anyone think of a relatively simple regex (one that 
doesn't have article1, article2, etc. hard-coded in it) that will 
extract the titles and bodies?
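
A minimal sketch of one such regex, again in Python as an assumed 
language.  It captures the article name with \d+ instead of 
hard-coding it, and uses a backreference (\1) so each "begin" comment 
is paired with an "end" comment for the same article.  The end marker 
is matched loosely (only the article name is checked), so small 
variations in how the end comment is written are tolerated.  
ARTICLE_PART and extract_articles are hypothetical names.

```python
import re

ARTICLE_PART = re.compile(
    r"<!--\s*begin\s+(article\d+)\s+(title|body)\s*-->"  # begin marker
    r"(.*?)"                                             # content, non-greedy
    r"<!--\s*end\s+\1\b[^>]*>",                          # end marker, same article
    re.DOTALL,                                           # content may span lines
)


def extract_articles(text):
    """Return {article name: {"title": ..., "body": ...}} in document order."""
    articles = {}
    for name, part, content in ARTICLE_PART.findall(text):
        articles.setdefault(name, {})[part] = content.strip()
    return articles
```

Because the pattern is applied repeatedly with findall, the number of 
articles does not need to be known in advance.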

TIA,

   -John








