I have a scalar variable containing HTML that needs to be converted to XML. It's not the best HTML, so it contains invalid characters (smart quotes, the 1/2 fraction character, etc.). I need to determine whether any of these characters exist in the data and throw an error if they do. What is the best way to do this? I can't use an XML parser because the input isn't really XML.
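To make the check concrete, here is a minimal sketch in Python; the same character class should carry over to any Perl-style regex engine. It assumes "invalid" means anything other than tab, newline, carriage return, and printable ASCII, which may be stricter than the real requirement. The names `INVALID_CHARS` and `check_html` are made up for illustration.

```python
import re

# Assumption: every character outside tab, LF, CR and printable ASCII
# (0x20-0x7E) counts as invalid. Widen the class if some non-ASCII
# characters are actually acceptable in the target XML.
INVALID_CHARS = re.compile(r"[^\t\n\r\x20-\x7e]")


def check_html(html):
    """Return html unchanged, or raise ValueError naming each bad character."""
    bad = [(m.start(), m.group()) for m in INVALID_CHARS.finditer(html)]
    if bad:
        details = ", ".join(
            f"U+{ord(ch):04X} at offset {pos}" for pos, ch in bad
        )
        raise ValueError(f"invalid characters found: {details}")
    return html
```

Reporting the code point and offset of each match, rather than only failing, should make it easier to find and fix the offending spots in the source HTML.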
Also, if I have a block of text like this:

    <!-- begin article1 title -->title1<!-- end article1 title -->
    <!-- begin article1 body -->body1<!-- end article1 body -->
    ...
    <!-- begin articleN title -->titleN<!-- end articleN title -->
    <!-- begin articleN body -->bodyN<!-- end articleN body -->

where the ... means there could be some number of articles (fewer than 5), can anyone think of a relatively simple regex (meaning I don't want article1, article2, etc. hard-coded in the regex) that will extract the titles and bodies?

TIA,
-John
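One possible shape for such a pattern, sketched in Python; the backreference and non-greedy constructs it relies on are also available in Perl regexes. It assumes every begin marker has a matching, well-formed end marker that repeats both the article number and the part name. `ARTICLE_PART` and `extract_articles` are hypothetical names.

```python
import re

# The article number and the part name are captured in the begin marker and
# required again in the end marker via the backreferences \1 and \2, so no
# article number is hard-coded. Assumes well-formed, matching markers.
ARTICLE_PART = re.compile(
    r"<!--\s*begin\s+article(\d+)\s+(title|body)\s*-->"
    r"(.*?)"  # non-greedy: stop at the first matching end marker
    r"<!--\s*end\s+article\1\s+\2\s*-->",
    re.DOTALL,  # let the content span multiple lines
)


def extract_articles(text):
    """Return {article_number: {"title": ..., "body": ...}}."""
    articles = {}
    for number, part, content in ARTICLE_PART.findall(text):
        articles.setdefault(int(number), {})[part] = content.strip()
    return articles
```

Because the end marker must repeat whatever the begin marker captured, a marker that is malformed or omits the part name is simply not matched, so it may be worth checking that the number of extracted parts is what was expected.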