> I have a scalar variable containing HTML that needs to be converted
> to XML. It's not the best HTML so it has invalid characters (like
> smart quotes, 1/2 character, etc.). I need to determine if these
> characters exist in the data and throw an error if they do. What
> is the best way to do this? I can't use an XML parser because it's
> not really XML.

Welp, ultimately, if you were using an XML parser, it would choke on
the bad data. For instance, this code:

    use XML::Simple;
    my $data = eval { XMLin( $xml_data ); };
    if ($@) { print $@; }

would produce error messages like:

    There was an error loading morelikethisweblog.xml:
    not well-formed at line 72, column 28, byte 4001 at
    C:/Perl/site/lib/XML/Parser.pm line 168.

    There was an error loading hackintheboxorg.xml:
    mismatched tag at line 168, column 2, byte 4751 at
    C:/Perl/site/lib/XML/Parser.pm line 168.

A cheat would be to wrap your data in a dummy root element:

    my $invalid_data_check = "<data>$real_data</data>";

and then run XMLin on $invalid_data_check, as above. If the eval
leaves something in $@, the data is bad.

Another option is to HTML encode all the data before passing it off
to the XML creator/parsing code:

    use HTML::Entities qw( %char2entity );
    $real_data =~ s/([^\s!\#\$&%\'-;=?-~<>"])/$char2entity{$1}/g;

(Note: in this example, I'm importing the %char2entity hash myself,
which allows me to define exactly which characters I DO NOT want
turned into entities; that's the negated character class at the
start of the regexp. Check the man page for the defaults.)

With the above in hand, my XML parsing usually runs like this:

    use XML::Simple;
    use HTML::Entities qw( %char2entity );

    my $data = eval { XMLin( $xml_data ); };
    if ($@) {
        print "$@, attempting to repair.";
        # entity-encode everything outside the allowed set, then retry
        $xml_data =~ s/([^\s!\#\$&%\'-;=?-~<>"])/$char2entity{$1}/g;
        $data = eval { XMLin( $xml_data ); };
        if ($@) { print "Nope. Still an error."; }
    }

You can probably modify that to your use.

-- 
Morbus Iff ( softcore vulcan pr0n rulezzzzz )
http://www.disobey.com/ && http://www.gamegrene.com/
please me: http://www.amazon.com/exec/obidos/wishlist/25USVJDH68554
icq: 2927491 / aim: akaMorbus / yahoo: morbus_iff / jabber.org: morbus
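
For the plain "detect and throw an error" case from the question, a regex scan is enough and no parser is involved. The following is only a minimal sketch under some assumptions: the data is an undecoded byte string, anything outside tab, LF, CR, and printable ASCII counts as invalid (that would also reject legitimate UTF-8, so adjust the class if your data is meant to contain it), and the names find_suspect_chars and assert_clean are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [offset, ordinal] pairs for every character
# outside tab, LF, CR, and printable ASCII (0x20-0x7E).
sub find_suspect_chars {
    my ($text) = @_;
    my @hits;
    while ( $text =~ /([^\x09\x0A\x0D\x20-\x7E])/g ) {
        # pos() is the offset just past the one-character match
        push @hits, [ pos($text) - 1, ord($1) ];
    }
    return @hits;
}

# Hypothetical helper: dies with a report if anything suspect is found,
# returns 1 otherwise.
sub assert_clean {
    my ($text) = @_;
    my @hits = find_suspect_chars($text);
    return 1 unless @hits;
    my $report = join ', ',
        map { sprintf 'offset %d (0x%02X)', @$_ } @hits;
    die "invalid characters at: $report\n";
}

# Sample data: a Windows-1252 "1/2" character (0xBD) and smart quotes
# (0x93, 0x94), written as raw bytes.
my $real_data = "<p>1\xBD cups, \x93quoted\x94</p>";
eval { assert_clean($real_data); 1 } or print "Rejected: $@";
```

Wrapping the call in eval, as with the XMLin examples, lets the caller decide whether a failure is fatal or just gets logged.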