What about this patch? It's really hacky, since the charset is checked
before the initial parse and it basically duplicates the libxml code
that contains the correct fix, but it seems to work OK. I haven't tried
it yet with any large datasets that require multiple parse calls, but it
should work.
Rob
Joe Orton wrote:
That's not quite right: detection based on an ASCII <?xml prolog with an
explicit encoding= still works fine with the patch applied (e.g. for
encoding=ISO-8859-1 documents). It's *only* documents which have a BOM
which will then fail to parse.
So it is a bit of a tricky trade-off...
Index: compat.c
===================================================================
RCS file: /repository/php-src/ext/xml/compat.c,v
retrieving revision 1.40
diff -r1.40 compat.c
480a481,497
>
> /* The following function is a hack to keep BC while avoiding
> the infinite loop in libxml < 2.6.18 which occurs when no encoding
> has been defined and none can be detected */
> #if LIBXML_VERSION < 20618
> if (parser->parser->instate == XML_PARSER_START &&
> parser->parser->charset == XML_CHAR_ENCODING_NONE && data_len >= 4) {
> xmlChar start[4];
>
> start[0] = *data;
> start[1] = data[1];
> start[2] = data[2];
> start[3] = data[3];
> xmlSwitchEncoding(parser->parser, xmlDetectCharEncoding(&start[0], 4));
> }
> #endif
>
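
For anyone not familiar with what "detecting" from the first four bytes
means here, below is a rough standalone sketch of that kind of sniffing.
It is not the libxml code and not part of the patch: the names
sketch_encoding and sketch_detect_encoding are made up for illustration,
and it only covers the UTF-8 and UTF-16 cases (UCS-4/UTF-32 and EBCDIC
are left out, so a UTF-32LE BOM would be misreported as UTF-16LE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration only; not libxml2 identifiers. */
typedef enum {
    SKETCH_ENC_NONE,    /* nothing recognisable in the first four bytes */
    SKETCH_ENC_UTF8,    /* UTF-8 BOM, or an ASCII-compatible "<?xm" start */
    SKETCH_ENC_UTF16LE, /* UTF-16 little endian, with or without BOM */
    SKETCH_ENC_UTF16BE  /* UTF-16 big endian, with or without BOM */
} sketch_encoding;

/* Guess the encoding from the first four bytes of a document.
 * UCS-4/UTF-32 and EBCDIC are deliberately omitted from this sketch. */
static sketch_encoding sketch_detect_encoding(const unsigned char *buf,
                                              size_t len)
{
    assert(buf != NULL);

    if (len < 4)
        return SKETCH_ENC_NONE;

    /* Byte order marks. */
    if (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return SKETCH_ENC_UTF8;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return SKETCH_ENC_UTF16BE;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return SKETCH_ENC_UTF16LE;

    /* No BOM: "<?" encoded as UTF-16. */
    if (buf[0] == 0x00 && buf[1] == 0x3C && buf[2] == 0x00 && buf[3] == 0x3F)
        return SKETCH_ENC_UTF16BE;
    if (buf[0] == 0x3C && buf[1] == 0x00 && buf[2] == 0x3F && buf[3] == 0x00)
        return SKETCH_ENC_UTF16LE;

    /* No BOM: ASCII "<?xm". The real encoding then comes from the
     * encoding= pseudo-attribute; UTF-8 is only the starting guess. */
    if (buf[0] == 0x3C && buf[1] == 0x3F && buf[2] == 0x78 && buf[3] == 0x6D)
        return SKETCH_ENC_UTF8;

    return SKETCH_ENC_NONE;
}
```

The last branch is the ASCII <?xml prolog case discussed above; the BOM
branches are the ones that need the first four bytes to be looked at
before anything else is parsed.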
--
PHP Internals - PHP Runtime Development Mailing List