What about this patch? It's really hacky, since the charset is checked before the initial parse and it basically duplicates the libxml code that has the correct fix, but it seems to work OK. I haven't tried it yet with any large datasets that require multiple parse calls, but it should work.

Rob

Joe Orton wrote:


That's not quite right: detection based on an ASCII <?xml prolog with an explicit encoding= still works fine with the patch applied (e.g. for encoding=ISO-8859-1 documents). It's *only* documents which have a BOM which will then fail to parse.

So it is a bit of a tricky trade-off...



Index: compat.c
===================================================================
RCS file: /repository/php-src/ext/xml/compat.c,v
retrieving revision 1.40
diff -r1.40 compat.c
480a481,497
> 
> /* The following block is a hack to keep BC while avoiding 
> the infinite loop in libxml < 2.6.18 which occurs when no encoding 
> has been defined and none can be detected */
> #if LIBXML_VERSION < 20618
>       if (parser->parser->instate == XML_PARSER_START && 
>               parser->parser->charset == XML_CHAR_ENCODING_NONE && data_len >= 4) {
>               xmlChar start[4];
> 
>               start[0] = *data;
>               start[1] = data[1];
>               start[2] = data[2];
>               start[3] = data[3];
>               xmlSwitchEncoding(parser->parser, xmlDetectCharEncoding(&start[0], 4));
>       }
> #endif
> #endif
> 
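
To make the BOM vs. ASCII-prolog distinction above concrete, here is a rough standalone sketch of the kind of check that can be done on the first four bytes. It is only an illustration: sniff_start() and the SNIFF_* names are made up for this example, and it is not libxml's xmlDetectCharEncoding(), which recognises more cases (UCS-4, EBCDIC, UTF-16 without a BOM, and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a first-four-bytes check.
 * This is NOT libxml's xmlDetectCharEncoding(). */
enum sniff_result {
    SNIFF_NONE,         /* nothing recognised */
    SNIFF_UTF8_BOM,     /* EF BB BF */
    SNIFF_UTF16LE_BOM,  /* FF FE (UTF-32LE is not distinguished here) */
    SNIFF_UTF16BE_BOM,  /* FE FF */
    SNIFF_ASCII_PROLOG  /* "<?xm": start of an ASCII-compatible XML declaration */
};

static enum sniff_result sniff_start(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* Same guard as the patch: do nothing with fewer than 4 bytes. */
    if (len < 4)
        return SNIFF_NONE;

    /* Byte order marks come first; these are the documents in question. */
    if (buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return SNIFF_UTF8_BOM;
    if (buf[0] == 0xFF && buf[1] == 0xFE)
        return SNIFF_UTF16LE_BOM;
    if (buf[0] == 0xFE && buf[1] == 0xFF)
        return SNIFF_UTF16BE_BOM;

    /* No BOM: an ASCII "<?xm" means the encoding= attribute of the
     * declaration can be read by the parser later on. */
    if (memcmp(buf, "<?xm", 4) == 0)
        return SNIFF_ASCII_PROLOG;

    return SNIFF_NONE;
}
```

A document starting with "<?xml" comes back as SNIFF_ASCII_PROLOG, while one starting with EF BB BF comes back as SNIFF_UTF8_BOM; a chunk shorter than four bytes is left alone, as in the patch.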
-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
