compat.c fix for #32001

Rob Richards Fri, 25 Feb 2005 10:21:28 -0800

What about this patch? It's really hacky, as the charset is checked before the initial parse and it basically duplicates the libxml code with the correct fix, but it seems to work OK. I haven't tried it yet with any large datasets that require multiple parse calls, but it should work.

Rob

Joe Orton wrote:


That's not quite right: detection based on an ASCII <?xml prolog with an
explicit encoding= still works fine with the patch applied (e.g. for
encoding=ISO-8859-1 documents).  It's *only* documents which have a BOM
which will then fail to parse.

So it is a bit of a tricky trade-off...

Index: compat.c
===================================================================
RCS file: /repository/php-src/ext/xml/compat.c,v
retrieving revision 1.40
diff -r1.40 compat.c
480a481,497
> 
> /* The following block is a hack to keep BC while avoiding 
> the infinite loop in libxml < 2.6.18 which occurs when no encoding 
> has been defined and none can be detected */
> #if LIBXML_VERSION < 20618
>       if (parser->parser->instate == XML_PARSER_START && 
>               parser->parser->charset == XML_CHAR_ENCODING_NONE && data_len >= 4) {
>               xmlChar start[4];
> 
>               start[0] = *data;
>               start[1] = data[1];
>               start[2] = data[2];
>               start[3] = data[3];
>               xmlSwitchEncoding(parser->parser, xmlDetectCharEncoding(&start[0], 4));
>       }
> #endif
>
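
For anyone following along without the libxml source at hand, here is a rough standalone sketch of the kind of four-byte check the patch hands off to xmlDetectCharEncoding(). It is not libxml's implementation: sniff_encoding() and the SNIFF_* codes are made-up names for illustration, and it only covers the UTF-8 and UTF-16 BOMs plus an ASCII "<?xm" prolog (the real function knows more signatures, e.g. UCS-4 and EBCDIC).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; these are NOT libxml's
 * xmlCharEncoding values. */
typedef enum {
    SNIFF_NONE = 0,       /* nothing recognizable in the first 4 bytes */
    SNIFF_UTF8_BOM,       /* EF BB BF */
    SNIFF_UTF16LE_BOM,    /* FF FE */
    SNIFF_UTF16BE_BOM,    /* FE FF */
    SNIFF_ASCII_PROLOG    /* "<?xm": ASCII-compatible, encoding= decides */
} sniff_result;

/* Simplified stand-in for a four-byte encoding sniff. Like the patch,
 * it refuses to guess when fewer than 4 bytes are available. */
sniff_result sniff_encoding(const unsigned char *data, size_t len)
{
    assert(data != NULL);

    if (len < 4)
        return SNIFF_NONE;

    if (data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return SNIFF_UTF8_BOM;
    if (data[0] == 0xFF && data[1] == 0xFE)
        return SNIFF_UTF16LE_BOM;
    if (data[0] == 0xFE && data[1] == 0xFF)
        return SNIFF_UTF16BE_BOM;
    if (data[0] == '<' && data[1] == '?' && data[2] == 'x' && data[3] == 'm')
        return SNIFF_ASCII_PROLOG;

    return SNIFF_NONE;
}
```

The two cases Joe distinguishes show up here as separate branches: a document starting with a BOM is identified by the BOM bytes alone, while one starting with an ASCII <?xml prolog still relies on its explicit encoding= declaration.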

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
