Hi all,

The HTML push parser in recovery mode misses closing </script> tags when they happen to straddle a chunk boundary. That is, only part of the closing tag is in the pushed chunk, e.g. the last characters of the chunk are "</scri", and the rest of the tag arrives in the next chunk. When this unfortunate case occurs, the push parser loses track: it emits no events for any tags from then on, and it (incorrectly) appends extra closing tags </body> and </html> once the end of the document is reached.
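To make the boundary condition concrete, here is a minimal sketch of a possible caller-side workaround: detect when a chunk ends in a partial "</script>" and hold those bytes back so that the tag is pushed whole with the next chunk. The helper name held_back_len is hypothetical and not part of libxml2, and whether this actually sidesteps the parser problem has not been verified here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of libxml2.
 * Returns how many trailing bytes of `chunk` form a proper prefix of
 * "</script>" (compared case-insensitively), or 0 if the chunk does not
 * end in the middle of such a tag. */
static size_t held_back_len(const char *chunk, size_t len)
{
    static const char tag[] = "</script>";
    const size_t tag_len = sizeof(tag) - 1;
    size_t n;

    assert(chunk != NULL || len == 0);

    /* A proper prefix has at most tag_len - 1 bytes; try the longest first. */
    n = (len < tag_len - 1) ? len : tag_len - 1;
    for (; n > 0; n--) {
        const char *tail = chunk + (len - n);
        size_t i;

        for (i = 0; i < n; i++) {
            if (tolower((unsigned char)tail[i]) != tag[i])
                break;
        }
        if (i == n)
            return n;   /* the last n bytes look like a split closing tag */
    }
    return 0;
}
```

A caller would pass only the first len - n bytes to htmlParseChunk() and prepend the remaining n bytes to the next chunk before pushing it. The cost is that a chunk ending in a lone "<" is also delayed by one byte, which should be harmless.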
I've filed a bug report on this, but it has received no comments so far and may have been forgotten. The bug report includes C source code for reproducing the bug; see https://bugzilla.gnome.org/show_bug.cgi?id=706952

The bug affects mod_proxy_html for Apache 2, which relies entirely on recovery mode. That is where I ran into the problem. Occasionally, a document that went through the filter was only half filtered: most of it seemed to be untouched, and there were additional tags at the end.

Because the bug depends on chance (whether the </script> falls right on a chunk boundary), it is rarely encountered in practice, and it is even harder to catch it and trace it back to libxml2.

Jani