On 11/16/2012 05:12 PM, Nick Kew wrote:
> On Fri, 16 Nov 2012 11:31:38 +0100
> Thomas Eckert<thomas.eck...@sophos.com> wrote:
>
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the
>> filtering chain does not help.
>
> Looks like you've got problems over and above anything to do with
> your configuration!
>
>> "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
>
> I thought you said it had charset issues?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
>> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
>> trying apr_xlate
>
> That seems implausible. How do you get a libxml2 install that
> doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
>> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
>> invalid byte(s) in input stream!
>> (and more conversion errors)
>
> It looks as if your backend incorrectly identifies the charset
> of the page in question. Either that or you found a bug.
>
> Do you have a URL where your unprocessed page could be viewed?

Sorry for the delay on this. The basic problem remains: if I enable HTML
rewriting and connect with a client that requests content compression, the
reverse proxy fails with a message pointing at libxml2/encoding. I
also see different log entries depending on whether I set the
charset of the page.
So if I just send the page with "Content-Type: text/html", this is what I get:
mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is
text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset
ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc:
converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc:
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc:
consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771]
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc:
reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output
buffer ((null))
But if "Content-Type: text/html; charset=ISO-8859-1" is sent, this is
what I get:
mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is
text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc:
consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc:
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc:
consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040]
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input
stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc:
reinserting 334 unconsumed bytes from bucket
From what I can tell, this still seems to be the "wrong" processing, as
the page cannot be inflated correctly at the user's end. Nevertheless,
the message
AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
no longer shows up. Looking at mod_xml2enc.c +185-194 and +251-268,
that makes sense, but it would imply the encoding detection in +198-206
failed. I suggest adding some sort of "failed" debug message in case
xmlDetectCharEncoding() didn't work as desired.
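
The kind of "failed" debug message suggested above could look roughly
like the following. This is only an illustrative Python stand-in, not the
mod_xml2enc or libxml2 code: sniff_charset and its BOM table are
hypothetical names, and the real xmlDetectCharEncoding() looks at more
than byte-order marks.

```python
import codecs
import logging
from typing import Optional

log = logging.getLogger("xml2enc-sketch")  # hypothetical logger name

# Longer BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_charset(head: bytes) -> Optional[str]:
    """Return a charset name if `head` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # The point of the suggestion: say so explicitly when detection fails.
    log.debug("charset detection failed on first %d bytes", len(head))
    return None
```

A plain ASCII page without a BOM falls through to the debug message,
which is the case that currently leaves no trace in the log.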
I've tried a couple more combinations, including using mod_charset_lite
and different non-latin1 encodings on the backend, but the only thing
that works is using the Header directive on the backend to set
"Content-Type: text/html; charset=UTF-8" while leaving the actual
contents unchanged. Here, "works" means the page is displayed correctly
at the client's end.
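
As a side note on why relabelling the content does no harm here: for a
page that is plain ASCII, decoding as ISO-8859-1 and as UTF-8 gives the
same text. A minimal Python check, where `page` is a made-up stand-in
for the real (non-public) backend page:

```python
# Hypothetical stand-in for the backend page: plain ASCII, no meta tags.
page = b"<html><head><title>test</title></head><body>hello</body></html>"

as_latin1 = page.decode("iso-8859-1")
as_utf8 = page.decode("utf-8")

# For pure ASCII input both decodings give the same text, so relabelling
# the response as UTF-8 does not change what the client renders.
same_text = page.isascii() and as_latin1 == as_utf8
print(same_text)
```

This says nothing about why the proxy behaves differently with the
UTF-8 header, only that the displayed result is expected to be identical.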
The goal is still to get mod_proxy_html to rewrite the HTML just like it
would do with "ProxyHTMLEnable On" while retaining
compression support. So setting
SetOutputFilter INFLATE;proxy-html
which "drops out" the "xml2enc" filter might be problematic.
Unfortunately, the page is not publicly accessible. It is rather simple,
though, and I made sure there is nothing 'special' on that page - e.g.
it's just plain ASCII, no meta tags, etc.
Note, I tried both "ProxyHTMLEnable On" and "SetOutputFilter
INFLATE;proxy-html" as filter directives for all of the setups mentioned
above. Neither worked except with the forced UTF-8 header.
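
A possible reading of the byte counts in the logs above, offered as an
assumption rather than a diagnosis: after "Compressed 668 to 344",
xml2enc consumes 10 bytes and then 344 bytes, which is the size of a
gzip header followed by the compressed body. That would mean xml2enc is
being handed output that has already been compressed. The Python sketch
below only illustrates the gzip framing (10-byte header, raw deflate
data, 8-byte trailer); `body` is a made-up stand-in for the page.

```python
import gzip
import zlib

# Hypothetical stand-in body; the real page is not public.
body = b"<html><body>" + b"plain ascii text " * 40 + b"</body></html>"

blob = gzip.compress(body)

# Split the gzip stream into its three parts.
header, deflated, trailer = blob[:10], blob[10:-8], blob[-8:]

# gzip magic bytes, then the "deflate" method id and no optional fields.
has_gzip_magic = header[:2] == b"\x1f\x8b" and header[2] == 8 and header[3] == 0

# The middle part is a raw deflate stream (wbits=-15 means "no header").
roundtrip = zlib.decompress(deflated, -15)

# The compressed stream contains non-ASCII bytes, so it is no longer the
# plain ASCII page a charset converter would expect to see.
is_ascii = blob.isascii()
```

If that reading is right, any charset conversion applied at that point
would be working on binary data, which would fit the "Skipping invalid
byte(s)" messages and the client being unable to inflate the response.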