On 17/10/2016 08:30, Mark Thomas wrote:
> On 16/10/2016 19:09, Mark Juszczec wrote:
>> Hello
>>
>> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
>>
>> I'm using AJP 1.3 for communication between Apache and Tomcat
>>
>> It's all powered by Java 1.8
>>
>> I'm having a problem with international characters when I send them as the
>> request *URI* (which is used by GET requests and this is a GET request).
>>
>> Let's say I get the string AOËL
>>
>> mod_jk log logs the bytes with the message
>>
>> "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
>> pos=4 len=1411 max=8192" (at
>> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
>>
>> 41 4f c3 8b 4c
>>
>> AFAIK this means the correct bytes are being sent to AJP. Is that correct?
>
> That is the correct UTF-8 byte encoding for the characters AOËL.
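
A quick way to check those byte values, as a minimal Java sketch. The class
and method names are made up for illustration and are not part of Tomcat or
mod_jk. The percent-encoding helper is a simplification: it escapes only the
non-ASCII bytes, whereas a real URI encoder would also escape reserved ASCII
characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Bytes {

    // Formats each byte as two lowercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encodes the string as UTF-8 and writes every byte >= 0x80 as %XX.
    // ASCII bytes are copied through unchanged (a simplification).
    static String percentEncodeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xff;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "AO\u00cbL"; // \u00cb is E with diaeresis
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // 41 4f c3 8b 4c
        System.out.println(percentEncodeNonAscii(s));                // AO%C3%8BL
    }
}
```
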
A small hint. I'd expect those to be % encoded.

Mark

>> Running remote debugging via Spring Tool Suite to hook up to my code shows
>> me I receive:
>>
>> 41 4f c3 c3 83 c2 c2 8b 4c
>
> That is not valid UTF-8. If the UTF-8 bytes had been treated as
> ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:
>
> 41 4f c3 83 c2 8b 4c
>
>> I have verified the incorrect bytes appear as early in the call stack as
>> when CoyoteAdapter.process() is invoked
>
> I think you need to go a little further up the stack to track this down.
>
>> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
>> effect.
>
> That is the change I would have expected was required.
>
>> I've also specified useBodyEncodingForURI as true with no effect.
>
> That won't help for a GET request.
>
>> Conventional wisdom says the data is getting inadvertently interpreted as
>> ISO-8859-1 somewhere along the line. Since the data is correct (per
>> mod_jk.log) heading into AJP and incorrect once CoyoteAdapter.java starts
>> handling it somehow, something is going wrong when the data is interpreted
>> after being read from the AJP port.
>>
>> Is that correct?
>
> It looks to be something like that.
>
>> I am at a loss as to how to correct this. The only 2 things the docs say
>> are to use URIEncoding="UTF-8" and useBodyEncodingForURI="true". I'm
>> doing that and it's not working.
>>
>> I am at a loss about what else to try or where to look.
>>
>> If you were faced with this, what would you try? Any advice or suggestions
>> will be greatly appreciated.
>
> I'd dig into the connector code. You need to figure out where those
> bytes are being transformed and why.
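
For anyone comparing byte dumps while digging, the two sequences discussed
earlier in the thread can be reproduced with a minimal Java sketch. The class
and method names are hypothetical. It only illustrates a plain ISO-8859-1
mis-decode followed by a UTF-8 re-encode, and checks whether the observed
bytes decode as UTF-8; it says nothing about where in the connector the
transformation actually happens.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class DoubleEncoding {

    // Formats each byte as two lowercase hex digits, separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Treats UTF-8 bytes as ISO-8859-1, then re-encodes the result as UTF-8.
    static byte[] misdecodeAndReencode(byte[] utf8) {
        String wrong = new String(utf8, StandardCharsets.ISO_8859_1);
        return wrong.getBytes(StandardCharsets.UTF_8);
    }

    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Bytes logged by mod_jk.
        byte[] onWire = {0x41, 0x4f, (byte) 0xc3, (byte) 0x8b, 0x4c};
        // Bytes seen in the debugger.
        byte[] observed = {0x41, 0x4f, (byte) 0xc3, (byte) 0xc3, (byte) 0x83,
                (byte) 0xc2, (byte) 0xc2, (byte) 0x8b, 0x4c};

        System.out.println(hex(misdecodeAndReencode(onWire))); // 41 4f c3 83 c2 8b 4c
        System.out.println(isValidUtf8(observed));             // false
    }
}
```

The first line of output is the "expected" double-encoded form; the observed
sequence differs from it and is rejected by the decoder, which suggests
something other than a single mis-decode is going on.
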
>
> Mark
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org