On 17/10/2016 08:30, Mark Thomas wrote:
> On 16/10/2016 19:09, Mark Juszczec wrote:
>> Hello
>>
>> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
>>
>> I'm using AJP 1.3 for communication between Apache and Tomcat
>>
>> It's all powered by Java 1.8
>>
>> I'm having a problem with international characters when I send them as the
>> request *URI* (which is used by GET requests and this is a GET request).
>>
>> Let's say I get the string AOËL
>>
>> The mod_jk log entry
>>
>>  "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
>> pos=4 len=1411 max=8192"
>>
>> shows the bytes to be:
>>
>>   41 4f c3 8b 4c
>>
>> AFAIK this means the correct bytes are being sent to AJP.  Is that correct?
> 
> That is the correct UTF-8 byte encoding for the characters AOËL.

A small hint. I'd expect those to be % encoded.
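
For illustration, here is a minimal Java sketch of what percent-encoding
would look like for that string. The class and method names are made up
for this example. Note that URLEncoder implements form encoding rather
than strict URI path encoding, but for AOËL the result is the same.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Tomcat or mod_jk.
class PercentEncodeSketch {

    // Percent-encodes a single value using its UTF-8 bytes.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u00CB is the character E with diaeresis from the example above.
        System.out.println(encodeUtf8("AO\u00CBL")); // prints AO%C3%8BL
    }
}
```

The bytes c3 8b from the mod_jk log show up here as %C3%8B, which is the
form a client would normally put on the wire.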

Mark


> 
> 
>> Running remote debugging via Spring Tool Suite to hook up to my code shows
>> me I receive:
>>
>>     41 4f c3 c3 83 c2 c2 8b 4c
> 
> That is not valid UTF-8. If the UTF-8 bytes had been treated as
> ISO-8859-1 and then re-encoded as UTF-8 I'd expect to see:
> 
> 41 4f c3 83 c2 8b 4c
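
The single round of mis-decoding described above can be reproduced with
a short Java sketch. The class and method names are hypothetical, and
this only demonstrates the byte arithmetic, not where in the
Apache/mod_jk/Tomcat chain the conversion actually happens.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the Tomcat code base.
class DoubleEncodeSketch {

    // Formats bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Treats UTF-8 bytes as ISO-8859-1, then re-encodes the result as UTF-8.
    static byte[] misdecodeAndReencode(byte[] utf8Bytes) {
        String wrong = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        return wrong.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = "AO\u00CBL".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(original));                       // 41 4f c3 8b 4c
        System.out.println(hex(misdecodeAndReencode(original))); // 41 4f c3 83 c2 8b 4c
    }
}
```

Each byte above 0x7f becomes two bytes (c3 -> c3 83, 8b -> c2 8b), giving
seven bytes in total. The nine-byte sequence reported from the debugger
does not match this pattern, which suggests something other than a
single clean mis-decode is going on.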
> 
>> I have verified the incorrect bytes appear as early in the call stack as
>> when CoyoteAdapter.process() is invoked
> 
> I think you need to go a little further up the stack to track this down.
> 
>> I have UTF-8 specified as URIEncoding in ajp <Connector> and it has had no
>> effect.
> 
> That is the change I would have expected was required.
> 
>> I've also specified useBodyEncodingForURI as true, with no effect.
> 
> That won't help for a GET request.
> 
>> Conventional wisdom says the data is inadvertently being interpreted as
>> ISO-8859-1 somewhere along the line. Since the data is correct (per
>> mod_jk.log) heading into AJP and incorrect once CoyoteAdapter.java starts
>> handling it, something is going wrong when the data is interpreted after
>> being read from the AJP port.
>>
>> Is that correct?
> 
> It looks to be something like that.
> 
>> I am at a loss as to how to correct this.  The only 2 things the docs say
>> are to use URIEncoding="UTF-8" and useBodyEncodingForURI="true".  I'm
>> doing that and it's not working.
>>
>> I am at a loss about what else to try or where to look.
>>
>> If you were faced with this, what would you try?  Any advice or suggestions
>> will be greatly appreciated.
> 
> I'd dig into the connector code. You need to figure out where those
> bytes are being transformed and why.
> 
> Mark
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
> 

