On Mon, Oct 17, 2016 at 4:29 AM, Mark Thomas <ma...@apache.org> wrote:
> On 17/10/2016 08:30, Mark Thomas wrote:
> > On 16/10/2016 19:09, Mark Juszczec wrote:
> >> Hello
> >>
> >> I have Tomcat 8.0.28 running on CentOS Linux 7.2.1511 behind Apache 2.4.6
> >>
> >> I'm using AJP 1.3 for communication between Apache and Tomcat
> >>
> >> Its all powered by Java 1.8
> >>
> >> I'm having a problem with international characters when I send them as the
> >> request *URI* (which is used by GET requests and this is a GET request).
> >>
> >> Let's say I get the string AOËL
> >>
> >> mod_jk log logs the bytes with the message
> >>
> >> "ajp_connection_tcp_send_message::jk_ajp_common.c (1208): sending to ajp13
> >> pos=4 len=1411 max=8192" (at
> >> ajp_connection_tcp_send_message::jk_ajp_common.c) shows them to be:
> >>
> >> 41 4f c3 8b 4c
> >>
> >> AFAIK this means the correct bytes are being sent to AJP. Is that correct?
> >
> > That is the correct UTF-8 byte encoding for the characters AOËL.
>
> A small hint. I'd expect those to be % encoded.
>

Thank you very much for your reply.

I've been thinking the problem is lack of % encoding after reading:

"*Default encoding for GET*

The character set for HTTP query strings (that's the technical term for 'GET parameters') can be found in sections 2 and 2.1 the "URI Syntax" specification. The character set is defined to be US-ASCII <http://en.wikipedia.org/wiki/ASCII>. Any character that does not map to US-ASCII must be encoded in some way. Section 2.1 of the URI Syntax specification says that characters outside of US-ASCII must be encoded using % escape sequences: each character is encoded as a literal % followed by the two hexadecimal codes which indicate its character code. Thus, a (US-ASCII character code 97 = 0x61) is equivalent to %61. There *is no default encoding for URIs* specified anywhere, which is why there is a lot of confusion when it comes to decoding these values."

from http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q8

Do you know if there's a way to force something (mod_jk, mod_rewrite or something else) to % encode the data being fed into the AJP port?

Mark
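
For reference, here is a minimal sketch of what the % encoded form of AOËL would look like, assuming the string is encoded as UTF-8 (the bytes 41 4f c3 8b 4c from the mod_jk log above). The class and method names are hypothetical and only illustrate the escaping rule from the FAQ; this is not what mod_jk or Tomcat does internally.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
public class PercentEncodeSketch {

    // Percent-encode every UTF-8 byte that is not an RFC 3986 "unreserved"
    // character (letters, digits, '-', '.', '_', '~').
    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            boolean unreserved =
                    (v >= 'A' && v <= 'Z') || (v >= 'a' && v <= 'z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                // A literal % followed by two hexadecimal digits.
                out.append(String.format("%%%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00CB is Ë; written as an escape so the source file encoding does not matter.
        System.out.println(percentEncode("AO\u00CBL")); // prints AO%C3%8BL
    }
}
```

So if the URI were % encoded before reaching the AJP port, I would expect the log to show the ASCII characters of AO%C3%8BL rather than the raw bytes c3 8b.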