Re: mod_jk codepage in header values

André Warnier Thu, 21 Jan 2010 02:30:51 -0800

Mirko Solic wrote:

Christopher, thanks for the quick reply.

...


I'm from Slovenia, Europe. We use characters that are not defined
in ASCII, so we use the UTF-8 code page.

I will try to explain what this application is about.

This project (web page) is protected with AAI
(http://www.switch.ch/aai/about/). This Authentication and
Authorization Infrastructure is roughly divided into an SP (service
provider) and an IdP (identity provider). The SP is a module in Apache.
So when a user tries to get a web page that is protected with AAI
through Apache, the SP module checks whether the user is already logged
in. If not, the SP redirects the user to the IdP, where the user can
enter his/her username and password. If everything is OK, the IdP sends
the user's data in XML to the SP. The SP puts this data into Apache
environment variables so applications (web pages) can access it.

Here I use mod_jk to get these environment variables in Tomcat as HTTP
headers. If I print the user data on the Apache side I get values in
UTF-8 encoding, but if I try this on Tomcat I don't get the right
values; I have to make a conversion.

Is mod_jk responsible for converting UTF-8 environment variables to
ASCII header values, or is this conversion made elsewhere?

Mirko,

I am from Belgium, Europe too. I live in Spain and work mostly for
German and other international customers (among which are some from
Poland too). This is to say that I am well aware of multilingual
character set issues, and confront them every day.

So, just to give you some "context" for your issues:

Despite the fact that Unicode and UTF-8 are now being increasingly used
on the web, the fact is that HTTP, and HTML, and most of the other
WWW-relevant RFCs, are still US-ASCII and ISO-8859-1 (latin-1) based.

For example, HTTP header values are /supposed/ to contain only
single-byte character codes that are part of the (printable subset of)
US-ASCII character set.

For example also, by default, all "content" exchanged between browsers
and web servers is iso-8859-1.
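To make the header restriction concrete, here is a minimal sketch of a check an application could run on a value before putting it into a header. The class and method names are made up for illustration, and treating "printable US-ASCII" as the codes 0x20 to 0x7E is my own reading, not a quote from the RFC.

```java
// Hypothetical helper, not part of Apache, mod_jk or Tomcat.
final class HeaderAscii {

    // Assumption: "printable US-ASCII" means the codes 0x20 (space) to 0x7E (~).
    static boolean isPrintableAscii(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A plain ASCII value passes the check.
        System.out.println(isPrintableAscii("Mirko"));
        // A made-up value starting with U+010C (C with caron) does not.
        System.out.println(isPrintableAscii("\u010Cerne"));
    }
}
```

A value that fails this check is the kind that would need encoding before it goes into a header.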

And it is so because the relevant RFCs say that it should be.

(So the developers of Apache and mod_jk and Tomcat have little choice in
the matter; they have to follow the RFCs).

This does not mean that you cannot handle other character sets on the
web. But it means that whenever you do, you have to be attentive to the
fact that it is /not/ the standard, and that you may have to do
character set translations and/or encoding.

It may even mean that, in order to exchange non-US-ASCII or
non-ISO-8859-1 data, you may have to use "tricks".

It also means that, in some cases, by using such "tricks", your
applications may become "non-standard", and will not necessarily work
with all servers and all clients.

So for example, to get back to your question above: mod_jk is not
responsible for translating anything, and will not translate anything.
That is because mod_jk follows the relevant WWW RFCs, which specify that
such and such data is ASCII or ISO-8859-1.

If the original HTTP request, as it is given by Apache to mod_jk,
contains HTTP headers, mod_jk will forward these headers "as is" to the
back-end Tomcat. But, because the HTTP RFC specifies that HTTP headers
should contain only US-ASCII character data, mod_jk would be allowed, if
it finds non-US-ASCII data in a HTTP header, to strip this data or
ignore the header or something like that. I don't know if mod_jk
actually does this, but if it did, it would be justified, because
according to the HTTP RFC this would be an invalid header.


So, to be practical :

- the current HTTP 1.1 RFC specifies that HTTP headers can only contain
  US-ASCII printable character data
- some UTF-8 codes contain bytes that are not part of the US-ASCII
  character set (e.g.: bytes with values above 0x7F)
- so, if you want to forward such a header from Apache to Tomcat, in
  principle the "right" way is to "encode" the value of this header on
  the Apache side, in such a way that it contains only US-ASCII data
  (for example, using Base64 encoding), then pass it to mod_jk.
- at the other end, your application would have to decode this header
  (using Base64 decoding) back into UTF-8, and then it would have to
  read this header value as UTF-8/Unicode.
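The encode/decode steps above can be sketched in Java roughly as follows. This is only an illustration: HeaderCodec and its methods are hypothetical names, the sample value is made up, and it assumes a Java version that ships java.util.Base64 (Java 8 or later). On the Apache side the encoding would of course have to be done by whatever sets the header there; encode() is included only to show the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; the class and method names are illustrative only.
final class HeaderCodec {

    // What the sending side would need to produce:
    // UTF-8 bytes of the value, written as Base64 text (pure US-ASCII).
    static String encode(String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // What the Tomcat application would do with the received header value:
    // Base64 text back to bytes, then read those bytes as UTF-8.
    static String decode(String headerValue) {
        byte[] utf8 = Base64.getDecoder().decode(headerValue);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u017Diga \u010Cerne"; // made-up sample value
        String wire = encode(original);           // safe to put in a header
        System.out.println(wire);
        System.out.println(decode(wire).equals(original));
    }
}
```

In a servlet, the argument to decode() would be whatever request.getHeader(...) returns for the header you chose; the header name itself is up to you.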

There is no guarantee that any standard module or class under Apache or
mod_jk or Tomcat would properly handle a header that contains
non-US-ASCII data. That is because, in principle, they never have to.

I know it is a mess. It is possible that there are shortcuts. It is
possible that mod_jk would transmit a HTTP header, even if it contains
non-US-ASCII data. But it is not certain, because "the bible" for mod_jk,
as for Apache and as for Tomcat, are the RFCs.
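One such shortcut, and probably the "conversion" you mention, would look like the sketch below. It rests on two assumptions that I cannot confirm here: that the UTF-8 bytes reach Tomcat unchanged, and that the header value is then turned into a String as if the bytes were ISO-8859-1. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only valid under the assumptions stated above.
final class HeaderRedecode {

    // Turn the wrongly decoded String back into its original bytes
    // (ISO-8859-1 maps every byte to exactly one char, so nothing is lost),
    // then read those bytes as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u017Diga \u010Cerne"; // made-up sample value
        // Simulate what the application would see: UTF-8 bytes read as ISO-8859-1.
        String asSeen = new String(original.getBytes(StandardCharsets.UTF_8),
                                   StandardCharsets.ISO_8859_1);
        System.out.println(asSeen.equals(original));               // garbled
        System.out.println(latin1ToUtf8(asSeen).equals(original)); // repaired
    }
}
```

If either assumption does not hold in your setup, this will produce garbage, which is why encoding the value into plain US-ASCII first is the safer route.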

We non-English speakers worldwide desperately need a new version of the
HTTP protocol where the default would be Unicode/UTF-8, for everything.

But I do not see much happening right now in that direction.


Maybe a tip for your authentication issues:
If, in the AJP <Connector> on the Tomcat side, you set the attribute
tomcatAuthentication="false"
then Tomcat will accept the user-id authenticated by Apache, as the
user-id for Tomcat (mod_jk transmits it).

So if your user authentication mechanism works fine at the Apache level,
and generates a user-id that is "acceptable" by Tomcat, this may be a
solution for your issue.

I have no idea if this user-id, for Tomcat, can or cannot contain
non-US-ASCII characters.



