Mirko Solic wrote:
Christopher, thanks for the quick reply.

...


I'm from Slovenija, Europe. We use characters that are not defined
in ASCII, so we use the UTF-8 code page.
I will try to explain what this application is about.

This project (web page) is protected with AAI
(http://www.switch.ch/aai/about/). This Authentication and
Authorization Infrastructure is roughly divided into an SP (service provider)
and an Idp (identity provider). The SP is an Apache module. When a user tries to
get a web page that is protected with AAI through Apache, the SP module checks
whether the user is already logged in. If not, the SP redirects the user to the Idp, where the user
can enter his/her username and password. If everything is OK, the Idp sends
the user's data as XML to the SP. The SP puts this data into Apache environment variables so that applications (web pages) can access it.
Here I use mod_jk to get these environment variables into Tomcat as HTTP
headers. If I print the user data on the Apache side I get values in UTF-8
encoding, but if I try this on the Tomcat side I don't get the right values; I have to
do a conversion.

Is mod_jk responsible for converting the UTF-8 environment variables to
ASCII header values, or is this conversion made elsewhere?

Mirko,
I am from Belgium, so from Europe too. I live in Spain and work mostly for German and other international customers (among which are some from Poland too). This is to say that I am well aware of multilingual character set issues, and confront them every day.
So, just to give you some "context" for your issues:

Although Unicode and UTF-8 are increasingly used on the web, HTTP, HTML and most of the other WWW-relevant RFCs are still based on US-ASCII and ISO-8859-1 (latin-1).

For example, HTTP header values are /supposed/ to contain only single-byte character codes that are part of the (printable subset of the) US-ASCII character set. Likewise, by default (that is, when no charset is declared), text "content" exchanged between browsers and web servers is assumed to be ISO-8859-1.
And it is so because the relevant RFCs say that it should be.
(So the developers of Apache and mod_jk and Tomcat have little choice in the matter; they have to follow the RFCs).

This does not mean that you cannot handle other character sets on the web. But it means that whenever you do, you have to be attentive to the fact that it is /not/ the standard, and that you may have to do character set translations and/or encoding. It may even mean that, in order to exchange non-US-ASCII or non-ISO-8859-1 data, you may have to use "tricks". It also means that, in some cases, by using such "tricks", your applications may become "non-standard", and will not necessarily work with all servers and all clients.

So for example, to get back to your question above : mod_jk is not responsible for translating anything, and will not translate anything. That is because mod_jk follows the relevant WWW RFCs, which specify that such and such data is ASCII or ISO-8859-1.

If the original HTTP request, as it is given by Apache to mod_jk, contains HTTP headers, mod_jk will forward these headers "as is" to the back-end Tomcat. But, because the HTTP RFC specifies that HTTP headers should contain only US-ASCII character data, mod_jk would be allowed, if it finds non-US-ASCII data in an HTTP header, to strip this data or ignore the header or something like that. I don't know if mod_jk actually does this, but if it did, it would be justified, because according to the HTTP RFC this would be an invalid header.

So, to be practical :
- the current HTTP 1.1 RFC specifies that HTTP headers can only contain US-ASCII printable character data
- some UTF-8 codes contain bytes that are not part of the US-ASCII character set (e.g. bytes with values above 0x7F)
- so, if you want to forward such a header from Apache to Tomcat, in principle the "right" way is to "encode" the value of this header on the Apache side, in such a way that it contains only US-ASCII data (for example, using Base64 encoding), then pass it to mod_jk.
- at the other end, your application would have to decode this header (using Base64 decoding) back into UTF-8, and then it would have to read this header value as UTF-8/Unicode.
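
To make the encode/decode round trip above concrete, here is a minimal sketch in Java using only the standard library. The class name HeaderCodec, the method names and the sample value are made up for illustration. In your setup the encoding step would have to happen on the Apache side (how to do that there is not shown here); it is written in Java only so that both halves can be seen together.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Tomcat or mod_jk.
public class HeaderCodec {

    // Turn arbitrary text into a US-ASCII-only value:
    // text -> UTF-8 bytes -> Base64 characters.
    public static String encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse step for the receiving application:
    // Base64 characters -> UTF-8 bytes -> text.
    public static String decode(String headerValue) {
        byte[] utf8 = Base64.getDecoder().decode(headerValue);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample value with one non-ASCII character (c with caron, U+010D).
        String name = "Soli\u010D";
        String wire = encode(name);
        System.out.println("header value on the wire: " + wire);
        System.out.println("round trip ok: " + decode(wire).equals(name));
    }
}
```

The value that travels through mod_jk then contains only letters, digits and possibly "+", "/" and "=", so no module along the way has a reason to reject or mangle it.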

There is no guarantee that any standard module or class under Apache or mod_jk or Tomcat would properly handle a header that contains non-US-ASCII data. That is because, in principle, they never have to.

I know it is a mess. It is possible that there are shortcuts. It is possible that mod_jk would transmit an HTTP header, even if it contains non-US-ASCII data. But it is not guaranteed, because "the bible" for mod_jk, as for Apache and as for Tomcat, is the RFCs.
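
One such shortcut is probably the "conversion" you already do. It can be sketched as below, under two assumptions that I have not verified for your versions: that mod_jk passes the raw UTF-8 bytes through unchanged, and that the receiving side turns each byte into one character as if it were ISO-8859-1. The class name HeaderRedecode and the sample value are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or mod_jk.
public class HeaderRedecode {

    // Undo a wrong ISO-8859-1 decoding: get the original bytes back
    // (ISO-8859-1 maps every byte to exactly one character, so nothing is lost),
    // then decode those bytes as the UTF-8 they really are.
    public static String redecode(String asLatin1) {
        byte[] raw = asLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Soli\u010D";
        // Simulate what the application would see under the assumptions above:
        // UTF-8 bytes read as if they were ISO-8859-1.
        String misread = new String(original.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        System.out.println("misread equals original: " + misread.equals(original));
        System.out.println("re-decoded equals original: " + redecode(misread).equals(original));
    }
}
```

This only works as long as every component in between leaves the bytes above 0x7F alone, which is exactly what nobody guarantees, so the Base64 route remains the safer one.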

We non-English speakers worldwide desperately need a new version of the HTTP protocol where the default would be Unicode/UTF-8, for everything.
But I do not see much happening right now in that direction.


Maybe a tip for your authentication issues :
If, in the AJP <Connector> on the Tomcat side, you set the attribute
tomcatAuthentication="false"
then Tomcat will accept the user-id authenticated by Apache, as the user-id for Tomcat (mod_jk transmits it). So if your user authentication mechanism works fine at the Apache level, and generates a user-id that is "acceptable" by Tomcat, this may be a solution for your issue. I have no idea if this user-id, for Tomcat, can or cannot contain non-US-ASCII characters.


