To Konstantin and all the others who have responded,
many thanks for all the tips, especially since this was quite a bit off-topic. I need some time to digest them though, and to choose the best approach for the code that was dumped in my lap.

I must say that I find it a bit curious that Java does not have an easy out-of-the-box method to convert a byte to a char, with a charset specifier. Something like
char mychar = toChar(int,charset) (or int.toChar(charset))
Oh well, maybe in Java 7.
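
For what it is worth, a helper of roughly that shape can probably be sketched on top of the String(byte[], Charset) constructor. The names ByteToChar and toChar are made up for illustration, and the sketch assumes a single-byte charset such as the iso-8859 family; it would not be correct for multi-byte encodings.

```java
import java.nio.charset.Charset;

class ByteToChar {
    // Hypothetical helper: decode one byte value (0-255) with the given
    // single-byte charset and return the resulting Unicode char.
    static char toChar(int b, Charset charset) {
        String decoded = new String(new byte[] { (byte) b }, charset);
        return decoded.charAt(0);
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        Charset latin2 = Charset.forName("ISO-8859-2");
        // The same byte gives different chars depending on the charset.
        System.out.println(Integer.toHexString(toChar(0xA1, latin1))); // a1
        System.out.println(Integer.toHexString(toChar(0xA1, latin2))); // 104
    }
}
```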

To Konstantin in particular:
I know that I don't lose information by converting iso-8859-2 bytes to Unicode (as if they were iso-8859-1), then converting that Unicode back to bytes (again with the iso-8859-1 filter). I will get the same bytes in the end. The problem is that this is a servlet writing the result to the response object, and if I tell it to use iso-8859-1 for the response, it automatically also sets the charset in the response Content-Type header to iso-8859-1.
Which in this case is wrong, because the browser then gets confused.
And as I have found out, it is quite hard to change this Content-Type header after the fact. Even a servlet filter won't do it, because by that time the response is committed. Even the front-end Apache can't do it, because it won't let you change the Content-Type header.
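
The lossless round trip mentioned above can be checked with a few lines of plain Java. This is only a sketch of the byte-level behaviour (the class and method names are invented here); it says nothing about the Content-Type header, which is the actual problem.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decode bytes as if they were iso-8859-1, then encode the resulting
    // string back with iso-8859-1. Every byte value 0-255 maps to the
    // Unicode char with the same number, so nothing is lost.
    static byte[] roundTrip(byte[] original) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String asIfLatin1 = new String(original, latin1);
        return asIfLatin1.getBytes(latin1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println(Arrays.equals(all, roundTrip(all))); // true
    }
}
```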

So my problem is the reverse:
The servlet must set the response output encoding to iso-8859-2, in order to produce the correct Content-Type for the browser. To produce correct iso-8859-2 from the internal Unicode string, this Unicode string must contain the proper Unicode chars corresponding to the iso-8859-2 characters I want to output. But the servlet reads those bytes as ints, and does a bunch of internal tests and manipulations on them, without taking into account that they could be anything other than iso-8859-1.
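
A small sketch of why the internal string needs the proper Unicode chars: encoding to iso-8859-2 only gives the intended byte when the char is the one iso-8859-2 actually defines for it. The class and method names are made up, and String.getBytes stands in for whatever the response writer does internally.

```java
import java.nio.charset.Charset;

class EncodeForLatin2 {
    // Encode a single char to iso-8859-2 and return the byte as 0-255.
    // String.getBytes substitutes '?' (0x3f) for unmappable chars.
    static int encodeToLatin2(char c) {
        byte[] out = String.valueOf(c).getBytes(Charset.forName("ISO-8859-2"));
        return out[0] & 0xFF;
    }

    public static void main(String[] args) {
        // U+0104 is the char iso-8859-2 defines for byte 0xA1.
        System.out.println(Integer.toHexString(encodeToLatin2((char) 0x0104))); // a1
        // U+00A1 is what byte 0xA1 becomes when read as iso-8859-1;
        // it does not exist in iso-8859-2, so it comes out as '?'.
        System.out.println(Integer.toHexString(encodeToLatin2((char) 0x00A1))); // 3f
    }
}
```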

For the same reason, I cannot just replace the InputStream with something that would translate these bytes on-the-fly to Unicode chars, because for high iso-8859-2 bytes it would generate internal codes that no longer fall into the range 0-255, and that may create a problem somewhere deep in code I haven't yet looked at.
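
To illustrate the concern, here is a sketch that compares a raw InputStream read with the same byte read through an InputStreamReader using iso-8859-2. The class and method names are invented, and a ByteArrayInputStream stands in for the real input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class OnTheFlyDecoding {
    // What the existing code sees today: the raw byte value, 0-255.
    static int readRaw(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    // What it would see if the stream were decoded as iso-8859-2 on the fly.
    static int readDecoded(byte[] data) {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName("ISO-8859-2"));
        try {
            return reader.read();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xA1 };
        System.out.println(readRaw(data));     // 161
        System.out.println(readDecoded(data)); // 260, outside 0-255
    }
}
```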

I think I have to go back and examine that code, and see how often this StringBuffer is being used/manipulated. If not too often, I might replace it with a byte buffer, and do the conversion all at once each time it is written out.
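
A rough sketch of that byte-buffer idea, assuming the buffer only ever holds raw iso-8859-2 bytes and is decoded in one go when written out. All names are hypothetical, and an OutputStreamWriter over a ByteArrayOutputStream stands in for the servlet response writer set to iso-8859-2.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

class ByteBufferThenDecode {
    private static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Collect raw byte values (0-255) untouched, decode them all at once
    // as iso-8859-2, and push the string through an iso-8859-2 writer.
    // Returns the bytes that would go out on the wire.
    static byte[] send(int[] rawBytes) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (int b : rawBytes) {
            buffer.write(b); // internal tests on 0-255 values still work here
        }
        String decoded = new String(buffer.toByteArray(), LATIN2);

        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(wire, LATIN2); // stand-in for the response writer
        try {
            out.write(decoded);
            out.flush();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return wire.toByteArray();
    }

    public static void main(String[] args) {
        byte[] sent = send(new int[] { 0x41, 0xA1, 0xB1 });
        for (byte b : sent) {
            System.out.println(Integer.toHexString(b & 0xFF)); // 41, a1, b1
        }
    }
}
```

The point of the design is that the existing byte-level manipulations keep seeing values in 0-255, and the charset only matters at the single place where the buffer is written out.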
