To Konstantin and all the others who have responded,
many thanks for all the tips, especially since this was quite a bit
off-topic.
I need some time to digest the tips though, and choose the best way
according to the code that was dumped in my lap.
I must say that I find it a bit curious that Java does not have an easy
out-of-the-box method to convert a byte to a char, given a character
set. Something like
char mychar = toChar(int,charset) (or int.toChar(charset))
Oh well, maybe Java 7..
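For what it's worth, the closest thing I can see is to go through a
one-byte String. A minimal sketch, assuming a single-byte charset; the
class name and the toChar helper are my own invention, not an existing
Java API:

```java
import java.nio.charset.Charset;

class ByteToChar {
    // Hypothetical helper: decode one byte value (0-255) under the given
    // single-byte charset. Not meaningful for multi-byte charsets.
    static char toChar(int b, Charset charset) {
        return new String(new byte[] { (byte) b }, charset).charAt(0);
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        // Assumes the JRE ships ISO-8859-2, which standard JDKs do.
        Charset latin2 = Charset.forName("ISO-8859-2");
        // The same byte 0xA1 decodes to different Unicode chars.
        System.out.printf("U+%04X%n", (int) toChar(0xA1, latin1)); // U+00A1
        System.out.printf("U+%04X%n", (int) toChar(0xA1, latin2)); // U+0104
    }
}
```

It allocates a String per byte, so it is probably not something to call
in a tight loop.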
To Konstantin in particular :
I know that I don't lose information by converting iso-8859-2 (thinking
it is iso-8859-1) to Unicode one way, then re-converting this Unicode to
iso-8859-2 (re-using the iso-8859-1 filter). I will get the same bytes
in the end.
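Just to spell out that round trip, here is a small sketch. The class
and method names are only for illustration:

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class RoundTrip {
    // Decode bytes as ISO-8859-1 and encode them again the same way.
    // Every byte 0-255 maps to exactly one char and back, so nothing is lost.
    static byte[] roundTrip(byte[] in) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String s = new String(in, latin1);
        return s.getBytes(latin1);
    }

    public static void main(String[] args) {
        // 0xA1 0xB1 are two iso-8859-2 letters when read as iso-8859-2.
        byte[] original = { (byte) 0xA1, (byte) 0xB1, 0x41 };
        byte[] back = roundTrip(original);
        System.out.println(Arrays.equals(original, back)); // true
    }
}
```

The bytes survive, but the intermediate String holds the iso-8859-1
chars, not the iso-8859-2 ones.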
The problem is that this is a servlet writing the result to the response
object. And if I tell it to use iso-8859-1 for the response, it
automatically also sets the charset in the response Content-Type header
to iso-8859-1. Which in this case is wrong, because the browser then
displays the iso-8859-2 bytes as if they were iso-8859-1.
And as I have found out, it is quite hard to change this Content-Type
header after-the-fact.
Even a servlet filter won't do it, because by that time the response is
committed.
Even the front-end Apache can't do it, because it won't let you change
the Content-Type header..
So my problem is in reverse :
The servlet must set the response output encoding to iso-8859-2, in
order to produce the correct Content-Type for the browser. To produce
correct iso-8859-2 from the internal Unicode string, this Unicode string
must have the proper Unicode chars corresponding to the iso-8859-2
characters I want to output.
But the servlet reads those bytes as ints, and does a bunch of internal
tests and manipulations on them, without taking into account that they
could be anything other than iso-8859-1.
For the same reason, I cannot just replace the InputStream by something
that would translate these bytes on-the-fly to Unicode chars, because
for high iso-8859-2 bytes, it would generate internal codes that no
longer fall into the range 0-255, and that may create a problem somewhere
deep in code I haven't yet looked at.
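A quick sketch of what I mean, using an InputStreamReader over a
one-byte stream. The class and helper names are hypothetical, and this
is not the actual servlet code:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ReaderValues {
    // Hypothetical helper: push one byte through a decoding Reader and
    // return the int that read() hands back.
    static int firstCode(int b, String charsetName) {
        byte[] data = { (byte) b };
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(charsetName))) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Raw byte value is 161; decoded as iso-8859-2 it becomes 260.
        System.out.println(firstCode(0xA1, "ISO-8859-1")); // 161
        System.out.println(firstCode(0xA1, "ISO-8859-2")); // 260
    }
}
```

Any test in the existing code that assumes a value below 256 would
behave differently on the second result.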
I think I have to go back to examine that code, and see how often this
StringBuffer is being used/manipulated. If not too often, I might
replace it by a byte buffer, and do the conversion all at once each time
it is being written out.
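Roughly what I have in mind, as a sketch only. The class name, the
process method and the carriage-return test are stand-ins I made up;
the real code does other manipulations:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

class BufferThenDecode {
    // Keep everything as bytes (ints 0-255) while the existing tests run,
    // then decode the whole buffer once when it is time to write it out.
    static String process(byte[] input, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (byte raw : input) {
            int b = raw & 0xFF;   // same 0-255 int the existing code sees
            if (b != '\r') {      // stand-in for the existing byte-level tests
                buf.write(b);
            }
        }
        return new String(buf.toByteArray(), cs);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");
        byte[] input = { 0x41, 0x0D, (byte) 0xA1 };
        String out = process(input, latin2);
        // The String now holds the proper Unicode chars for iso-8859-2.
        for (char c : out.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The resulting String could then be written through a Writer set to
iso-8859-2, which should give back the original high bytes.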