Re: Char Encoding text streams on Tomcat 5.5 and Linux

Elli Albek Tue, 01 Dec 2009 23:40:35 -0800

Hi,

On your Linux box, type “locale” and press Enter. The output should show a
UTF-8 locale (for example LANG=en_US.UTF-8). If it does not, change it. You
can also set the file.encoding Java system property (-Dfile.encoding=UTF-8),
as suggested above, as an extra safety measure.
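
To confirm what the JVM actually picked up, something like the following can
be run with the same environment and JVM options as the Tomcat process. This
is only a sketch; the class name DefaultCharsetCheck is made up for
illustration.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Returns the charset the JVM uses when no encoding is given explicitly.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run as: java DefaultCharsetCheck
        // or:     java -Dfile.encoding=UTF-8 DefaultCharsetCheck
        System.out.println(describe());
    }
}
```

If the default charset reported is not UTF-8, any Reader created without an
explicit encoding will decode with whatever is printed here.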


Tomcat’s logic of determining the encoding from the request only applies
when Tomcat is parsing text in the request.

However, if you read from the stream directly using request.getInputStream(),
you are getting binary data. When you create a java.io.Reader from that input
stream you need to specify the encoding, or it will default to the JVM's
platform default encoding (derived from the OS locale, or from file.encoding
if that is set). In that case the reader is a plain Java API that does not go
through the Tomcat APIs. This reader has no knowledge of the request data,
any encoding specified in it, or what Tomcat would default to.
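
A minimal sketch of the difference, using a ByteArrayInputStream as a
stand-in for request.getInputStream() so it runs without a servlet container.
The class and method names are invented for the example, and it assumes
Java 7 or later (for StandardCharsets and try-with-resources); on older JVMs
the charset name "UTF-8" can be passed as a string instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitReaderDemo {

    // Decodes the whole stream with the charset given to the Reader constructor.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a request body: "cafe" with an accented e, sent as UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        // The accented letter is two bytes in UTF-8; decoding it as
        // ISO-8859-1 turns those two bytes into two separate characters.
        System.out.println(right.length());
        System.out.println(wrong.length());
    }
}
```

Calling new InputStreamReader(in) with no second argument would behave like
one of these two cases depending on the platform default, which is why the
same application can work on one machine and garble text on another.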

The fact that Tomcat is using ISO-8859-1 to read characters is not relevant
if you are reading from the stream directly and using your own Reader to
convert to characters. I suspect this is the cause: since the XML parsing
succeeds, the XML parser is presumably getting raw bytes from Tomcat rather
than characters (using request.getInputStream() as opposed to
request.getReader()).
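
As an illustration of why the XML path can succeed while plain text fails,
here is a small sketch in which the JDK's built-in DOM parser is handed raw
bytes and takes the encoding from the XML declaration. The byte array stands
in for the request body, and the class name XmlBytesDemo is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlBytesDemo {

    // Parses raw bytes; no Reader and no charset are supplied by the caller.
    public static String rootText(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>caf\u00e9</name>";
        // Encode the document the way its own declaration says it is encoded.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = rootText(new ByteArrayInputStream(bytes));
        System.out.println(text.equals("caf\u00e9"));
    }
}
```

Plain text has no such declaration inside the payload, so a hand-made Reader
has nothing to fall back on except the charset it was constructed with.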

In that case the XML parser will resolve the encoding by itself. Ideally,
when you create your own reader for plain text you would use the character
set declared in the request. However, if you do not trust the clients, just
force UTF-8 for the OS and the Tomcat process by specifying the OS locale.
You can also force UTF-8 when you create the reader in your Java code (it is
a constructor parameter for the reader), but it looks easier to just specify
it in the OS/Tomcat startup without changing the application code.
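
If you do go the code route, one possible shape is a small helper that takes
whatever request.getCharacterEncoding() returned (it may be null when the
client sent no charset) and falls back to UTF-8. This is a sketch, not
anything Tomcat provides; RequestCharset and choose are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestCharset {

    // Picks the charset to hand to the Reader constructor.
    // 'declared' is the value from request.getCharacterEncoding(), possibly null.
    public static Charset choose(String declared) {
        if (declared == null || declared.trim().isEmpty()) {
            return StandardCharsets.UTF_8; // nothing declared: force UTF-8
        }
        try {
            return Charset.forName(declared.trim());
        } catch (IllegalArgumentException e) {
            // Covers both an illegal name and a charset the JVM does not support.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(null));
        System.out.println(choose("ISO-8859-1"));
        System.out.println(choose("no-such-charset"));
    }
}
```

In a servlet this would be used roughly as
new InputStreamReader(request.getInputStream(), RequestCharset.choose(request.getCharacterEncoding())).
If the clients cannot be trusted at all, skip the helper and pass UTF-8
directly.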

E
