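Hi, on your Linux box type “locale” and press Enter. The LANG and LC_* values it reports should use UTF-8 (for example en_US.UTF-8). If they do not, change the locale. You can also set the file.encoding Java system property (-Dfile.encoding=UTF-8) when starting the JVM, as suggested above, as an extra safety measure.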
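To confirm what the JVM actually picked up, a small check like the one below can help. This is only a sketch: the class name DefaultCharsetCheck is made up for illustration, and it should be run with the same JVM options and environment as the Tomcat process to be meaningful. Note that on JDK 18 and later the default charset is UTF-8 regardless of the OS locale, so the locale matters mainly on older JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Tomcat or the application.
public class DefaultCharsetCheck {

    // Returns the name of the charset the JVM uses when code
    // does not specify one explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // May print "null" if the property is not set on this JVM.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If the first line does not report UTF-8, any Reader created without an explicit charset will decode request bytes with whatever charset is printed.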
Tomcat’s logic for determining the encoding from the request only applies when Tomcat itself is parsing text in the request. If you read from the stream directly, using request.getInputStream(), you are getting binary data. When you create a java.io.Reader from that input stream you need to specify the encoding, or it will fall back to the JVM default charset (file.encoding, which is typically derived from the OS locale). That reader is a plain Java API that does not go through the Tomcat APIs, so it has no knowledge of the request data, of any encoding specified in it, or of what Tomcat would default to. The fact that Tomcat uses ISO-8859-1 to read characters is not relevant if you read from the stream directly and use your own Reader to convert bytes to characters.
I suspect this is the cause because the XML parsing succeeds, which suggests the XML parser is getting raw bytes from Tomcat rather than characters (request.getInputStream() as opposed to request.getReader()). In that case the XML parser resolves the encoding by itself.
Ideally, when you create your own reader for plain text, you use the character set declared in the request. If you do not trust the clients, just force UTF-8 for the OS and the Tomcat process by setting the OS locale. You can also force UTF-8 when you create the reader in your Java code (the charset is a constructor parameter of the reader), but it looks easier to specify it in the OS/Tomcat startup without changing the application code.
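If you do decide to fix it in code, the idea looks roughly like the sketch below. It is not taken from your application: the class name ExplicitCharsetReader and the method readAll are invented for illustration, and a ByteArrayInputStream stands in for request.getInputStream() so the example runs without a servlet container.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
public class ExplicitCharsetReader {

    // Reads the whole stream as text, decoding the bytes with the given
    // charset instead of relying on the JVM default charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e;
        // in UTF-8 that last character takes two bytes.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("caf\u00e9")); // prints "true"
        System.out.println(wrong.length());            // prints "5": the accented letter became two characters
    }
}
```

In a servlet the call would be something like readAll(request.getInputStream(), StandardCharsets.UTF_8). If you would rather honour the client, request.getCharacterEncoding() gives the declared charset name, but it returns null when the request does not declare one, so a fallback is still needed.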