On 28/05/2021 08:14, Carsten Klein wrote:
Hi there,

I'm facing character set encoding problems in a fairly recent Tomcat 10 setup. I noticed them in the http://localhost:8080/manager/html application with my browser set to German.

My Tomcat runs from within Eclipse, built with the official build.xml file.

I suspect that that is not actually the case and that Eclipse is running from its own copy of the source and compiled classes.

I'm using my forked cklein05/tomcat GitHub repository, which is nearly up to date with your main branch.

In the Manager application, there are texts which contain German umlauts, like "Lösche Sitzungen" (Expire sessions, aka htmlManagerServlet.appsExpire).

These buttons now have captions that look like "LÃ¶sche Sitzungen". Obviously that's a UTF-8 <-> ISO-xxxx-y conversion issue.
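For illustration, the kind of conversion issue described here can be reproduced in a few lines of Java. This is only a sketch with a hypothetical class name (MojibakeDemo); it shows the general effect of decoding UTF-8 bytes as ISO-8859-1, not the code path Tomcat or Eclipse actually takes.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The o-umlaut (U+00F6) is two bytes in UTF-8 (0xC3 0xB6),
        // so it turns into two ISO-8859-1 characters (U+00C3, U+00B6).
        System.out.println(misdecode("L\u00f6sche Sitzungen"));
    }
}
```

Every non-ASCII character is affected the same way, which is why the symptom is not specific to German.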

I'm pretty sure that my setup is not causing that problem.

Yes, it is.

After digging into GitHub, I found that someone recently converted many (or all) message files to UTF-8:

https://github.com/apache/tomcat/commit/90fe08bdee0494110bb8145d2f067b61f74ae429

However, since these language files are actually java.util.Properties files,

Not quite. They are java.util.ResourceBundle files.

these must be encoded as ISO-8859-1:

https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.InputStream-
That's also true for more recent versions of Java.

Not for ResourceBundle. As of Java 9, an encoding can be specified. As soon as the minimum required version of Java is >=9, we'll switch to that method of loading.
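As a rough illustration of the difference an explicit encoding makes, the sketch below loads the same UTF-8 bytes twice: once through Properties.load(InputStream), which always decodes as ISO-8859-1, and once through Properties.load(Reader) with a UTF-8 reader. The class name and the in-memory "file" are assumptions made for the example, and this is not necessarily the loading method Tomcat will adopt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {
    static final String KEY = "htmlManagerServlet.appsExpire";

    // Stand-in for a message file saved as UTF-8 (hypothetical content).
    static byte[] utf8File() {
        return (KEY + "=L\u00f6sche Sitzungen\n").getBytes(StandardCharsets.UTF_8);
    }

    // load(InputStream) decodes the bytes as ISO-8859-1, whatever the file really is.
    static String loadAsStream(byte[] content) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    // load(Reader) leaves the decoding to the caller, here UTF-8.
    static String loadAsUtf8(byte[] content) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(content), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) {
        System.out.println(loadAsStream(utf8File())); // garbled umlaut
        System.out.println(loadAsUtf8(utf8File()));   // intact umlaut
    }
}
```

java.util.PropertyResourceBundle also has a constructor that takes a Reader, so the same idea carries over to resource bundles.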

The language files are actually Properties files in a (according to the Javadoc) "simple line-oriented format". These must be loaded with the Properties.load method(s) and must always be in ISO-8859-1. In contrast, there are XML-based Properties files that must be loaded with the loadFromXML(...) method(s). Only these must be encoded in UTF-8.

Although editing international language files in ISO-8859-1 requires many \uXXXX escapes and is a hassle, to my mind, converting these plain-text language files to UTF-8 was likely not a good idea.

The Tomcat maintainers disagree. Using UTF-8 makes maintenance significantly simpler and has allowed integration with poeditor.com, which has enabled 175 contributors (at today's count) to contribute new and improved translations, including complete translations in Chinese and Korean.

One thing you do need to be aware of is the use of MessageFormat. Any string that contains {n} will be passed through MessageFormat so any single quote characters in the string need to be escaped with a second single quote. Apart from a few special cases, any instance of {n} is surrounded by [] to give [{n}] so that replaced values are clearly delimited. This is to help with issues around empty values and leading/trailing spaces that are otherwise not immediately obvious in the logs.
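A small sketch of the MessageFormat behaviour described above. The message texts and the class name are made up for the example and are not taken from Tomcat's resource files.

```java
import java.text.MessageFormat;

public class MessageFormatQuoteDemo {
    // Thin wrapper so the three cases below read the same way.
    static String format(String pattern, Object... args) {
        return MessageFormat.format(pattern, args);
    }

    public static void main(String[] args) {
        // Unescaped single quote: it starts a quoted section, so the quote
        // disappears and the placeholder is emitted literally.
        System.out.println(format("Can't expire session [{0}]", "abc"));
        // -> Cant expire session [{0}]

        // Doubled single quote: one literal quote, placeholder replaced.
        System.out.println(format("Can''t expire session [{0}]", "abc"));
        // -> Can't expire session [abc]

        // The brackets keep an empty value visible in the output.
        System.out.println(format("Session [{0}] expired", ""));
        // -> Session [] expired
    }
}
```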

But why don't others report that problem?

A few people have. It has always been when running from the source within an IDE.

Am I overlooking something?

https://github.com/apache/tomcat/blob/main/build.xml#L998

According to my explanation above, that problem is limited neither to the German language nor to the Manager application. It should occur with any language using non-ASCII characters (> 127) and with all localized text resources Tomcat is using.

The issue is going to be some variation of Eclipse loading the ResourceBundle instances from the original source files rather than from the transformed versions created by the build process.
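To make the idea of "transformed versions" concrete, here is a sketch of a native2ascii-style conversion that replaces every non-ASCII character with a backslash-u escape, so the result loads correctly as ISO-8859-1. That the build performs exactly this kind of conversion is an assumption on the part of this example; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class UnicodeEscapeDemo {
    // Replace each UTF-16 code unit above 127 with a backslash-u escape.
    // Other characters that are special in properties files are not handled here.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Write the escaped value as a properties line and read it back
    // with the ISO-8859-1 based load(InputStream).
    static String roundTrip(String value) {
        String line = "key=" + escapeNonAscii(value) + "\n";
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(line.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("key");
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("L\u00f6sche Sitzungen"));
        System.out.println(roundTrip("L\u00f6sche Sitzungen"));
    }
}
```

If an IDE puts the original UTF-8 source files on the classpath instead of converted copies like these, the ISO-8859-1 loader sees the raw UTF-8 bytes and the garbled captions appear.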

Not strictly relevant here, but while Eclipse is my IDE of choice, I have always built Tomcat from the command line and used remote debugging if I need to step through the code. My (admittedly quite dated) experience with the various plug-ins that can be used to run Tomcat inside Eclipse has never been good. The problems were usually around picking up updates to code and/or figuring out where configuration files were being read from.

Mark
