On 28/05/2021 08:14, Carsten Klein wrote:
Hi there,
I'm facing character set encoding problems in quite a recent Tomcat 10
setup. I noticed them in the http://localhost:8080/manager/html
application with my browser set to German.
My Tomcat runs from within Eclipse, built with the official build.xml
file.
I suspect that that is not actually the case and that Eclipse is running
from its own copy of the source and compiled classes.
I'm using my forked cklein05/tomcat GitHub repository, which is
nearly up to date with your main branch.
In the Manager application, there are texts which contain German
umlauts, like "Lösche Sitzungen" (Expire sessions, aka
htmlManagerServlet.appsExpire).
These buttons now have captions that look like "LÃ¶sche Sitzungen".
Obviously that's a UTF-8 <-> ISO-xxxx-y conversion issue.
I'm pretty sure that my setup is not causing that problem.
Yes, it is.
After
digging into GitHub, I found that recently someone converted many (or
all) messages files to UTF-8:
https://github.com/apache/tomcat/commit/90fe08bdee0494110bb8145d2f067b61f74ae429
However, since these language files are actually java.util.Properties
files,
Not quite. They are java.util.ResourceBundle files.
these must be encoded as ISO-8859-1:
https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.InputStream-
That's also true for more recent versions of Java.
Not for ResourceBundle. As of Java 9, an encoding can be specified. As
soon as the minimum required version of Java is >=9, we'll switch to
that method of loading.
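
To illustrate the difference, here is a minimal sketch that reads the same
UTF-8 bytes twice: once through Properties.load(InputStream), which always
decodes as ISO-8859-1, and once through a Reader with an explicit charset.
The class name and the one-line bundle are made up for the example, and
PropertyResourceBundle(Reader) is only a stand-in for "loading with a
specified encoding"; it is not necessarily the method Tomcat will adopt.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import java.util.PropertyResourceBundle;

public class BundleEncodingDemo {

    static final String KEY = "htmlManagerServlet.appsExpire";

    // Hypothetical one-line bundle, held as UTF-8 bytes like the source files.
    // The o-umlaut is written as a Java escape so this file compiles under any encoding.
    static final byte[] UTF8_FILE =
            (KEY + "=L\u00f6sche Sitzungen\n").getBytes(StandardCharsets.UTF_8);

    // Properties.load(InputStream) decodes every byte as ISO-8859-1,
    // so the two UTF-8 bytes of the umlaut become two separate characters.
    static String loadAsLatin1() {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(UTF8_FILE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    // A Reader lets the caller choose the charset used for decoding.
    static String loadAsUtf8() {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(UTF8_FILE), StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader).getString(KEY);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAsLatin1()); // garbled caption
        System.out.println(loadAsUtf8());   // intended caption
    }
}
```

The first call reproduces the garbled button caption; the second returns the
intended text.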
The language files are actually Properties files in a (according to the
Javadoc) "simple line-oriented format". These must be loaded with the
Properties.load method(s) and must always be in ISO-8859-1. In contrast,
there are XML-based Properties files, that must be loaded with method(s)
loadFromXML(...). Only these must be encoded in UTF-8.
Although editing international language files in ISO-8859-1 requires
many \uXXXX escapes and is a hassle, to my mind, converting these
plain-text language files to UTF-8 was likely not a good idea.
The Tomcat maintainers disagree. Using UTF-8 makes maintenance
significantly simpler and has allowed integration with poeditor.com, which
has enabled 175 contributors (at today's count) to contribute new and
improved translations, including complete translations in Chinese and Korean.
One thing you do need to be aware of is the use of MessageFormat. Any
string that contains {n} will be passed through MessageFormat so any
single quote characters in the string need to be escaped with a second
single quote. Apart from a few special cases, any instance of {n} is
surrounded by [] to give [{n}] so that replaced values are clearly
delimited. This is to help with issues around empty values and
leading/trailing spaces that are otherwise not immediately obvious in
the logs.
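
A small sketch of the quoting rule, using java.text.MessageFormat directly.
The message text and class name are invented for the example and are not
taken from Tomcat's bundles.

```java
import java.text.MessageFormat;

public class QuoteEscapeDemo {

    // A lone single quote starts a quoted section in a MessageFormat pattern,
    // so the quote is dropped and the placeholder is emitted literally.
    static String unescaped(String name) {
        return MessageFormat.format("Can't deploy [{0}]", name);
    }

    // Doubling the quote yields one literal quote and keeps the placeholder live.
    static String escaped(String name) {
        return MessageFormat.format("Can''t deploy [{0}]", name);
    }

    public static void main(String[] args) {
        System.out.println(unescaped("/examples")); // Cant deploy [{0}]
        System.out.println(escaped("/examples"));   // Can't deploy [/examples]
        System.out.println(escaped(""));            // Can't deploy []
    }
}
```

The last call shows why the square brackets help: an empty value is still
visible as [] rather than disappearing into the surrounding text.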
But why don't others report that problem?
A few people have. It has always been when running from the source
within an IDE.
Am I overlooking something?
https://github.com/apache/tomcat/blob/main/build.xml#L998
According to my explanation above, that problem is limited neither to
the German language nor to the Manager application. It should occur with any
language using non-ASCII characters (> 127) and with all localized text
resources Tomcat is using.
The issue is going to be some variation of Eclipse loading the
ResourceBundle instances from the original source files rather than from
the transformed versions created by the build process.
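
As a rough illustration of what such a transformation can look like, the
sketch below rewrites every non-ASCII character as a backslash-u escape so
that the result loads correctly as ISO-8859-1. It assumes the build step is
a native2ascii-style conversion; the class and method names are hypothetical
and this is not the code the Tomcat build actually runs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class AsciiEscapeDemo {

    // Replace each UTF-16 code unit above 127 with a backslash-u escape.
    // Supplementary characters become two escapes (a surrogate pair).
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Load a value the way an ISO-8859-1 based loader would see the file.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    fileContent.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Hypothetical source line; the umlaut is escaped in this Java file only.
        String source = "htmlManagerServlet.appsExpire=L\u00f6sche Sitzungen\n";
        String transformed = escapeNonAscii(source);
        System.out.print(transformed);
        System.out.println(loadValue(transformed, "htmlManagerServlet.appsExpire"));
    }
}
```

If an IDE puts the untransformed UTF-8 source file on the classpath instead,
the escaping step is skipped and the ISO-8859-1 loader garbles the text.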
Not strictly relevant here but while Eclipse is my IDE of choice, I have
always built Tomcat from the command line and used remote debugging if I
need to step through the code. My (admittedly quite dated) experience
with the various plug-ins that can be used to run Tomcat inside Eclipse has
never been good. The problems were usually around picking up updates to
code and/or figuring out where configuration files were being read from.
Mark