Hello,

On Mon, Dec 19, 2022 at 2:48 PM Bill Allombert <ballo...@debian.org> wrote:
> Which raise the question: does the corresponding user group moved to UTF-8 ?
> Judging from <https://en.wikipedia.org/wiki/Chinese_character_encoding>,
> neither Chinese nor Japanese users have overwhelmingly moved to UTF-8,
> so it would be problematic to stop supporting BIG5, GB18030 and EUC-JP.

Bill, thank you, thank you, thank you! You speak the voice of reason!

Adam, those of us living in the West may think of BIG5, GB18030 and EUC-JP as legacy/obsolete encodings, but in Mainland China, GB18030 is anything but legacy. It is a mandatory national standard that has recently been brought up to date in GB 18030-2022, synchronizing with ISO/IEC 10646:2017 (equivalent to Unicode version 11.0).

"GB 18030 is a national standard with stringent conformance requirements that regulate eligibility for products or services to be sold in China." I personally went through this 20 years ago, trying to get the now-defunct ThizLinux distro to conform to GB 18030-2000.

GB 18030-2022 will become mandatory on 2023-08-01. Why the urgency? To add support for 17000+ rarer CJK Han characters found in personal and place names, and to improve support for the minority ethnic languages in China.

The GB18030 standard committee is really serious about promoting GB18030 because they are eager to resolve some real issues of "missing characters" that are negatively affecting the people living in China. To my pleasant surprise, they are putting out a public lecture webinar series that explains the why and the how of implementing GB 18030-2022, with the 3rd video published on 2022-12-30. In their mind, GB 18030 encompasses a lot more than just a character encoding mapping table. It is the full support package (including fonts, display, printing, input methods, etc.) for Han Chinese and all other minority languages used in China.

See e.g. the following excellent articles for more information:

* https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
* https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdf

Even though Debian is not proprietary/commercial software, the GB 18030 authority highly recommends that free/libre and open-source software _do_ implement GB 18030-2022.
That's especially true considering that vendors in China may be offering Debian as a solution for clients, but they would be prevented from doing so if Debian Policy spells out "We support UTF-8 and UTF-8 only." Think of all the ARM and RISC-V single-board computers made in China where Debian is the default OS image, and of Debian or its derivatives (Ubuntu, Ubuntu Kylin, etc.) pre-installed on PCs sold in China.

As a matter of fact, I was recently approached to update the IANA charset technical summary for "GB18030" (i.e. the original GB 18030-2000) in https://www.iana.org/assignments/charset-reg/GB18030 with the latest updates for GB 18030-2022. (Haha, I am starting to fret about it because I am no expert in GB18030, but many thanks to e.g. Dr. Ken Lunde, the expert in CJKV information processing, who has kindly allowed me to borrow from any of his articles in updating the IANA charset documentation for GB18030.)

I'm not asking you to spend any time working on GB18030; that would be the job of the Debian Chinese i18n/L10n team as well as the wider community (glibc, libiconv, Qt, etc.). All I am asking is that you maintain the status quo and not write off everything other than UTF-8 as legacy.

Debian already supports GB 18030-2000 (or GB 18030-2005) rather well. Don't let that existing support die! If anything, we'd need to improve GB18030 support to conform with GB 18030-2022, though thankfully much of that work would likely come from upstream projects or from Debian derivatives or other distros that are actually selling their products in China.

Many thanks for your understanding!

Kind regards,
Anthony Fok