Bug#1026231: debian-policy: document droppage of support for legacy locales

Anthony Fok Wed, 18 Jan 2023 15:33:31 -0800

Hello,

On Mon, Dec 19, 2022 at 2:48 PM Bill Allombert <ballo...@debian.org> wrote:
> Which raises the question: has the corresponding user group moved to UTF-8?
> Judging from <https://en.wikipedia.org/wiki/Chinese_character_encoding>,
> neither Chinese nor Japanese users have overwhelmingly moved to UTF-8,
> so it would be problematic to stop supporting BIG5, GB18030 and EUC-JP.


Bill, thank you, thank you, thank you!  You speak the voice of reason!

Adam, those of us living in the West may think of BIG5, GB18030 and
EUC-JP as legacy/obsolete encodings, but in Mainland China, GB18030 is
anything but legacy.  It is a mandatory national standard that has
recently been brought up to date in GB 18030-2022, synchronizing with
ISO/IEC 10646:2017 (equivalent to Unicode version 11.0).

"GB 18030 is a national standard with stringent conformance
requirements that regulate eligibility for products or services to be
sold in China."  I personally went through this 20 years ago, trying to
get the now-defunct ThizLinux distro GB 18030-2000 conformant.  GB
18030-2022 will become mandatory on 2023-08-01.  Why the urgency?  To
add support for 17000+ rarer CJK Han characters found in personal and
place names, and to improve support for ethnic minority languages
in China.  And the GB18030 standards committee is really serious about
promoting GB18030 because they are eager to resolve some real issues
of "missing characters" that are negatively affecting the people
living in China.  To my pleasant surprise, they are putting out a
public lecture webinar series that explains the why and the how of
implementing GB 18030-2022, with the 3rd video published on
2022-12-30.  In their mind, GB 18030 encompasses a lot more than just
a character encoding mapping table.  It is the full support package
(including fonts, display, printing, input methods, etc.) for Han
Chinese and all the minority languages used in China.

See e.g. the following excellent articles for more information:

 * https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
 * https://www.unicode.org/L2/L2022/22274-disruptive-changes.pdf

Even though Debian is not proprietary/commercial software, the GB
18030 authority highly recommends that free/libre and open-source
software _do_ implement GB 18030-2022.  That's especially true
considering the fact that vendors in China may be offering Debian as a
solution for clients, but they would be prevented from doing so if
Debian Policy spells out "We support UTF-8 and UTF-8 only."  Think of
all the ARM and RISC-V single-board computers made in China where
Debian is the default OS image, and of Debian or derivatives (Ubuntu,
Ubuntu Kylin, etc.) that are pre-installed on PCs sold in China.

As a matter of fact, I have recently been approached to
update the IANA charset technical summary for "GB18030" (i.e. the
original GB 18030-2000) in
https://www.iana.org/assignments/charset-reg/GB18030 with the latest
updates for GB 18030-2022.  (Haha, I am starting to fret about it
because I am no expert in GB18030, but many thanks to e.g. Dr. Ken
Lunde, the expert in CJKV information processing, who has kindly
allowed me to borrow any of his articles in updating the IANA charset
documentation for GB18030.)

I'm not asking you to spend any time working on GB18030; that would be
the job of the Debian Chinese i18n/l10n team as well as the wider
community (glibc, libiconv, Qt, etc.).  All I am asking is that you
maintain the status quo and not dismiss everything other than UTF-8
as legacy.  Debian already supports GB 18030-2000 (or GB 18030-2005)
rather well.  Don't let that existing support die!  If anything, we'd
need to improve GB18030 support to conform with GB 18030-2022, though
thankfully much of that work would likely come from upstream projects
or from Debian derivatives or other distros that are actually selling
their products in China.

Many thanks for your understanding!

Kind regards,

Anthony Fok
