On Fri, 20 Jan 2023 at 09:54:21 -0700, Anthony Fok wrote:
> supposedly some older Chinese websites are still using "GBK" as
> encoding, probably something like:
>
> <meta http-equiv="Content-Type" content="text/html;charset=gbk">
>
> which has less than 30,000 characters and thus a very limited subset
> of Unicode. And, presumably not everyone has the know how to convert
> to UTF-8, the Chinese government wants those unable to at least change
> that meta tag to:
>
> <meta http-equiv="Content-Type" content="text/html;charset=gb18030">

Sure, but neither of those actually requires us to support GBK or
GB 18030 as a system locale, only as something that iconv() (or whatever
browsers actually use, which is probably their own thing) can convert
into their preferred internal representation (which is almost certainly
UTF-8, UTF-16 or UCS-4).

Analogously, we've never supported using Windows-1252 (Microsoft's
legacy Latin-1 variant) as a system locale encoding in some hypothetical
locale like en_US.windows-1252, but HTML documents with
text/html;charset=windows-1252 still work fine.

> I have the feeling that many tech-savvy Chinese have already switched
> to UTF-8, but then perhaps in some circles there are lots of legacy
> GB2312/GBK documents or systems that made GB18030 a necessity, as an
> intermediate step to Unicode.

That doesn't seem so far away from how in some English-speaking circles
there are lots of legacy ISO-8859-1, ISO-8859-15 or (more likely)
Windows-1252 documents, and we can cope OK with those via transcoding,
even in UTF-8 system locales.

smcv
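
To illustrate the transcoding point, here is a minimal Python sketch. It
is not what any browser or iconv() actually does internally; the helper
name to_utf8 and the sample strings are made up for the example. It only
shows that bytes in a declared legacy charset can be converted to UTF-8
without the process ever running in a GBK, GB 18030 or Windows-1252
locale.

```python
# Hypothetical helper: transcode bytes from a declared charset to UTF-8.
# The conversion uses Python's built-in codecs and does not consult the
# system locale at all.
def to_utf8(raw: bytes, declared_charset: str) -> bytes:
    text = raw.decode(declared_charset)  # legacy bytes -> Unicode string
    return text.encode("utf-8")          # Unicode string -> UTF-8 bytes


# Sample "documents" (assumed content, for illustration only).
gbk_page = "中文".encode("gbk")
cp1252_page = b"caf\xe9 \x80"  # "café €" in Windows-1252

# A page declared as charset=gbk transcodes cleanly.
assert to_utf8(gbk_page, "gbk") == "中文".encode("utf-8")

# The same bytes also decode when the declaration is changed to gb18030,
# which is the "just change the meta tag" case for these characters.
assert to_utf8(gbk_page, "gb18030") == "中文".encode("utf-8")

# The Windows-1252 analogue works the same way.
assert to_utf8(cp1252_page, "windows-1252") == "café €".encode("utf-8")
```

The only thing the consumer needs is a converter for the declared
charset; the locale's own encoding never enters into it.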