[9fans] utf-8 handling oddities

la-ninpre Fri, 13 Oct 2023 13:30:18 -0700

greetings, 9fans.

recently i have been studying the utf-8 encoding and decided to look at how it
is handled in plan 9. since plan 9 was the first system to use this encoding,
it made sense to look at its implementation, and the fact that the
implementation was written by the designers of the encoding themselves only
reinforced that decision.


so i grabbed the latest release tarball from p9f.org and studied it. but while
i was testing some other implementations to compare how each handles
encoding/decoding errors, i noticed that the same code linked with plan9port's
lib9 handles surrogate halves differently (or, may i say, incorrectly) from the
plan 9 implementation i had studied. i started digging through archived
versions of the same code, only to find that the implementation changed only
after the release of the fourth edition. specifically, i looked at the file
/sys/src/libc/port/rune.c. the version i studied was taken from the so-called
'latest release' on the p9f page. the timestamp on that file says it was last
modified in 2013, while the rest of the code is timestamped 2002. the inferno
os source code also had this change ported to it around the same time.
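
to make the difference concrete, here is a minimal sketch of a decoder that
treats surrogate halves as errors. this is not the code from rune.c:
decode_utf8 and the enum names are hypothetical, chosen only for illustration,
and returning U+FFFD while consuming one byte on bad input is an assumption
about one reasonable error policy.

```c
enum {
	RuneError    = 0xFFFD,	/* replacement character stored on bad input */
	SurrogateMin = 0xD800,	/* first code point reserved for utf-16 surrogates */
	SurrogateMax = 0xDFFF,	/* last code point reserved for utf-16 surrogates */
	RuneMax      = 0x10FFFF	/* highest valid unicode code point */
};

/*
 * decode_utf8 (hypothetical helper, not plan 9's chartorune):
 * decodes one code point from the n bytes at s into *r and returns
 * the number of bytes consumed. on malformed input it stores
 * RuneError and consumes a single byte.
 */
int
decode_utf8(long *r, const unsigned char *s, int n)
{
	/* smallest code point that legitimately needs a given length */
	static const long min[] = { 0, 0, 0x80, 0x800, 0x10000 };
	long c;
	int i, len;

	if(n < 1){
		*r = RuneError;
		return 0;
	}
	c = s[0];
	if(c < 0x80){			/* one byte: plain ascii */
		*r = c;
		return 1;
	}
	if((c & 0xE0) == 0xC0){ len = 2; c &= 0x1F; }
	else if((c & 0xF0) == 0xE0){ len = 3; c &= 0x0F; }
	else if((c & 0xF8) == 0xF0){ len = 4; c &= 0x07; }
	else goto bad;			/* stray continuation or invalid lead byte */
	if(n < len)
		goto bad;		/* truncated sequence */
	for(i = 1; i < len; i++){
		if((s[i] & 0xC0) != 0x80)
			goto bad;	/* not a continuation byte */
		c = (c << 6) | (s[i] & 0x3F);
	}
	if(c < min[len] || c > RuneMax)
		goto bad;		/* overlong form or out of range */
	if(c >= SurrogateMin && c <= SurrogateMax)
		goto bad;		/* surrogate half encoded as utf-8 */
	*r = c;
	return len;
bad:
	*r = RuneError;
	return 1;
}
```

with this sketch, the bytes ED A0 80 (which would otherwise decode to U+D800)
are rejected, while a decoder lacking the surrogate check would accept them as
an ordinary three-byte sequence.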

if i understand correctly, unicode was extended past the BMP in 1996 with the
release of unicode 2.0. plan 9 had two editions released after that but,
assuming of course that the archives on p9f are correct, the code did not
reflect the change until 2013 (which is why the old code propagated to both
plan9port and 9front). does anyone know why that is the case? i'd appreciate
any input on this, or pointers to any sources of information you may know of.

best regards,
la ninpre.

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T8384b8174eb88096-M127761f645d18b8419fc4f9b
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
