On 2023-07-02 06:33, Bruno Haible wrote:
+                    else if (bytes == (size_t) -3)
+                      bytes = 0;

Why is this sort of thing needed? I thought that (size_t) -3 was possible only after a low surrogate, which is possible when decoding valid UTF-16 to Unicode, but not when decoding valid UTF-8 to Unicode. When can we get (size_t) -3 in a real-world system?

If (size_t) -3 is possible, I suppose I should change diffutils to take this into account, as bleeding-edge diffutils/src/side.c treats (size_t) -3 as meaning the next input byte is an encoding error, which is obviously wrong. The simplest way to fix this would be for diffutils to go back to using wchar_t, although I don't know what the downsides of that would be (diffutils doesn't care about Unicode; all it cares about is character classes and print widths).
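
For concreteness, the quoted hunk reads (size_t) -3 as "a character was produced but no input bytes were consumed". Here is a minimal sketch of a decoding loop written that way, assuming C11 mbrtoc32 from <uchar.h>. The name count_chars is hypothetical and is not taken from gnulib or diffutils, and the one-byte recovery from errors is a simplification of my own:

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical helper: count the char32_t values decoded from the
   first LEN bytes of BUF in the current locale.  Each invalid or
   incomplete sequence is counted once and consumes one byte.  */
size_t
count_chars (const char *buf, size_t len)
{
  mbstate_t state;
  memset (&state, 0, sizeof state);

  size_t count = 0;
  size_t i = 0;
  while (i < len)
    {
      char32_t c;
      size_t bytes = mbrtoc32 (&c, buf + i, len - i, &state);

      if (bytes == (size_t) -1 || bytes == (size_t) -2)
        {
          /* Encoding error or incomplete sequence: reset the state
             and skip a single byte.  */
          memset (&state, 0, sizeof state);
          bytes = 1;
        }
      else if (bytes == (size_t) -3)
        /* The character came from previously consumed input, so no
           further bytes are consumed on this call.  */
        bytes = 0;
      else if (bytes == 0)
        /* A null character was decoded; it occupies one byte.  */
        bytes = 1;

      count++;
      i += bytes;
    }
  return count;
}
```

A caller that mapped (size_t) -3 to "encoding error at the next byte" instead would advance by one byte in that branch, which is the behavior I described for side.c above.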
