[Patch] Lookup width of UTF-8 character is wrong

Koga Osamu Sat, 08 Mar 2014 05:16:21 -0800

Hello,

I found a bug in the width lookup for UTF-8 characters that are 4 bytes long.


The problem is in utf8_combine.
That function reconstructs the Unicode codepoint from the UTF-8 sequence, but
it treats the first byte incorrectly.
In the UTF-8 encoding, only the last 3 bits of the first byte of a
4-byte UTF-8 character are part of the codepoint,
while the current code uses 6 bits.
(see http://en.wikipedia.org/wiki/UTF-8#Description )

Because of this, 4-byte UTF-8 characters are displayed incorrectly.
For example, when the Japanese kanji character "𠮷" (U+20BB7) is displayed
in tmux, it is treated as single width, so the right half of its cell is
overwritten and the character disappears, or the cursor moves strangely.
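
To make the effect of the mask concrete, here is a small standalone sketch.
It is not tmux code: decode4 and kanji are hypothetical names used only for
illustration, and the mask is passed as a parameter so both variants can be
compared on the same bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of tmux): rebuild a codepoint from one
 * 4-byte UTF-8 sequence, masking the lead byte with lead_mask.
 */
uint32_t
decode4(const unsigned char b[4], unsigned int lead_mask)
{
	uint32_t value;

	/* The lead byte of a 4-byte sequence has the form 11110xxx. */
	assert((b[0] & 0xf8) == 0xf0);

	/* Each continuation byte (10xxxxxx) carries 6 payload bits. */
	value = b[3] & 0x3f;
	value |= (uint32_t)(b[2] & 0x3f) << 6;
	value |= (uint32_t)(b[1] & 0x3f) << 12;
	/* The lead byte carries only 3 payload bits, so 0x07 is correct. */
	value |= (uint32_t)(b[0] & lead_mask) << 18;
	return (value);
}

/* U+20BB7 encoded as UTF-8: F0 A0 AE B7. */
const unsigned char kanji[4] = { 0xf0, 0xa0, 0xae, 0xb7 };
```

With these definitions, decode4(kanji, 0x07) gives 0x20BB7, while
decode4(kanji, 0x3f) gives 0xC20BB7 because two marker bits of the lead byte
leak into the result. That value is above U+10FFFF, so a width lookup cannot
match the intended character.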

The following patch will fix the problem.

Thanks,
Osamu

-----------------------------------------------------------------------
diff --git a/utf8.c b/utf8.c
index 63723d7..5babcb3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -313,7 +313,7 @@ utf8_combine(const struct utf8_data *utf8data)
                value = utf8data->data[3] & 0x3f;
                value |= (utf8data->data[2] & 0x3f) << 6;
                value |= (utf8data->data[1] & 0x3f) << 12;
-               value |= (utf8data->data[0] & 0x3f) << 18;
+               value |= (utf8data->data[0] & 0x07) << 18;
                break;
        }
        return (value);


_______________________________________________
tmux-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/tmux-users
