Hello, I found a bug in the width lookup for 4-byte UTF-8 characters.
The problem is in utf8_combine(). It reconstructs the Unicode codepoint from a UTF-8 sequence, but it handles the first byte incorrectly. In UTF-8, the first byte of a 4-byte character has the form 11110xxx, so only its low 3 bits belong to the codepoint. The current code masks that byte with 0x3f and therefore keeps 6 bits (see http://en.wikipedia.org/wiki/UTF-8#Description). The three extra bits are part of the 11110 marker, so the reconstructed codepoint is wrong (and above U+10FFFF) for every 4-byte character.

As a result, 4-byte UTF-8 characters are displayed incorrectly. For example, when the Japanese kanji "𠮷" (U+20BB7) is displayed in tmux, it is treated as single-width, so it either disappears when the right half of its cell is overwritten, or the cursor moves unexpectedly.

The following patch fixes the problem.

Thanks,
Osamu

-----------------------------------------------------------------------
diff --git a/utf8.c b/utf8.c
index 63723d7..5babcb3 100644
--- a/utf8.c
+++ b/utf8.c
@@ -313,7 +313,7 @@ utf8_combine(const struct utf8_data *utf8data)
 		value = utf8data->data[3] & 0x3f;
 		value |= (utf8data->data[2] & 0x3f) << 6;
 		value |= (utf8data->data[1] & 0x3f) << 12;
-		value |= (utf8data->data[0] & 0x3f) << 18;
+		value |= (utf8data->data[0] & 0x07) << 18;
 		break;
 	}
 	return (value);
-----------------------------------------------------------------------

_______________________________________________
tmux-users mailing list
tmux-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tmux-users