On Thu, 22 Jul 2021 at 0:12, Jacob Champion <pchamp...@vmware.com> wrote:
> On Wed, 2021-07-21 at 00:08 +0000, Jacob Champion wrote:
> > I note that the doc comment for ucs_wcwidth()...
> >
> > >  *    - Spacing characters in the East Asian Wide (W) or East Asian
> > >  *      FullWidth (F) category as defined in Unicode Technical
> > >  *      Report #11 have a column width of 2.
> >
> > ...doesn't match reality anymore. The East Asian width handling was
> > last updated in 2006, it looks like? So I wonder whether fixing the
> > code to match the comment would not only fix the emoji problem but also
> > a bunch of other non-emoji characters.
>
> Attached is my attempt at that. This adds a second interval table,
> handling not only the emoji range in the original patch but also
> correcting several non-emoji character ranges which are included in the
> 13.0 East Asian Wide/Fullwidth sets. Try for example
>
>  - U+2329 LEFT POINTING ANGLE BRACKET
>  - U+16FE0 TANGUT ITERATION MARK
>  - U+18000 KATAKANA LETTER ARCHAIC E
>
> This should work reasonably well for terminals that depend on modern
> versions of Unicode's EastAsianWidth.txt to figure out character width.
> I don't know how it behaves on BSD libc or Windows.
>
> The new binary search isn't free, but my naive attempt at measuring the
> performance hit made it look worse than it actually is. Since the
> measurement function was previously returning an incorrect (too short)
> width, we used to get a free performance boost by not printing the
> correct number of alignment/border characters. I'm still trying to
> figure out how best to isolate the performance changes due to this
> patch.
>
> Pavel, I'd be interested to see what your benchmarks find with this
> code. Does this fix the original issue for you?
>

This patch fixes the badly formatted tables with emoji.

I checked the patch. It is correct and a step forward, because it
dynamically sets the intervals of double-wide characters, and the code is
more readable.
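
For readers following along, here is a minimal sketch of the technique being discussed: a sorted table of code point intervals that is binary-searched to decide whether a character occupies two columns. This is not the code from the patch. The names (cp_interval, sketch_wide, in_interval_table, sketch_wcwidth, sketch_check) are hypothetical, the table is a small illustrative subset rather than the full Unicode 13.0 Wide/Fullwidth set, and the sketch ignores combining and control characters, which a real width function also has to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of code points (hypothetical type for this sketch). */
struct cp_interval
{
    uint32_t first;
    uint32_t last;
};

/*
 * Illustrative subset of East Asian Wide/Fullwidth ranges.  The real
 * table would be derived from EastAsianWidth.txt.  Entries must be
 * sorted and non-overlapping for the binary search below to work.
 */
static const struct cp_interval sketch_wide[] = {
    {0x1100, 0x115F},   /* Hangul Jamo initial consonants */
    {0x2329, 0x232A},   /* angle brackets */
    {0x3041, 0x3096},   /* Hiragana */
    {0x4E00, 0x9FFF},   /* CJK Unified Ideographs */
    {0xAC00, 0xD7A3},   /* Hangul Syllables */
    {0xFF01, 0xFF60},   /* Fullwidth forms */
    {0x16FE0, 0x16FE1}, /* Tangut and Nushu iteration marks */
    {0x1F600, 0x1F64F}, /* Emoticons */
};

/* Return 1 if cp falls inside any interval of the sorted table, else 0. */
static int
in_interval_table(uint32_t cp, const struct cp_interval *table, size_t n)
{
    size_t lo = 0;
    size_t hi = n;      /* search the half-open index range [lo, hi) */

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (cp < table[mid].first)
            hi = mid;
        else if (cp > table[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Simplified width: 2 for code points in the wide table, otherwise 1. */
int
sketch_wcwidth(uint32_t cp)
{
    size_t n = sizeof(sketch_wide) / sizeof(sketch_wide[0]);

    return in_interval_table(cp, sketch_wide, n) ? 2 : 1;
}

/* Usage example: a few spot checks against the subset table above. */
void
sketch_check(void)
{
    assert(sketch_wcwidth(0x2329) == 2);    /* quoted example above */
    assert(sketch_wcwidth(0x16FE0) == 2);   /* quoted example above */
    assert(sketch_wcwidth(0x1F600) == 2);   /* an emoji */
    assert(sketch_wcwidth('a') == 1);       /* plain ASCII */
}
```

Each lookup costs O(log n) comparisons in the table size, which is consistent with the remark above that the new binary search is not free but should be cheap in absolute terms.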

I also checked performance, and although there is a measurable slowdown, it
is negligible in absolute terms. The previous code was a little faster
because it checked fewer ranges, but it was not fully correct or up to date.

The patch applied without problems.
There are no regression tests, but I am not sure they are necessary in this
case.
make check-world passed without problems.

I'll mark this patch as ready for committer.

Regards

Pavel

>
> --Jacob
>