Hi Will,

On Sun, Aug 27 2017, Will Storey <w...@summercat.com> wrote:
> When formatting text for display in the window list, it is possible to
> specify a limit to truncate at. This is useful for example with %t when
> you have a long title in the window.
>
> The prior implementation truncated counting by bytes. This was
> problematic if the limit happened to be in the middle of a multibyte
> character. When that happened, the window list text cut off starting at
> the invalid character.
>
> We now count by characters rather than bytes. This ensures we always
> include a full multibyte character.
>
> It is possible to see the problem with this test case:
>
> set winfmt %n%s%10t
> set winliststyle row
> set winname title
>
> Then create a window such that we truncate in the middle of a multibyte
> character. This is possible with the following HTML document:
>
> <!DOCTYPE html>
> <meta charset="utf-8">
> <title>testing ™ 1 2 3</title>
>
> Assuming you are using UTF-8 encoding, if your browser's title has only
> this text, then truncating at 10 will truncate on the second of the
> three bytes in the trademark symbol.

First, thanks for your submission. You're dealing with a known problem.
The direction taken so far in ratpoison was: don't deal with wide
characters, only handle UTF-8 in a rather dumb but at least simple way.

Rationale:
- the wide character API has a lot of gotchas. I won't detail them
  here, but what to do in case of an invalid sequence often remains an
  open question. Here, I can see that you return a partial length
  early. I'm not sure this is desirable.
- UTF-8 is easy and looks like the sanest choice for a multibyte
  locale. No offense, but other less commonly used locales are just
  a pain to handle. Think state-dependent encodings.

So while technically speaking the wide character API looks like the
obvious choice, I think its cost is a bit high.

Consistency is good. If we start using the wide character API
somewhere, it should be used in all places where it makes sense. I'm
not sure this is an easy task even in ratpoison. :)

Handling only UTF-8 as a multibyte locale, the tentative diff below
seems to do the job. *WARNING*: I have barely tested it with your HTML
test case. Feedback / test reports welcome.


diff --git a/src/format.c b/src/format.c
index caf8781..fa8b068 100644
--- a/src/format.c
+++ b/src/format.c
@@ -82,11 +82,18 @@ concat_width (struct sbuf *buf, char *s, int width)
 {
   if (width >= 0)
     {
-      char *s1 = xsprintf ("%%.%ds", width);
-      char *s2 = xsprintf (s1, s);
-      sbuf_concat (buf, s2);
-      free (s1);
-      free (s2);
+      int len = 0;
+
+      while (s[len] != '\0' && len < width)
+        {
+          if (RP_IS_UTF8_START (s[len]))
+            do
+              len++;
+            while (RP_IS_UTF8_CONT (s[len]));
+          else
+            len++;
+        }
+      sbuf_printf_concat (buf, "%.*s", len, s);
     }
   else
     sbuf_concat (buf, s);

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE
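
For anyone who wants to try the loop outside of ratpoison, here is a
minimal standalone sketch. It is not part of the diff. The IS_UTF8_START
and IS_UTF8_CONT macros are hypothetical stand-ins for RP_IS_UTF8_START
and RP_IS_UTF8_CONT (the real definitions may differ), and both function
names are made up for illustration. The first function mirrors the loop
in the diff, where the limit is still compared against a byte offset but
never ends inside a sequence. The second is a possible variant that
counts characters instead, assuming valid UTF-8 input.

```c
/* Hypothetical stand-ins for ratpoison's RP_IS_UTF8_* macros. */
#define IS_UTF8_START(c) (((unsigned char) (c) & 0xC0) == 0xC0)
#define IS_UTF8_CONT(c)  (((unsigned char) (c) & 0xC0) == 0x80)

/* Same loop as in the diff: stop once LEN reaches WIDTH bytes, but
   always finish the multibyte sequence that was started.  Returns the
   number of bytes to keep.  */
static int
prefix_len_bytes (const char *s, int width)
{
  int len = 0;

  while (s[len] != '\0' && len < width)
    {
      if (IS_UTF8_START (s[len]))
        do
          len++;
        while (IS_UTF8_CONT (s[len]));
      else
        len++;
    }
  return len;
}

/* Variant (an assumption, not what the diff does): count whole
   characters against WIDTH and return the matching byte length.  */
static int
prefix_len_chars (const char *s, int width)
{
  int len = 0;
  int chars = 0;

  while (s[len] != '\0' && chars < width)
    {
      if (IS_UTF8_START (s[len]))
        do
          len++;
        while (IS_UTF8_CONT (s[len]));
      else
        len++;
      chars++;
    }
  return len;
}
```

With the title from the test case ("testing ™ 1 2 3", where ™ is the
three bytes E2 84 A2) and a limit of 10, prefix_len_bytes returns 11 and
prefix_len_chars returns 12. Either value can be passed as the precision
to a "%.*s" format, as the diff does.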
_______________________________________________
Ratpoison-devel mailing list
Ratpoison-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/ratpoison-devel