Hi,

Ingo Schwarze wrote on Tue, Dec 08, 2015 at 10:37:29PM +0100:
> here is UTF-8 support for fmt(1).
> This does not include the -c case; the patch is already large enough.

Meanwhile, I committed that.  Here is a simple solution for the -c case.
The loop in center_stream() is designed to be similar to the loop in
process_stream(), but it is a bit simpler.

This patch implies two changes in behaviour, though.

First, I don't see why fmt -c should pass through invalid bytes, given
that fmt without -c weeds them out.  So handle them the same way in both
cases: replace them with ASCII question marks.

Second, the concept of tabs inside lines that are to be centered makes
no sense in the first place.  In the past, the width of such a tab
depended on the leading whitespace on the line, even though that
whitespace was otherwise ignored.  Yet tabs on subsequent lines did not
align, because the leading space on output depends on the width of the
string following the tab.  None of that was useful.

I see no more useful way to define the meaning of a tab in a line that
is to be centered.  If we want tabs on subsequent centered lines to
align, the *number* of tabs needed will depend on the width of the
string *following* the last tab.  That is completely opaque to people
writing such files, and I see no way to prepare such files correctly
without experimentation.  Even then, the output position of the text
preceding the tab remains ill-defined.

So I propose that in lines to be centered, we simply replace each tab
with a single blank.  That is easy to understand, easy to implement,
and no less useful than any other solution I can think of.

OK?
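The measuring rules proposed above (each tab counts as one blank, undecodable bytes become '?', leading whitespace is ignored, and the rest is measured in display columns with wcwidth(3)) can be sketched as a standalone C helper. This is a minimal sketch, not the committed fmt.c code: measure_centered() and center_padding() are hypothetical names, and the caller is assumed to have already called setlocale(3) for LC_CTYPE. Unlike the patch, the sketch tracks leading whitespace with an explicit flag and sets wc to L'?' for an undecodable byte, so the whitespace test never reads a stale or unset wide character.

```c
#define _XOPEN_SOURCE 700	/* needed for wcwidth() on glibc */
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper modelled on the loop in center_stream():
 * rewrites `line` in place (each tab becomes one blank, each byte
 * that does not start a valid character becomes '?'), stores the
 * display width of the text after the leading whitespace in *width,
 * and returns a pointer to the first non-whitespace character.
 */
static char *
measure_centered(char *line, size_t *width)
{
	char	*cp, *start = line;
	wchar_t	 wc;
	size_t	 l = 0;
	int	 wcl, wcw, leading = 1;

	(void)mbtowc(NULL, NULL, 0);	/* reset conversion state */
	for (cp = line; *cp != '\0'; cp += wcl) {
		if (*cp == '\t')
			*cp = ' ';
		if ((wcl = mbtowc(&wc, cp, MB_CUR_MAX)) == -1) {
			(void)mbtowc(NULL, NULL, 0);
			*cp = '?';
			wc = L'?';	/* never test a stale wc below */
			wcl = 1;
			wcw = 1;
		} else if ((wcw = wcwidth(wc)) == -1)
			wcw = 1;	/* non-printable: count one column */
		if (leading && iswspace((wint_t)wc))
			start = cp + wcl;	/* skip leading whitespace */
		else {
			leading = 0;
			l += wcw;
		}
	}
	*width = l;
	return start;
}

/* Blanks printed before the text: one per two columns of shortfall. */
static size_t
center_padding(size_t width, size_t goal)
{
	size_t	n = 0;

	while (width < goal) {	/* same stepping as the loop in the diff */
		n++;
		width += 2;
	}
	return n;
}
```

For example, with a goal length of 65 columns, the line "  \thello\tworld" is measured as "hello world", 11 columns wide, and center_padding(11, 65) yields 27 leading blanks.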
Ingo


Index: fmt.c
===================================================================
RCS file: /cvs/src/usr.bin/fmt/fmt.c,v
retrieving revision 1.34
diff -u -p -r1.34 fmt.c
--- fmt.c	15 Dec 2015 16:26:17 -0000	1.34
+++ fmt.c	16 Dec 2015 10:37:27 -0000
@@ -620,13 +620,29 @@ output_word(size_t indent0, size_t inden
 static void
 center_stream(FILE *stream, const char *name)
 {
-	char *line;
-	size_t l;
+	char *line, *cp;
+	wchar_t wc;
+	size_t l;	/* Display width of the line. */
+	int wcw;	/* Display width of one character. */
+	int wcl;	/* Length in bytes of one character. */
 
 	while ((line = get_line(stream)) != NULL) {
-		while (isspace((unsigned char)*line))
-			++line;
-		l = strlen(line);
+		l = 0;
+		for (cp = line; *cp != '\0'; cp += wcl) {
+			if (*cp == '\t')
+				*cp = ' ';
+			if ((wcl = mbtowc(&wc, cp, MB_CUR_MAX)) == -1) {
+				(void)mbtowc(NULL, NULL, MB_CUR_MAX);
+				*cp = '?';
+				wcl = 1;
+				wcw = 1;
+			} else if ((wcw = wcwidth(wc)) == -1)
+				wcw = 1;
+			if (l == 0 && iswspace(wc))
+				line += wcl;
+			else
+				l += wcw;
+		}
 		while (l < goal_length) {
 			putchar(' ');
 			l += 2;