Hi,

Ingo Schwarze wrote on Tue, Dec 08, 2015 at 10:37:29PM +0100:

> here is UTF-8 support for fmt(1).
> This does not include the -c case; the patch is already large enough.

Meanwhile, I committed that.
Here is a simple solution for the -c case.
The loop in center_stream() is designed to be similar
to the loop in process_stream(), but it's a bit simpler.

This patch implies two changes in behaviour, though.

First, I don't see why fmt -c should pass through invalid bytes,
given that fmt without -c weeds them out.  So handle them the same
way in both cases: replace them with ASCII question marks.
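A minimal sketch of that replacement policy, assuming it is applied in
place to a NUL-terminated line in the current LC_CTYPE locale.
replace_invalid() is a hypothetical helper written for illustration,
not a function in fmt.c; it uses the same mbtowc() reset idiom as the
patch below.

```c
#include <stdlib.h>	/* mbtowc(), MB_CUR_MAX, wchar_t */

/*
 * Hypothetical helper, not part of fmt.c: in place, replace each byte
 * that does not begin a valid multibyte character in the current
 * LC_CTYPE locale with an ASCII question mark.
 */
static void
replace_invalid(char *s)
{
	wchar_t	 wc;
	int	 len;

	(void)mbtowc(NULL, NULL, MB_CUR_MAX);	/* start in the initial shift state */
	while (*s != '\0') {
		if ((len = mbtowc(&wc, s, MB_CUR_MAX)) == -1) {
			/* Invalid byte: reset the conversion state, mark it, move on. */
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			*s = '?';
			len = 1;
		}
		s += len;	/* len > 0 here because *s != '\0' */
	}
}
```

Each invalid byte thus turns into exactly one printable column, which
matches the patch setting wcw = 1 for such bytes.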

Second, the concept of tabs inside lines that are to be centered
makes no sense in the first place.  In the past, the width of such
a tab depended on the leading whitespace on the line, even though
that whitespace was otherwise ignored.  Moreover, tabs on subsequent
lines did not align, because the leading space on output depended
on the width of the string following the tab.  None of that was
useful.

I see no way to define the meaning of a tab in a line that is to
be centered in a more useful way.  If we want tabs on subsequent
centered lines to align, the *number* of tabs needed will depend
on the width of the string *following* the last tab.  That is
completely opaque to people writing such files, and I see no way
to prepare such files correctly without experimentation.  Even
then, the output position of the text preceding the tab remains
ill-defined.
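The arithmetic behind that last point can be sketched with the
padding loop visible at the end of the patch: the number of leading
blanks depends on the display width of the whole line, including
whatever follows a tab.  center_padding() is a hypothetical helper
that copies that loop, and 65 below is only an example goal length.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the blanks that the centering loop in
 * center_stream() prints before a line of display width l.  It emits
 * one blank per two missing columns, rounding up.
 */
static size_t
center_padding(size_t l, size_t goal_length)
{
	size_t	 pad = 0;

	while (l < goal_length) {	/* same loop as in the patch */
		pad++;
		l += 2;
	}
	return pad;
}
```

For example, with one blank per tab, the lines "ab<TAB>cd" and
"ab<TAB>cdefgh" have widths 5 and 9, so with a goal length of 65
they get 30 and 28 leading blanks, and their common prefix "ab"
starts in different columns.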

So I propose that in lines to be centered, we simply replace each
tab with a single blank.  That is easy to understand, easy to
implement, and no less useful than any other solution I can think
of.
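A sketch of the per-line measurement under that rule, assuming an
XSI environment for wcwidth().  measure_centered() is hypothetical
and only mirrors the loop in the patch below: tabs become one blank,
invalid bytes become '?', leading whitespace is skipped, and the
display width of the rest is summed.

```c
#define _XOPEN_SOURCE 700	/* assumption: needed for wcwidth() on glibc */
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not the committed code.  Modifies line in
 * place, returns a pointer to the first non-whitespace character,
 * and stores the display width of the remaining text in *width.
 */
static char *
measure_centered(char *line, size_t *width)
{
	char	*cp, *start = line;
	wchar_t	 wc;
	size_t	 l = 0;
	int	 wcl, wcw;

	(void)mbtowc(NULL, NULL, MB_CUR_MAX);
	for (cp = line; *cp != '\0'; cp += wcl) {
		if (*cp == '\t')
			*cp = ' ';		/* a tab is just one blank */
		if ((wcl = mbtowc(&wc, cp, MB_CUR_MAX)) == -1) {
			(void)mbtowc(NULL, NULL, MB_CUR_MAX);
			*cp = '?';		/* invalid byte, as without -c */
			wc = L'?';		/* so iswspace() sees a known value */
			wcl = 1;
			wcw = 1;
		} else if ((wcw = wcwidth(wc)) == -1)
			wcw = 1;		/* unprintable: count one column */
		if (start == cp && iswspace(wc))
			start = cp + wcl;	/* still in leading whitespace */
		else
			l += wcw;
	}
	*width = l;
	return start;
}
```

This sketch differs from the patch in two small details: it assigns
wc when mbtowc() fails instead of leaving it unset, and it tracks
the start of the text with a pointer instead of testing l == 0, so
a leading zero-width character ends the leading-whitespace skip.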

OK?
  Ingo


Index: fmt.c
===================================================================
RCS file: /cvs/src/usr.bin/fmt/fmt.c,v
retrieving revision 1.34
diff -u -p -r1.34 fmt.c
--- fmt.c       15 Dec 2015 16:26:17 -0000      1.34
+++ fmt.c       16 Dec 2015 10:37:27 -0000
@@ -620,13 +620,29 @@ output_word(size_t indent0, size_t inden
 static void
 center_stream(FILE *stream, const char *name)
 {
-       char *line;
-       size_t l;
+       char *line, *cp;
+       wchar_t wc;
+       size_t l;       /* Display width of the line. */
+       int wcw;        /* Display width of one character. */
+       int wcl;        /* Length in bytes of one character. */
 
        while ((line = get_line(stream)) != NULL) {
-               while (isspace((unsigned char)*line))
-                       ++line;
-               l = strlen(line);
+               l = 0;
+               for (cp = line; *cp != '\0'; cp += wcl) {
+                       if (*cp == '\t')
+                               *cp = ' ';
+                       if ((wcl = mbtowc(&wc, cp, MB_CUR_MAX)) == -1) {
+                               (void)mbtowc(NULL, NULL, MB_CUR_MAX);
+                               *cp = '?';
+                               wcl = 1;
+                               wcw = 1;
+                       } else if ((wcw = wcwidth(wc)) == -1)
+                               wcw = 1;
+                       if (l == 0 && iswspace(wc))
+                               line += wcl;
+                       else
+                               l += wcw;
+               }
                while (l < goal_length) {
                        putchar(' ');
                        l += 2;
