>Number:         164317
>Category:       bin
>Synopsis:       [patch] sbin/write: add multibyte character support
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Thu Jan 19 20:20:08 UTC 2012
>Closed-Date:
>Last-Modified:
>Originator:     Dmitry Marakasov
>Release:        FreeBSD 9.0-RC2 amd64
>Organization:
>Environment:
System: FreeBSD hades.panopticon 9.0-RC2 FreeBSD 9.0-RC2 #0: Tue Nov 29 07:18:03 MSK 2011 root@hades.panopticon:/usr/work/usr/src/sys/HADES amd64


>Description:
Currently write(1) does not handle UTF-8 locales at all:

    (this is Russian)
# echo "Проверка" | write amdmi3 pts/29

Message from amdmi3@hades.panopticon on pts/29 at 23:31 ...
M-PM-^_M-QM-^@M-PM->M-PM-2M-PM-5M-QM-^@M-PM-:M-PM-0
EOF

The checks used in the character printing routine (((*s & 0x80) && *s < 0xA0))
seem to assume a specific encoding. CP866, for example, has letters in
0x80-0x9F, so this does not work correctly even for 8-bit locales.
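
To make the failure concrete, here is a minimal sketch of the pre-patch escaping rules applied to a byte string. The helper name old_escape and the output buffer are assumptions made for illustration (the real wr_fputs() writes with putchar()), and the sketch assumes the default "C" locale, where bytes above 0x7F are not printable.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of write.c): applies the pre-patch
 * byte-wise escaping rules to `in` and stores the result in `out`.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t
old_escape(const unsigned char *in, char *out, size_t outsz)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    for (; *in != '\0'; ++in) {
        unsigned char c = *in;

        /* Worst case per input byte is "M-^X" (4 bytes) plus the NUL. */
        if (n + 5 > outsz)
            break;
        if (c == '\n') {
            out[n++] = '\r';
        } else if (((c & 0x80) && c < 0xA0) ||
            (!isprint(c) && !isspace(c) && c != '\a' && c != '\b')) {
            if (c & 0x80) {
                /* Strip the high bit and mark it with "M-". */
                c &= 0x7F;
                out[n++] = 'M';
                out[n++] = '-';
            }
            if (iscntrl(c)) {
                /* Show control characters in caret notation. */
                c ^= 0x40;
                out[n++] = '^';
            }
        }
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the two UTF-8 bytes of the first letter of the test word (0xD0 0x9F) yields "M-PM-^_", which is how the garbled output above begins: each byte of a multibyte sequence is escaped on its own.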

The utility is easily converted to wchar_t, however, which should handle all
of these cases; a patch is attached. With the patch applied:

% (echo "Проверка"; echo "Some control characters: \b\t^[[D^[[C^[[A^[[B^[") | ./write amdmi3 pts/29

Message from amdmi3@hades.panopticon on pts/29 at 23:43 ...
Проверка
Some control characters: <0x8>  <0x1B>[D<0x1B>[C<0x1B>[A<0x1B>[B<0x1B>
EOF

How non-printable characters should be displayed is debatable. Since one can
neither assume that the locale uses a UTF encoding nor that a wchar_t value is
somehow linked to a code point, it would be inappropriate to use notations like
U+%X, \u%X or &#%d;, or to modify wchar_t values with bitwise operations. A
notation like <0x%X>, however, is charset-agnostic and quite readable, so I
think it is suitable here.
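
As a rough illustration of that notation, the sketch below applies the same three rules as the patched wr_fputs() but writes into a caller-supplied buffer so the result can be inspected. The helper name escape_wide and the buffer interface are assumptions for this example only; the patch itself writes directly to stdout. The value is cast to unsigned long for printing, which is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not part of the patch): formats `in` into `out`
 * with the same rules as the patched wr_fputs(). `outlen` is the capacity
 * of `out` in wide characters, including the terminator. Returns the
 * number of wide characters written, excluding the terminator.
 */
size_t
escape_wide(const wchar_t *in, wchar_t *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    for (; *in != L'\0'; ++in) {
        if (*in == L'\n') {
            /* Turn \n into \r\n. */
            if (n + 3 > outlen)
                break;
            out[n++] = L'\r';
            out[n++] = L'\n';
        } else if (iswprint((wint_t)*in) || iswspace((wint_t)*in)) {
            /* Printable and whitespace characters pass through. */
            if (n + 2 > outlen)
                break;
            out[n++] = *in;
        } else {
            /* Everything else is shown as <0xNN>, whatever the charset. */
            int w = swprintf(out + n, outlen - n, L"<0x%lX>",
                (unsigned long)*in);
            if (w < 0)
                break;
            n += (size_t)w;
        }
    }
    out[n] = L'\0';
    return n;
}
```

For the input L"A\b\t\x1b[D\n" this produces L"A<0x8>\t<0x1B>[D\r\n": the backspace and escape characters are made visible, while the tab and the printable characters are left alone.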

>How-To-Repeat:
>Fix:
Index: write.1
===================================================================
--- write.1     (revision 230334)
+++ write.1     (working copy)
@@ -107,7 +107,3 @@
 terminal, not the receiver's (which
 .Nm
 has no way of knowing).
-.Pp
-The
-.Nm
-utility does not recognize multibyte characters.
Index: write.c
===================================================================
--- write.c     (revision 230334)
+++ write.c     (working copy)
@@ -60,12 +60,14 @@
 #include <string.h>
 #include <unistd.h>
 #include <utmpx.h>
+#include <wchar.h>
+#include <wctype.h>
 
 void done(int);
 void do_write(char *, char *, uid_t);
 static void usage(void);
 int term_chk(char *, int *, time_t *, int);
-void wr_fputs(unsigned char *s);
+void wr_fputs(wchar_t *s);
 void search_utmp(char *, char *, char *, uid_t);
 int utmp_chk(char *, char *);
 
@@ -243,7 +245,8 @@
        char *nows;
        struct passwd *pwd;
        time_t now;
-       char path[MAXPATHLEN], host[MAXHOSTNAMELEN], line[512];
+       char path[MAXPATHLEN], host[MAXHOSTNAMELEN];
+       wchar_t line[512];
 
        /* Determine our login name before we reopen() stdout */
        if ((login = getlogin()) == NULL) {
@@ -269,7 +272,7 @@
        (void)printf("\r\n\007\007\007Message from %s@%s on %s at %s ...\r\n",
            login, host, mytty, nows + 11);
 
-       while (fgets(line, sizeof(line), stdin) != NULL)
+       while (fgetws(line, sizeof(line)/sizeof(wchar_t), stdin) != NULL)
                wr_fputs(line);
 }
 
@@ -288,30 +291,20 @@
  *     turns \n into \r\n
  */
 void
-wr_fputs(unsigned char *s)
+wr_fputs(wchar_t *s)
 {
 
-#define        PUTC(c) if (putchar(c) == EOF) err(1, NULL);
+#define        PUTC(c) if (putwchar(c) == WEOF) err(1, NULL);
 
-       for (; *s != '\0'; ++s) {
-               if (*s == '\n') {
-                       PUTC('\r');
-               } else if (((*s & 0x80) && *s < 0xA0) ||
-                          /* disable upper controls */
-                          (!isprint(*s) && !isspace(*s) &&
-                           *s != '\a' && *s != '\b')
-                         ) {
-                       if (*s & 0x80) {
-                               *s &= ~0x80;
-                               PUTC('M');
-                               PUTC('-');
-                       }
-                       if (iscntrl(*s)) {
-                               *s ^= 0x40;
-                               PUTC('^');
-                       }
+       for (; *s != L'\0'; ++s) {
+               if (*s == L'\n') {
+                       PUTC(L'\r');
+                       PUTC(L'\n');
+               } else if (iswprint(*s) || iswspace(*s)) {
+                       PUTC(*s);
+               } else {
+                       wprintf(L"<0x%X>", (unsigned int)*s);
                }
-               PUTC(*s);
        }
        return;
 #undef PUTC
>Release-Note:
>Audit-Trail:
>Unformatted: