Hi Walter,

Walter Alejandro Iglesias wrote on Mon, Aug 18, 2025 at 06:40:04PM +0200:

> Question for the experts.  Let's take the following example:
> 
> ----->8------------->8--------------------
> #include <stdio.h>
> #include <string.h>
> #include <wchar.h>
> 
> #define period                0x2e
> #define question      0x3f
> #define exclam                0x21
> #define ellipsis      L'\u2026'
> 
> const wchar_t p[] = { period, question, exclam, ellipsis };

In addition to what otto@ said, this is bad style for more than one
reason.

First of all, the data type of the constant "0x2e" is "int",
see for example C11 6.4.4.1 (Integer constants).  Converting "int"
to "wchar_t" doesn't really make sense.  On OpenBSD, it only
works because UTF-8 is the only supported character encoding *and*
wchar_t stores Unicode codepoints.  But neither of these choices
is portable.  What you want is (C11 6.4.4.4 Character constants):

  #define period        L'.'
  #define question      L'?'
  #define exclam        L'!'

> int
> main()
> {
>       const wchar_t s[] = L". Hello.";
> 
>       printf("%ls\n", s);
>       printf("%lu\n", wcsspn(s, p));

The return value of wcsspn(3) is size_t, so this should use %zu.

Besides, given that the second argument of wcsspn(3)
takes "const wchar_t *", why not simply:

  const wchar_t *p = L".?!\u2026";
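To illustrate the last two points, here is a minimal sketch rather than a definitive rewrite.  The helper name leading_punct() is made up for this example and does not appear in your program.  It keeps the set of punctuation characters in a wide string literal, which is NUL-terminated as wcsspn(3) requires, and returns the result as size_t:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of the original program:
 * count how many leading characters of s are sentence punctuation.
 */
size_t
leading_punct(const wchar_t *s)
{
	/* A wide string literal is NUL-terminated, as wcsspn(3) requires. */
	static const wchar_t punct[] = L".?!\u2026";

	assert(s != NULL);
	return wcsspn(s, punct);
}
```

With that, printf("%zu\n", leading_punct(L". Hello.")) prints 1, because only the first character of the string is in the set.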

And finally, if you want wchar_t to store UTF-8 strings, you need
something like

  #include <err.h>
  #include <locale.h>

  if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
        errx(1, "setlocale failed");

Otherwise, the C library functions operating on wide strings
assume that wchar_t only stores ASCII character numbers.
Even printf(3) %ls won't work for UTF-8 characters without
setting the locale properly.
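
A rough sketch of how the locale affects wide-to-multibyte conversion, under some assumptions: both helper names are invented for this example, the locale name "C.UTF-8" is the one used above and may not exist on every system, and calling wcstombs(3) with a NULL destination to obtain the required length is POSIX behavior rather than something ISO C guarantees.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: try to select a UTF-8 LC_CTYPE.
 * Returns 1 on success, 0 if the locale is not available.
 */
int
use_utf8_locale(void)
{
	return setlocale(LC_CTYPE, "C.UTF-8") != NULL;
}

/*
 * Hypothetical helper: number of bytes ws needs in the multibyte
 * encoding of the current locale, or (size_t)-1 if some character
 * cannot be represented.  Assumes the POSIX behavior of wcstombs(3)
 * when the destination pointer is NULL.
 */
size_t
mb_length(const wchar_t *ws)
{
	assert(ws != NULL);
	return wcstombs(NULL, ws, 0);
}
```

Once use_utf8_locale() has succeeded, mb_length(L"\u2026") should report the three bytes that U+2026 occupies in UTF-8.  Without the setlocale(3) call, the same conversion is expected to fail for that character, which is the reason printf(3) %ls cannot print it.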

Yours,
  Ingo
