Hello Walter,

Walter Alejandro Iglesias wrote on Wed, Aug 20, 2025 at 09:18:52AM +0200:
> On Tue, Aug 19, 2025 at 05:39:13PM +0200, Ingo Schwarze wrote:
>> Walter Alejandro Iglesias wrote on Mon, Aug 18, 2025 at 06:40:04PM +0200:

>>> #define period 0x2e
>>> #define question 0x3f
>>> #define exclam 0x21
>>> #define ellipsis L'\u2026'
>>> const wchar_t p[] = { period, question, exclam, ellipsis };

>> In addition to what otto@ said, this is bad style for more than one
>> reason.
>>
>> First of all, that data type of the constant "0x2e" is "int",
>> see for example C11 6.4.4.1 (Integer constants). Casting "int"
>> to "wchar_t" doesn't really make sense. On OpenBSD, it only
>> works because UTF-8 is the only supported character encoding *and*
>> wchar_t stores Unicode codepoints. But neither of these choices
>> are portable. What you want is (C11 6.4.4.4 Character constants):
>>
>> #define period L'.'
>> #define question L'?'
>> #define exclam L'!'

> As I made this change to my code (https://en.roquesor.com/fmtroff.html)
> the following reminded me why, at some point, I decided to switch to
> hexadecimal notation.
>
> #define backslash L'\\'
> #define apostrophe L'\''
>
> It isn't very confusing there, but among the arguments of a function or
> a conditional...

Making code look nice is good to have and can even make code more
readable and hence reduce the likelihood of bugs.

But even if you are coding with narrow strings for ASCII only, whether

  char mychar = 0x5c;
  char mychar = 92;
  char mychar = 0134;

is more readable than

  char mychar = '\\';

is debatable; at least i would find reading the latter easier than
the former, even in a conditional or function call argument.

For narrow characters, the portability argument is weak; writing code
that is portable to EBCDIC machines is the kind of excessive portability
that provokes bugs rather than prevents them. But still, i'd recommend
against specifying narrow characters numerically.

Even mandoc_char(7) says:

  NUMBERED CHARACTERS
    For backward compatibility with existing manuals, mandoc(1) also
    supports the \N'number' and \[charnumber] escape sequences,
    inserting the character number from the current character set
    into the output.
    Of course, this is inherently non-portable and is already marked
    as deprecated in the Heirloom roff manual; on top of that, the
    second form is a GNU extension. For example, do not use \N'34'
    or \[char34], use \(dq, or even the plain `"' character where
    possible.

A similar recommendation makes sense for C code.

What *is* portable is specifying wide characters by Unicode codepoint
numbers, for example:

  wchar_t mywide = L'\u2026';  /* horizontal ellipsis */

But note that the C standard (C11 6.4.3 paragraph 2, Universal
character names) explicitly requires the argument to \u to be at
least 00A0, with only three exceptions:

  L'\u0024' == L'$'
  L'\u0040' == L'@'
  L'\u0060' == L'`'

Being so specific is a weird quirk of the standard, but it means you
had better not abuse \u to obfuscate ASCII codepoints - apart from
being very ugly, it may not even work. For example, current base
clang dies like this:

  error: character 'A' cannot be specified by a universal character name
     13 |   wchar_t mywide = L'\u0041';
  1 error generated.

So there is no real alternative to L'\\'. While L'\x5c' and L'\134'
work for UTF-8 (and hence on OpenBSD), even that is not guaranteed to
be portable, and what those two produce may depend both on the
implementation and on the locale.

Yours,
  Ingo
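
For illustration only, here is a minimal sketch of how the recommendation above might look in practice: wide character constants for everything in ASCII, and \u only for codepoints at or above 00A0. All names (PERIOD, sentence_end, ends_sentence, is_backslash_or_apostrophe) are hypothetical and not taken from fmtroff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical names for illustration; not taken from fmtroff. */
#define PERIOD   L'.'
#define QUESTION L'?'
#define EXCLAM   L'!'
#define ELLIPSIS L'\u2026'	/* horizontal ellipsis, above 00A0 */

static const wchar_t sentence_end[] = { PERIOD, QUESTION, EXCLAM, ELLIPSIS };

/* Return true if wc is one of the sentence-ending characters above. */
bool
ends_sentence(wchar_t wc)
{
	size_t i;

	for (i = 0; i < sizeof(sentence_end) / sizeof(sentence_end[0]); i++)
		if (sentence_end[i] == wc)
			return true;
	return false;
}

/*
 * The escaped constants used directly in a conditional,
 * without any numeric codes.
 */
bool
is_backslash_or_apostrophe(wchar_t wc)
{
	return wc == L'\\' || wc == L'\'';
}
```

A caller would simply write something like ends_sentence(buf[i]) or is_backslash_or_apostrophe(buf[i]) on a wchar_t buffer; whether that reads better than the numeric form is, of course, a matter of taste.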