On 5/10/22 9:27 AM, Jonathan Wakely wrote:
> On Mon, 9 May 2022 at 11:09, Andreas Schwab wrote:
>> On May 09 2022, Florian Weimer via Gcc wrote:
>>> * Ulrich Drepper via Gcc:
>>>> t.cc: In function ‘int main()’:
>>>> t.cc:5:24: warning: format string is not an array of type ‘char’ [-Wformat=]
>>>>     5 |   printf((const char*) u8"test %d\n", 1);
>>>>       |                        ^~~~~~~~~~~~~
>>>
>>> This is not an aliasing violation because of the exception for char,
>>> right? So the warning does not even highlight theoretical undefined
>>> behavior.
>>>
>>> On the other hand, that cast is still quite ugly. All string-related
>>> functions in the C library currently need it. It might obscure real
>>> type errors. Isn't this a problem with char8_t?
>>
>> In C++20, u8 literals have a distinct type, which is an incompatible
>> change from C++17.
>
> And the recommended way to deal with it is to use a cast as Ulrich did.

Thanks for copying me, Jonathan.

From the perspective of the standard, printf() expects its format
string to be specified in the locale-dependent multibyte encoding, so
passing a UTF-8 encoded string is, of course, not guaranteed to produce
a useful result (and certainly would not on, for example, an
EBCDIC-based platform).

I would not recommend using a cast in this case, but would rather ask
why there is a perceived need to specify a u8-prefixed string literal at
all. If the locale is expected/required to be UTF-8 for the program to
work as intended, then the execution character set is presumably set to
be (or should be) UTF-8 as well, in which case an ordinary string literal
will be UTF-8 encoded and there is no need to use a u8-prefixed string
literal. So, instead of adding a cast, I would recommend removing the u8
prefix.
Tom.