On 12/5/18 2:09 AM, Tom Honermann wrote:
On 12/3/18 5:01 PM, Jason Merrill wrote:
On 12/3/18 4:51 PM, Jason Merrill wrote:
On 11/5/18 2:39 PM, Tom Honermann wrote:
This patch adds support for the P0482R5 core language changes. This
includes:
- The -fchar8_t and -fno_char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_1 and char8_t
pointer
types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
macros .
- Name mangling and demangling for char8_t (using Du).
gcc/ChangeLog:
2018-11-04 Tom Honermann <t...@honermann.net>
* defaults.h: Define CHAR8_TYPE.
gcc/c-family/ChangeLog:
2018-11-04 Tom Honermann <t...@honermann.net>
* c-family/c-common.c (c_common_reswords): Add char8_t.
(fix_string_type): Use char8_t for the type of u8 string
literals.
(c_common_get_alias_set): char8_t doesn't alias.
(c_common_nodes_and_builtins): Define char8_t as a builtin
type in
C++.
(c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
(keyword_begins_type_specifier): Add RID_CHAR8.
* gcc/c-family/c-common.h (rid): Add RID_CHAR8.
(c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
Define char8_type_node and char8_array_type_node.
* c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
__GCC_ATOMIC_CHAR8_T_LOCK_FREE.
(c_cpp_builtins): Predefine __cpp_char8_t.
* c-family/c-lex.c (lex_string): Use char8_array_type_node as the
type of CPP_UTF8STRING.
(lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
* c-family/c.opt: Add the -fchar8_t command line option.
gcc/c/ChangeLog:
2018-11-04 Tom Honermann <t...@honermann.net>
* c/c-typeck.c (char_type_p): Add char8_type_node.
(digest_init): Handle initialization by a u8 string literal of
char8_t type.
gcc/cp/ChangeLog:
2018-11-04 Tom Honermann <t...@honermann.net>
* cp/cvt.c (type_promotes_to): Handle char8_t promotion.
* cp/decl.c (grokdeclarator): Handle invalid type specifier
combinations involving char8_t.
* cp/lex.c (init_reswords): Add char8_t as a reserved word.
* cp/mangle.c (write_builtin_type): Add name mangling for char8_t
(Du).
* cp/parser.c (cp_keyword_starts_decl_specifier_p,
cp_parser_simple_type_specifier): Recognize char8_t as a simple
type specifier.
(cp_parser_string_literal): Use char8_array_type_node for the
type
of CPP_UTF8STRING.
(cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in
system
headers.
* cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
* cp/tree.c (char_type_p): Recognize char8_t as a character type.
* cp/typeck.c (string_conv_p): Handle conversions of u8 string
literals of char8_t type.
(check_literal_operator_args): Handle UDLs with u8 string
literals
of char8_t type.
* cp/typeck2.c (digest_init_r): Disallow initializing a char
array
with a u8 string literal.
libiberty/ChangeLog:
2018-10-31 Tom Honermann <t...@honermann.net>
* cp-demangle.c (cplus_demangle_builtin_types,
cplus_demangle_type): Add name demangling for char8_t (Du).
* cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
new char8_t type.
@@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
if (!TYPE_P (t))
return -1;
+ /* Unlike char, char8_t doesn't alias. */
+ if (flag_char8_t && t == char8_type_node)
+ return -1;
This seems unnecessary; doesn't the existing code have the same
effect? I think we could do with just an adjustment to the existing
comment.
I'm not sure. I had concerns about unintended matching due to char8_t
having an underlying type of unsigned char.
That shouldn't be a problem: if char8_t is a distinct type, it won't
match unsigned char, and if it's the same as unsigned char, flag_char8_t
will be false.
+ else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+ || (flag_char8_t && type == char8_type_node)
+ bool char8_array = (flag_char8_t && !!comptypes (typ1,
char8_type_node));
+ || (flag_char8_t && type == char8_type_node
In many places you check the flag and then for one of the char8
types. Since the types won't be used without the flag, checking the
flag seems redundant?
This was again protection against unintended matching of the underlying
unsigned char type, particularly when compiling as C. char8_type_node is
constructed (in c_common_nodes_and_builtins) following the pattern in
place for char16_t and char32_t with the following code:
+ char8_type_node = get_identifier (CHAR8_TYPE);
+ char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+ char8_type_size = TYPE_PRECISION (char8_type_node);
+ if (c_dialect_cxx ())
+ {
+ char8_type_node = make_unsigned_type (char8_type_size);
+
+ if (flag_char8_t)
+ record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+ }
My knowledge of gcc internals is weak, but I understand this to be,
effectively, defining a type alias (of unsigned char) and then, if
compiling as C++, re-defining it as a strong unsigned type.
I don't recall the details now, but at one point, I was missing a check
for flag_char8_t in some location and I encountered some test failures
as a result.
Since char8_type_node is always a distinct type in C++, we shouldn't
need these checks in the C++ front end. And since it's never a distinct
type in C, the C front end (c/*) doesn't need to change at all.
- if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
+ if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
+ && (typ1 == char_type_node || !flag_char8_t))
This looks wrong, or at least incomplete; we want to complain about
mismatched types here even with -fchar8-t. Perhaps we should replace
all of this if/else with simply comparing typ1 and char_type, and
complaining if they're different. Talking about wide and non-wide
isn't as useful as the actual types would be.
Well, I suppose it isn't quite that simple, since we still need to
treat the ordinary character types as interchangeable.
We do need to complain about mismatched types and test
gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that happens.
You don't seem to test initializing an array of signed/unsigned char,
which I think will be broken by the change only considering char_type_node.
I think we want to specifically allow conversion from array of one
ordinary character type to another, and otherwise complain about the
types being different with a message like "cannot initialize array of
%qT from array of %qT" rather than mess with terms like int-array and
(non-)wide string.
Jason