> On 06.07.2024 at 16:56, Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Sat, Jul 06, 2024 at 02:45:45PM +0200, Richard Biener wrote:
>>> Anyway, thoughts on this before I spend too much time on it?
>>
>> Why do we have an "element type"? Would
>>
>> int a[] = {
>> #embed "cc1plus"
>> };
>>
>> be valid?
>
> Yes, that is valid.
> The way #embed is defined for C, it is essentially as if a huge
> sequence of integer literals like
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
> had been written in its place, so it can appear anywhere the grammar
> allows something like that. So even
> void foo (...);
> void bar ()
> {
> foo (
> #embed "cc1plus"
> );
> int i = 1 + (
> #embed "cc1plus"
> ) + 2;
> }
> etc. is valid.
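A rough way to picture this: the directive behaves as if the file's bytes were spelled out as comma-separated decimal int literals. The helper below, `embed_format`, is a hypothetical name used only for illustration (it is not part of libcpp); it produces that spelling for a byte buffer, so the first four bytes of an ELF file come out as `127,69,76,70`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the comma-separated decimal list that #embed conceptually stands
   for, for the N bytes at DATA.  Returns the number of characters written
   (excluding the NUL), or -1 if OUT is too small.  Hypothetical helper.  */
static int
embed_format (const unsigned char *data, size_t n, char *out, size_t outsz)
{
  size_t pos = 0;
  assert (out != NULL && outsz > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      /* Every element but the first is preceded by a comma.  */
      int w = snprintf (out + pos, outsz - pos, i ? ",%u" : "%u",
                        (unsigned) data[i]);
      if (w < 0 || (size_t) w >= outsz - pos)
        return -1;
      pos += (size_t) w;
    }
  return (int) pos;
}
```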
> I chose to greatly simplify things by not emitting CPP_EMBED for the
> boundary numbers of the sequence, because otherwise one needs to deal with
> significantly more special cases; one can have
> const unsigned char a[] = { 13 + 25 *
> #embed "cc1plus"
> / 2, 0 };
> for example, or even something expected to be used often in C, like
> const unsigned char b[] = {
> [64] =
> #embed "cc1plus"
> };
> and the advantage of the inner sequence elements is that we know for sure
> they are preceded by a CPP_COMMA and followed by one too. If we e.g. used
> CPP_EMBED even for a single-element sequence, it could appear anywhere
> a CPP_NUMBER can appear in the grammar, which is basically everywhere.
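The tokenization described above can be modelled as: the first byte as an ordinary number, then the N - 2 inner bytes as raw-data tokens that each sit between two commas, then the last byte as an ordinary number. The count of inner tokens is a ceiling division by the maximum chunk size; the finish_embed hunk in the patch computes it the same way with INT_MAX as the chunk size (and only for limits of at least 64, a threshold this sketch leaves out). `embed_inner_tokens` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an #embed of N bytes is tokenized as
     NUMBER , EMBED(chunk) , ... , EMBED(chunk) , NUMBER
   so every raw-data token is surrounded by commas.  Only the N - 2 inner
   bytes go into EMBED tokens of at most MAX_CHUNK bytes each.
   Returns the number of EMBED tokens.  */
static size_t
embed_inner_tokens (size_t n, size_t max_chunk)
{
  assert (max_chunk > 0);
  if (n < 3)
    return 0;  /* No inner bytes: everything stays a plain number.  */
  size_t inner = n - 2;
  return inner / max_chunk + (inner % max_chunk != 0);
}
```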
>
> Right now, when lexing CPP_EMBED, the patch turns it into a RAW_DATA_CST
> with integer_type_node type, reflecting that to the preprocessor it is
> a sequence of int literals, and then, when parsing an initializer, peels
> bytes off it as needed; see e.g. the c-c++-common/cpp/embed-19.c test in
> the patch, where some of the sequence elements initialize some fields in a
> struct, others an unsigned char array field, and others some other fields
> again. To simplify things, it only keeps the RAW_DATA_CST around in the
> initializer of ARRAY_TYPE CONSTRUCTORs whose elements are INTEGER_TYPE
> with CHAR_BIT precision, so
> int a[] = {
> #embed "cc1plus"
> };
> is peeled off into a huge sequence of INTEGER_CST CONSTRUCTOR_ELTs.
> In theory, if this is something that appears often enough in real-world
> code, we might use RAW_DATA_CST even for that case: basically allocate a
> 4 times bigger backing STRING_CST and, based on target endianness and
> storage order, extend each byte from one buffer into the other. I'd prefer
> to do that only if we really see people want it, because it will be more work.
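A minimal sketch of that widening step, assuming a 4-byte int and ignoring reverse storage order: each embedded byte becomes the least significant byte of its element and the remaining bytes are zero, which is correct because #embed values are always in [0, 255]. The function name and the `big_endian_p` flag are illustrative, not GCC API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Widen N embedded bytes into N elements of ELT_SIZE bytes each, laid out
   in the target byte order selected by BIG_ENDIAN_P.  DST must hold
   N * ELT_SIZE bytes.  Hypothetical helper.  */
static void
widen_embed_bytes (const unsigned char *src, size_t n, unsigned char *dst,
                   size_t elt_size, int big_endian_p)
{
  assert (elt_size >= 1);
  for (size_t i = 0; i < n; i++)
    {
      unsigned char *elt = dst + i * elt_size;
      memset (elt, 0, elt_size);  /* Zero-extension.  */
      /* The value byte is the least significant one.  */
      elt[big_endian_p ? elt_size - 1 : 0] = src[i];
    }
}
```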
>
>> I suppose #embed itself is just "embedding" the target(?)
>> representation and the file encodes that in bytes as if laid out in
>> memory?
>
> It is designed to give pretty much the values you would get by fread into
> an unsigned char array.
>
>> Does anything in the #embed spec require actually reading the contents
>> of the embedded file? For the above a fstat() would be enough to
>> deduce the size of a[].
>
> For regular files, fstat would be good enough, for non-regular files one
> really has to read them into memory.
> But there are so many cases where we actually need to read and inspect
> the values at compile time that I think having it always in memory
> (as implemented in the patch set) doesn't hurt. E.g. we need to read the
> first and last byte of the sequence, and any time one, e.g. during
> constant expression evaluation, does something like:
> constexpr unsigned char a[] = {
> #embed "cc1plus"
> };
> constexpr int b = a[6832];
> etc., we really need to read the value and interpret it; we often do the
> same during optimizations. ICF hashes the data to decide
> what is the same, ...
> Sure, having it all in memory will mean > 2GB embeds in 32-bit compilers
> will be tough, but in 64-bit compilers it should work just fine, while I
> think e.g. right now you can't have an initialized > 4GB array without gaps
> because CONSTRUCTOR_ELTS is a vector and that uses an unsigned int length.
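The fstat-versus-read distinction from the reply above can be sketched with POSIX calls: `fstat` plus `S_ISREG` gives the length of a regular file directly, while a pipe or device has to be drained to learn how many bytes it yields. `embed_length` is a hypothetical helper, not libcpp's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Number of bytes #embed would produce for the open stream F, or -1 on
   error.  Regular files: fstat is enough.  Anything else (pipes,
   character devices, ...): the data has to be read.  */
static long long
embed_length (FILE *f)
{
  struct stat st;
  assert (f != NULL);
  if (fstat (fileno (f), &st) == 0 && S_ISREG (st.st_mode))
    return (long long) st.st_size;

  long long total = 0;
  unsigned char buf[4096];
  size_t got;
  while ((got = fread (buf, 1, sizeof buf, f)) > 0)
    total += (long long) got;
  return ferror (f) ? -1 : total;
}
```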
>
> What I think is important is that, if at all possible, we keep it in memory
> once and refer to the libcpp buffer holding the file rather than copying it
> over and over; that is one of the reasons why compiling that
> #embed "cc1plus"
> right now without the optimizations (i.e. as the
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
> 261M sequence) eats more than 26GB of memory and 5 minutes (I stopped it
> after that). E.g. STRING_CST is inappropriate because it owns the data (the data sits
> in its payload) and currently is only valid as the whole initializer of
> the array, not just part of it.
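A sketch of what "keep it in memory once and refer to it" means when an initializer is split: a RAW_DATA_CST-like view records only an owner, a pointer and a length, so carving out a range overridden by a later designated initializer just adjusts pointers and lengths without copying any bytes. The struct and function names are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* A non-owning view into a buffer kept alive elsewhere (think: the
   libcpp buffer holding the embedded file).  */
struct byte_view
{
  const void *owner;            /* Whoever keeps the bytes alive.  */
  const unsigned char *ptr;
  size_t len;
};

/* Carve elements [START, START + COUNT) out of V.  HEAD and TAIL receive
   the surviving pieces (either may be empty); no data is copied.  */
static void
view_split (struct byte_view v, size_t start, size_t count,
            struct byte_view *head, struct byte_view *tail)
{
  assert (start <= v.len && count <= v.len - start);
  *head = (struct byte_view) { v.owner, v.ptr, start };
  *tail = (struct byte_view) { v.owner, v.ptr + start + count,
                               v.len - start - count };
}
```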
>
>> When preprocessing only I suppose #embed
>> isn't "resolved", right?
>
> With -E, the series as posted will preprocess it into something like
> 118,
> # 10 "embed-10.c"
> #embed "." __gnu__::__base64__( \
> "b2lkCmZvbyAodW5zaWduZWQgY2hhciAqcCkKewp9CgppbnQKbWFpbiAoKQp7CiAgdW5zaWduZWQg" \
> "Y2hhciBhW10gPSB7CiAgICAjZW1iZWQgX19GSUxFX18KICB9OwogIGZvbyAoYSk7Cn0=")
> # 10 "embed-10.c"
> ,10
> (so that it is pedantically valid but can be decoded back cheaply).
> Another option is to emit that
> 127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
> but then we don't handle megabytes of data well, and gigabytes are out of
> the question; or to keep the original #embed in there (that is what clang
> does with some new -dE option), but that isn't really preprocessing, because
> one has to copy the preprocessed file and all the embed files as well.
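The payload in that -E output is ordinary RFC 4648 base64. The encoder below is a plain sketch, not GCC's code. As a sanity check, the first four characters of the quoted payload, `b2lk`, decode to `oid`, which is consistent with a file that starts with `void` whose first byte, 118 (`v`), was emitted separately as a plain number (and the trailing `,10` is the final newline).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Standard base64 encoding of N bytes at IN into OUT, which must have
   room for 4 * ((N + 2) / 3) + 1 characters.  Returns the encoded
   length (excluding the NUL).  */
static size_t
base64_encode (const unsigned char *in, size_t n, char *out)
{
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  size_t o = 0;
  assert (in != NULL || n == 0);
  for (size_t i = 0; i < n; i += 3)
    {
      /* Pack up to 3 bytes into 24 bits, then emit 4 six-bit digits.  */
      unsigned long v = (unsigned long) in[i] << 16;
      if (i + 1 < n)
        v |= (unsigned long) in[i + 1] << 8;
      if (i + 2 < n)
        v |= in[i + 2];
      out[o++] = tbl[(v >> 18) & 63];
      out[o++] = tbl[(v >> 12) & 63];
      out[o++] = i + 1 < n ? tbl[(v >> 6) & 63] : '=';
      out[o++] = i + 2 < n ? tbl[v & 63] : '=';
    }
  out[o] = '\0';
  return o;
}
```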
I see. I was wondering because PCH includes are not resolved. That said, it
sounds like #embed is sadly defined on the preprocessor side rather than in
the language, where it would have been easy to constrain uses to those that
make sense…
>> I would say we should by default just record a reference to the file
>> on disk, so RAW_DATA_CST should have a pointer to the backing store
>> and actual reading of the data should be done on-demand only
>> (like if required by constexpr or if we desire to constant fold).
>> IIRC gas supports embedding data as well.
>
> Indeed, gas has the .incbin directive, but I'd say we could use it only
> if we know it is a regular file and it will be immediately assembled.
> If one does -save-temps or -S, I think we'd better make the assembly
> self-contained.
>
> The RAW_DATA_CST in the patch has a tree owner (meant e.g. for the PCH
> case where we need to copy the data into a STRING_CST or something similar
> that owns the data and is PCH saved/restored), so perhaps the original
> owner could be some new tree that has the needed libcpp details in it
> (filename for regular file and start and end of the libcpp buffer + offset),
> so one could reconstruct the .incbin from it if the driver tells us we
> aren't saving the assembler file for later use.
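A sketch of that decision and of the resulting directive, assuming the caller already knows whether the embedded file is regular and whether the assembly is being kept (-S, -save-temps). gas's `.incbin "file",skip,count` form takes a byte offset and a byte count, so a run starting after the separately emitted first byte would use a skip of 1. `format_incbin` is hypothetical and, for brevity, does not escape quotes or backslashes in the file name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format ".incbin" for COUNT bytes starting SKIP bytes into FILENAME, but
   only when safe: a regular file whose assembly is not kept.  Returns 1 on
   success, 0 if the caller must emit the bytes inline instead.  */
static int
format_incbin (char *buf, size_t bufsz, const char *filename, long skip,
               long count, int regular_file_p, int asm_kept_p)
{
  assert (buf != NULL && bufsz > 0 && filename != NULL);
  buf[0] = '\0';
  if (!regular_file_p || asm_kept_p)
    return 0;
  int w = snprintf (buf, bufsz, "\t.incbin \"%s\",%ld,%ld\n",
                    filename, skip, count);
  return w > 0 && (size_t) w < bufsz;
}
```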
>
> For the cases where .incbin can't be used, currently GCC emits:
> .string "ELF\002\001\001\003"
> .string ""
> .string ""
> .string ""
> .string ""
> .string ""
> .string ""
> .string ""
> .string "\002"
> .string ">"
> .string "\001"
> (and that is what clang with #embed also emits). It might be worth
> investigating whether gas could introduce some new directive to make binary
> data more compact in general (e.g. whether base64 encoding would be
> beneficial), because
> .string
> "(\035\214\034\347_u\244\rz|~\002\253h\267\271\203v\244\266\372\001\353\363\026\346\365\305\211\005\220\372\215h\267\211{\022\257\277'\0256\215G\2013c.~\244\206\360\226|_\226\223\034\177j\232u\300,\003\3273kh\267q\221\302\326\3153\3772\202,\003\327\346\207\3662giJ3\202,\003\327\305\271\234@%v~\2446-\034\257\310\207\302\326=\256h\267\016\237h\267Q\201\023\257\016\313\302\326q\032\\*\205(u\244\237\023t\244\344Vt\244\247\335\243k\007\256\302\326,th\267}\221h\267\317O\034\257\377\373v\244\227\202a\221$\236\3772\263\326X\221\215Mz\244\216\227\034\257F\213\302\326G\316\302\326\033\277\302\326\177\220h\267\023\263\302\326X\236v\244\034Zt\244\003>\177[\0135\022\257\226ph\267|\377\3033Ox\022\257\214\307\340`\356\235\3772M>\245\013\321*\003\327=\377\3033"
> etc. isn't compact at all; for many bytes that is 4 characters per byte.
> base64 would be 4 characters per 3 bytes no matter what the values are.
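A rough estimate of both encodings, assuming printable ASCII is emitted verbatim, `"` and `\` take two characters, and every other byte takes a four-character octal escape such as `\035`. GCC also uses a few shorter escapes (e.g. `\r`, visible above), and real output additionally splits at NUL bytes; this sketch ignores both. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate characters inside .string literals for N bytes at P.  */
static size_t
string_directive_chars (const unsigned char *p, size_t n)
{
  size_t chars = 0;
  assert (p != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (p[i] == '"' || p[i] == '\\')
      chars += 2;                       /* \" or \\ */
    else if (p[i] >= 0x20 && p[i] < 0x7f)
      chars += 1;                       /* Printable, emitted as is.  */
    else
      chars += 4;                       /* Octal escape like \035.  */
  return chars;
}

/* Characters for the same N bytes in base64: 4 per (started) 3 bytes.  */
static size_t
base64_chars (size_t n)
{
  return 4 * ((n + 2) / 3);
}
```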
>
>> When reading in data we might want to support reading only required
>> pieces and possibly have the data compressed.
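Reading only a required piece could look like the following, assuming a seekable stream; a pipe is not seekable, so this would complement rather than replace reading everything. `read_embed_range` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Read up to LEN bytes starting at OFFSET of the embedded file F into
   BUF, without touching the rest of the file.  Returns the number of
   bytes read (short at end of file, 0 if seeking fails).  */
static size_t
read_embed_range (FILE *f, long offset, unsigned char *buf, size_t len)
{
  assert (f != NULL && offset >= 0);
  if (fseek (f, offset, SEEK_SET) != 0)
    return 0;
  return fread (buf, 1, len, f);
}
```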
>>
>> For LTO I fear we need to embed the actual data as we cannot be sure
>> the referenced file can be resolved at LTO link time?
>
> Yes, I think like the PCH header case, we need to add a backing STRING_CST
> for it or something similar (and then not use .incbin at all).
>
>> That is, it would be really nice if we can avoid reading in embedded
>> files and leave that to the assembler.
>
> I really think reading isn't that big a problem; the problem is walking it
> too many times, copying the data over and over and, especially, spending
> hundreds of bytes of compiler memory per byte of it.
>
> Anyway, here is an updated patch where I implemented the
> native_encode_initializer/fold/fold_ctor_reference reads from the
> RAW_DATA_CST data (without sufficient test coverage for that for now).
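Reading RAW_DATA_CST data in native_encode_initializer boils down to copying the intersection of the raw run [pos, pos + n) with the requested window [off, off + len). A standalone sketch with hypothetical names, not GCC's code, that covers full containment and both partial-overlap cases through a single interval intersection:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RAW holds N bytes living at object offset POS.  OUT[0] corresponds to
   object offset OFF and OUT holds LEN bytes.  Copy whatever part of the
   run falls inside the window; return the number of bytes copied.  */
static size_t
copy_raw_overlap (const unsigned char *raw, long pos, long n,
                  unsigned char *out, long off, long len)
{
  assert (n >= 0 && len >= 0);
  long lo = pos > off ? pos : off;                      /* max of starts */
  long hi = pos + n < off + len ? pos + n : off + len;  /* min of ends */
  if (lo >= hi)
    return 0;                                           /* No overlap.  */
  memcpy (out + (lo - off), raw + (lo - pos), (size_t) (hi - lo));
  return (size_t) (hi - lo);
}
```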
>
> If we keep the RAW_DATA_CST, there is one question: currently in the patch
> it is simply a CONSTRUCTOR with, say,
> [0] = 127, [1] = RAW_DATA_CST, [100000] = 0
> or similar, the RAW_DATA_LENGTH of the ctor value implies how many elements
> the sequence has. Another option would be to use RANGE_EXPR for that, so
> [0] = 127, [1 ... 99999] = RAW_DATA_CST, [100000] = 0
> Neither is clean, because generally RANGE_EXPR means the same value is
> repeated many times, while we want a range filled with subsequent bytes
> from the raw data memory. So RAW_DATA_CST is an exceptional thing in
> either case, and given that, avoiding the RANGE_EXPR looked simpler.
> Unless we want to introduce some RANGE_EXPR variant that goes with
> RAW_DATA_CST.
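To make the first encoding concrete, here is a hypothetical model of such a CONSTRUCTOR, in which a raw element at index I with length L covers indices I through I + L - 1. The array size and element lookup follow the same arithmetic as the array_size_for_constructor and fold hunks in the patch; this is an illustration, not GCC's data structure.

```c
#include <assert.h>
#include <stddef.h>

/* One element: either a single VALUE, or (RAW != NULL) a raw byte run
   whose RAW_LEN implies how many consecutive elements it covers.  */
struct ctor_elt
{
  size_t index;
  const unsigned char *raw;
  size_t raw_len;
  int value;
};

/* Array size implied by the initializer: last covered index + 1.  */
static size_t
ctor_array_size (const struct ctor_elt *elts, size_t n)
{
  size_t max_end = 0;
  assert (elts != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      size_t end = elts[i].index + (elts[i].raw ? elts[i].raw_len : 1);
      if (end > max_end)
        max_end = end;
    }
  return max_end;
}

/* Value of element AT, or -1 if no element covers it (it would be
   zero-initialized).  Assumes elements do not overlap.  */
static int
ctor_value_at (const struct ctor_elt *elts, size_t n, size_t at)
{
  for (size_t i = 0; i < n; i++)
    {
      size_t len = elts[i].raw ? elts[i].raw_len : 1;
      if (at >= elts[i].index && at - elts[i].index < len)
        return elts[i].raw ? elts[i].raw[at - elts[i].index] : elts[i].value;
    }
  return -1;
}
```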
Yeah, I wondered whether, wherever the raw data survives, we could always
wrap it in a CONSTRUCTOR and add a RANGE_TARGET_BYTES element. This may be
useful to encode large initializers more efficiently during/after parsing.
Richard
> --- libcpp/files.cc.jj 2024-07-03 14:52:12.231817485 +0200
> +++ libcpp/files.cc 2024-07-03 15:44:39.248913032 +0200
> @@ -1241,7 +1241,10 @@ finish_embed (cpp_reader *pfile, _cpp_fi
> limit = params->limit;
>
> size_t embed_tokens = 0;
> - if (CPP_OPTION (pfile, directives_only) && limit >= 64)
> + if ((CPP_OPTION (pfile, directives_only)
> + || !CPP_OPTION (pfile, cplusplus))
> + && CPP_OPTION (pfile, lang) != CLK_ASM
> + && limit >= 64)
> embed_tokens = ((limit - 2) / INT_MAX) + (((limit - 2) % INT_MAX) != 0);
>
> size_t max = INTTYPE_MAXIMUM (size_t) / sizeof (cpp_token);
> --- gcc/varasm.cc.jj 2024-05-07 18:10:10.674871087 +0200
> +++ gcc/varasm.cc 2024-07-04 14:58:33.570465411 +0200
> @@ -4875,6 +4875,7 @@ initializer_constant_valid_p_1 (tree val
> case FIXED_CST:
> case STRING_CST:
> case COMPLEX_CST:
> + case RAW_DATA_CST:
> return null_pointer_node;
>
> case ADDR_EXPR:
> @@ -5468,6 +5469,9 @@ array_size_for_constructor (tree val)
> {
> if (TREE_CODE (index) == RANGE_EXPR)
> index = TREE_OPERAND (index, 1);
> + if (value && TREE_CODE (value) == RAW_DATA_CST)
> + index = size_binop (PLUS_EXPR, index,
> + size_int (RAW_DATA_LENGTH (value) - 1));
> if (max_index == NULL_TREE || tree_int_cst_lt (max_index, index))
> max_index = index;
> }
> @@ -5659,6 +5663,12 @@ output_constructor_regular_field (oc_loc
> /* Output the element's initial value. */
> if (local->val == NULL_TREE)
> assemble_zeros (fieldsize);
> + else if (local->val && TREE_CODE (local->val) == RAW_DATA_CST)
> + {
> + fieldsize *= RAW_DATA_LENGTH (local->val);
> + assemble_string (RAW_DATA_POINTER (local->val),
> + RAW_DATA_LENGTH (local->val));
> + }
> else
> fieldsize = output_constant (local->val, fieldsize, align2,
> local->reverse, false);
> --- gcc/tree.h.jj 2024-06-05 19:09:54.046617006 +0200
> +++ gcc/tree.h 2024-07-03 19:41:04.453201043 +0200
> @@ -1165,6 +1165,14 @@ extern void omp_clause_range_check_faile
> #define TREE_STRING_POINTER(NODE) \
> ((const char *)(STRING_CST_CHECK (NODE)->string.str))
>
> +/* In a RAW_DATA_CST */
> +#define RAW_DATA_LENGTH(NODE) \
> + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.length)
> +#define RAW_DATA_POINTER(NODE) \
> + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.str)
> +#define RAW_DATA_OWNER(NODE) \
> + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.owner)
> +
> /* In a COMPLEX_CST node. */
> #define TREE_REALPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.real)
> #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
> --- gcc/expr.cc.jj 2024-07-01 11:28:22.704237981 +0200
> +++ gcc/expr.cc 2024-07-05 17:05:52.929836616 +0200
> @@ -7144,6 +7144,12 @@ categorize_ctor_elements_1 (const_tree c
> init_elts += mult * TREE_STRING_LENGTH (value);
> break;
>
> + case RAW_DATA_CST:
> + nz_elts += mult * RAW_DATA_LENGTH (value);
> + unique_nz_elts += RAW_DATA_LENGTH (value);
> + init_elts += mult * RAW_DATA_LENGTH (value);
> + break;
> +
> case COMPLEX_CST:
> if (!initializer_zerop (TREE_REALPART (value)))
> {
> @@ -11788,7 +11794,8 @@ expand_expr_real_1 (tree exp, rtx target
> field, value)
> if (tree_int_cst_equal (field, index))
> {
> - if (!TREE_SIDE_EFFECTS (value))
> + if (!TREE_SIDE_EFFECTS (value)
> + && TREE_CODE (value) != RAW_DATA_CST)
> return expand_expr (fold (value), target, tmode, modifier);
> break;
> }
> @@ -11830,7 +11837,8 @@ expand_expr_real_1 (tree exp, rtx target
> field, value)
> if (tree_int_cst_equal (field, index))
> {
> - if (TREE_SIDE_EFFECTS (value))
> + if (TREE_SIDE_EFFECTS (value)
> + || TREE_CODE (value) == RAW_DATA_CST)
> break;
>
> if (TREE_CODE (value) == CONSTRUCTOR)
> @@ -11847,8 +11855,8 @@ expand_expr_real_1 (tree exp, rtx target
> break;
> }
>
> - return
> - expand_expr (fold (value), target, tmode, modifier);
> + return expand_expr (fold (value), target, tmode,
> + modifier);
> }
> }
> else if (TREE_CODE (init) == STRING_CST)
> --- gcc/tree-pretty-print.cc.jj 2024-06-14 19:45:09.446777591 +0200
> +++ gcc/tree-pretty-print.cc 2024-07-04 14:58:33.571465397 +0200
> @@ -2519,6 +2519,28 @@ dump_generic_node (pretty_printer *pp, t
> }
> break;
>
> + case RAW_DATA_CST:
> + for (unsigned i = 0; i < (unsigned) RAW_DATA_LENGTH (node); ++i)
> + {
> + if (TYPE_UNSIGNED (TREE_TYPE (node))
> + || TYPE_PRECISION (TREE_TYPE (node)) > CHAR_BIT)
> + pp_decimal_int (pp, ((const unsigned char *)
> + RAW_DATA_POINTER (node))[i]);
> + else
> + pp_decimal_int (pp, ((const signed char *)
> + RAW_DATA_POINTER (node))[i]);
> + if (i == RAW_DATA_LENGTH (node) - 1U)
> + break;
> + else if (i == 9 && RAW_DATA_LENGTH (node) > 20)
> + {
> + pp_string (pp, ", ..., ");
> + i = RAW_DATA_LENGTH (node) - 11;
> + }
> + else
> + pp_string (pp, ", ");
> + }
> + break;
> +
> case FUNCTION_TYPE:
> case METHOD_TYPE:
> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> --- gcc/tree.cc.jj 2024-07-01 11:28:23.495227837 +0200
> +++ gcc/tree.cc 2024-07-04 14:58:33.563465503 +0200
> @@ -513,6 +513,7 @@ tree_node_structure_for_code (enum tree_
> case STRING_CST: return TS_STRING;
> case VECTOR_CST: return TS_VECTOR;
> case VOID_CST: return TS_TYPED;
> + case RAW_DATA_CST: return TS_RAW_DATA_CST;
>
> /* tcc_exceptional cases. */
> case BLOCK: return TS_BLOCK;
> @@ -571,6 +572,7 @@ initialize_tree_contains_struct (void)
> case TS_FIXED_CST:
> case TS_VECTOR:
> case TS_STRING:
> + case TS_RAW_DATA_CST:
> case TS_COMPLEX:
> case TS_SSA_NAME:
> case TS_CONSTRUCTOR:
> @@ -1026,6 +1028,7 @@ tree_code_size (enum tree_code code)
> case REAL_CST: return sizeof (tree_real_cst);
> case FIXED_CST: return sizeof (tree_fixed_cst);
> case COMPLEX_CST: return sizeof (tree_complex);
> + case RAW_DATA_CST: return sizeof (tree_raw_data);
> case VECTOR_CST: gcc_unreachable ();
> case STRING_CST: gcc_unreachable ();
> default:
> @@ -10467,6 +10470,15 @@ initializer_zerop (const_tree init, bool
> *nonzero = true;
> return false;
>
> + case RAW_DATA_CST:
> + for (unsigned int i = 0; i < (unsigned int) RAW_DATA_LENGTH (init); ++i)
> + if (RAW_DATA_POINTER (init)[i])
> + {
> + *nonzero = true;
> + return false;
> + }
> + return true;
> +
> case CONSTRUCTOR:
> {
> if (TREE_CLOBBER_P (init))
> --- gcc/testsuite/c-c++-common/cpp/embed-19.c.jj 2024-07-05 11:30:09.333874817 +0200
> +++ gcc/testsuite/c-c++-common/cpp/embed-19.c 2024-07-05 11:35:19.825724327 +0200
> @@ -0,0 +1,24 @@
> +/* { dg-do run } */
> +/* { dg-options "" } */
> +/* { dg-additional-options "-std=c23" { target c } } */
> +
> +unsigned char a[] = {
> +#embed __FILE__
> +};
> +struct S { unsigned char h[(sizeof (a) - 7) / 2]; short int i; unsigned char j[sizeof (a) - 7 - (sizeof (a) - 7) / 2]; };
> +struct T { int a, b, c; struct S d; long long e; double f; long long g; };
> +struct T b = {
> +#embed __FILE__
> +};
> +
> +int
> +main ()
> +{
> + if (b.a != a[0] || b.b != a[1] || b.c != a[2]
> + || __builtin_memcmp (b.d.h, a + 3, sizeof (b.d.h))
> + || b.d.i != a[3 + sizeof (b.d.h)]
> + || __builtin_memcmp (b.d.j, a + 4 + sizeof (b.d.h), sizeof (b.d.j))
> + || b.e != a[sizeof (a) - 3] || b.f != a[sizeof (a) - 2]
> + || b.g != a[sizeof (a) - 1])
> + __builtin_abort ();
> +}
> --- gcc/testsuite/gcc.dg/cpp/embed-8.c.jj 2024-07-05 13:37:25.289157048 +0200
> +++ gcc/testsuite/gcc.dg/cpp/embed-8.c 2024-07-05 13:39:15.232694163 +0200
> @@ -0,0 +1,7 @@
> +/* This is a comment with some UTF-8 non-ASCII characters: áéíóú. */
> +/* { dg-do compile } */
> +/* { dg-options "-std=c23 -Wconversion" } */
> +
> +signed char a[] = {
> +#embed __FILE__ /* { dg-warning "conversion from 'int' to 'signed char' changes value from '\[12]\[0-9]\[0-9]' to '-\[0-9]\[0-9]*'" } */
> +};
> --- gcc/testsuite/gcc.dg/cpp/embed-7.c.jj 2024-07-05 13:27:28.580097964 +0200
> +++ gcc/testsuite/gcc.dg/cpp/embed-7.c 2024-07-05 13:36:04.728228965 +0200
> @@ -0,0 +1,39 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c23 -Woverride-init" } */
> +
> +unsigned char a[] = {
> +#embed __FILE__
> +};
> +unsigned char b[] = {
> + [26] =
> +#embed __FILE__
> +};
> +unsigned char c[] = {
> +#embed __FILE__ suffix (,)
> + [sizeof (a) / 4] = 0, /* { dg-warning "initialized field overwritten" } */
> + [sizeof (a) / 2] = 1, /* { dg-warning "initialized field overwritten" } */
> + [1] = 2, /* { dg-warning "initialized field overwritten" } */
> + [sizeof (a) - 2] = 3 /* { dg-warning "initialized field overwritten" } */
> +};
> +unsigned char d[] = {
> + [1] = 4,
> + [26] = 5,
> + [sizeof (a) / 4] = 6,
> + [sizeof (a) / 2] = 7,
> + [sizeof (a) - 2] = 8,
> +#embed __FILE__ prefix ([0] = ) /* { dg-warning "initialized field overwritten" } */
> +};
> +unsigned char e[] = {
> +#embed __FILE__ suffix (,)
> + [2] = 9, /* { dg-warning "initialized field overwritten" } */
> + [sizeof (a) - 3] = 10 /* { dg-warning "initialized field overwritten" } */
> +};
> +unsigned char f[] = {
> + [23] = 11,
> + [sizeof (a) / 4 - 1] = 12,
> +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 - 1] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */
> +#embed __FILE__ limit (130) prefix ([sizeof (a) / 4 - 2] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */
> +#embed __FILE__ prefix ([sizeof (a) / 4 + 10] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */
> +#embed __FILE__ limit (128) prefix ([sizeof (a) + sizeof (a) / 4 - 30] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */
> +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 + 96] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */
> +};
> --- gcc/testsuite/gcc.dg/cpp/embed-9.c.jj 2024-07-05 13:54:06.976828053 +0200
> +++ gcc/testsuite/gcc.dg/cpp/embed-9.c 2024-07-05 13:53:54.994987508 +0200
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-std=c23" } */
> +
> +struct __attribute__((designated_init)) S {
> + int a, b, c, d;
> + unsigned char e[128];
> +};
> +
> +struct S s = { .a = 1, .b =
> +#embed __FILE__ limit(128) /* { dg-warning "positional initialization of field in 'struct' declared with 'designated_init' attribute" } */
> +}; /* { dg-message "near initialization" "" { target *-*-* } .-1 } */
> --- gcc/testsuite/gcc.dg/cpp/embed-6.c.jj 2024-07-05 13:25:08.339965010 +0200
> +++ gcc/testsuite/gcc.dg/cpp/embed-6.c 2024-07-05 13:24:03.036834399 +0200
> @@ -0,0 +1,82 @@
> +/* { dg-do run } */
> +/* { dg-options "-std=c23" } */
> +
> +unsigned char a[] = {
> +#embed __FILE__
> +};
> +unsigned char b[] = {
> + [26] =
> +#embed __FILE__
> +};
> +unsigned char c[] = {
> +#embed __FILE__ suffix (,)
> + [sizeof (a) / 4] = 0,
> + [sizeof (a) / 2] = 1,
> + [1] = 2,
> + [sizeof (a) - 2] = 3
> +};
> +unsigned char d[] = {
> + [1] = 4,
> + [26] = 5,
> + [sizeof (a) / 4] = 6,
> + [sizeof (a) / 2] = 7,
> + [sizeof (a) - 2] = 8,
> +#embed __FILE__ prefix ([0] = )
> +};
> +unsigned char e[] = {
> +#embed __FILE__ suffix (,)
> + [2] = 9,
> + [sizeof (a) - 3] = 10
> +};
> +unsigned char f[] = {
> + [23] = 11,
> + [sizeof (a) / 4 - 1] = 12,
> +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 - 1] = ) suffix (,)
> +#embed __FILE__ limit (130) prefix ([sizeof (a) / 4 - 2] = ) suffix (,)
> +#embed __FILE__ prefix ([sizeof (a) / 4 + 10] = ) suffix (,)
> +#embed __FILE__ limit (128) prefix ([sizeof (a) + sizeof (a) / 4 - 30] = ) suffix (,)
> +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 + 96] = ) suffix (,)
> +};
> +unsigned char z[sizeof (a) / 4] = {
> +};
> +
> +int
> +main ()
> +{
> + if (sizeof (b) != sizeof (a) + 26
> + || __builtin_memcmp (a, b + 26, sizeof (a)))
> + __builtin_abort ();
> + if (sizeof (c) != sizeof (a)
> + || a[0] != c[0]
> + || c[1] != 2
> + || __builtin_memcmp (a + 2, c + 2, sizeof (a) / 4 - 2)
> + || c[sizeof (a) / 4] != 0
> + || __builtin_memcmp (a + sizeof (a) / 4 + 1, c + sizeof (a) / 4 + 1, sizeof (a) / 2 - sizeof (a) / 4 - 1)
> + || c[sizeof (a) / 2] != 1
> + || __builtin_memcmp (a + sizeof (a) / 2 + 1, c + sizeof (a) / 2 + 1, sizeof (a) - sizeof (a) / 2 - 3)
> + || c[sizeof (a) - 2] != 3
> + || a[sizeof (a) - 1] != c[sizeof (a) - 1])
> + __builtin_abort ();
> + if (sizeof (d) != sizeof (a)
> + || __builtin_memcmp (a, d, sizeof (a)))
> + __builtin_abort ();
> + if (sizeof (e) != sizeof (a)
> + || a[0] != e[0]
> + || a[1] != e[1]
> + || e[2] != 9
> + || __builtin_memcmp (a + 3, e + 3, sizeof (a) - 6)
> + || e[sizeof (a) - 3] != 10
> + || a[sizeof (a) - 2] != e[sizeof (a) - 2]
> + || a[sizeof (a) - 1] != e[sizeof (a) - 1])
> + __builtin_abort ();
> + if (sizeof (f) != sizeof (a) + sizeof (a) / 4 - 30 + 128
> + || __builtin_memcmp (z, f, 23)
> + || f[23] != 11
> + || __builtin_memcmp (z, f + 24, sizeof (a) / 4 - 2 - 24)
> + || __builtin_memcmp (f + sizeof (a) / 4 - 2, a, 12)
> + || __builtin_memcmp (f + sizeof (a) / 4 + 10, a, 86)
> + || __builtin_memcmp (f + sizeof (a) / 4 + 96, a, 128)
> + || __builtin_memcmp (f + sizeof (a) / 4 + 96 + 128, a + 86 + 128, sizeof (a) - 86 - 128 - 40)
> + || __builtin_memcmp (f + sizeof (a) + sizeof (a) / 4 - 30, a, 128))
> + __builtin_abort ();
> +}
> --- gcc/fold-const.cc.jj 2024-06-05 15:42:28.144707055 +0200
> +++ gcc/fold-const.cc 2024-07-06 10:46:59.487035697 +0200
> @@ -8405,6 +8405,48 @@ native_encode_initializer (tree init, un
> }
>
> curpos = pos;
> + if (val && TREE_CODE (val) == RAW_DATA_CST)
> + {
> + if (count)
> + return 0;
> + if (off == -1
> + || (curpos >= off
> + && (curpos + RAW_DATA_LENGTH (val)
> + <= (HOST_WIDE_INT) off + len)))
> + {
> + if (ptr)
> + memcpy (ptr + (curpos - o), RAW_DATA_POINTER (val),
> + RAW_DATA_LENGTH (val));
> + if (mask)
> + memset (mask + curpos, 0, RAW_DATA_LENGTH (val));
> + }
> + else if (curpos + RAW_DATA_LENGTH (val) > off
> + && curpos < (HOST_WIDE_INT) off + len)
> + {
> + /* Partial overlap. */
> + unsigned char *p = NULL;
> + int no = 0;
> + int l;
> + gcc_assert (mask == NULL);
> + if (curpos >= off)
> + {
> + if (ptr)
> + p = ptr + curpos - off;
> + l = MIN ((HOST_WIDE_INT) off + len - curpos,
> + RAW_DATA_LENGTH (val));
> + }
> + else
> + {
> + p = ptr;
> + no = off - curpos;
> + l = len;
> + }
> + if (p)
> + memcpy (p, RAW_DATA_POINTER (val) + no, l);
> + }
> + curpos += RAW_DATA_LENGTH (val);
> + val = NULL_TREE;
> + }
> if (val)
> do
> {
> @@ -13768,6 +13810,9 @@ get_array_ctor_element_at_index (tree ct
> else
> first_p = false;
>
> + if (TREE_CODE (cval) == RAW_DATA_CST)
> + max_index += RAW_DATA_LENGTH (cval) - 1;
> +
> /* Do we have match? */
> if (wi::cmp (access_index, index, index_sgn) >= 0)
> {
> @@ -13867,10 +13912,26 @@ fold (tree expr)
> && TREE_CODE (op0) == CONSTRUCTOR
> && ! type_contains_placeholder_p (TREE_TYPE (op0)))
> {
> - tree val = get_array_ctor_element_at_index (op0,
> - wi::to_offset (op1));
> + unsigned int idx;
> + tree val
> + = get_array_ctor_element_at_index (op0, wi::to_offset (op1),
> + &idx);
> if (val)
> - return val;
> + {
> + if (TREE_CODE (val) != RAW_DATA_CST)
> + return val;
> + if (CONSTRUCTOR_ELT (op0, idx)->index == NULL_TREE
> + || (TREE_CODE (CONSTRUCTOR_ELT (op0, idx)->index)
> + != INTEGER_CST))
> + return t;
> + offset_int o
> + = (wi::to_offset (op1)
> + - wi::to_offset (CONSTRUCTOR_ELT (op0, idx)->index));
> + gcc_checking_assert (o < RAW_DATA_LENGTH (val));
> + return build_int_cst (TREE_TYPE (val),
> + ((const unsigned char *)
> + RAW_DATA_POINTER (val))[o.to_uhwi ()]);
> + }
> }
>
> return t;
> --- gcc/c/c-parser.cc.jj 2024-07-01 11:28:21.840249061 +0200
> +++ gcc/c/c-parser.cc 2024-07-04 14:58:33.568465437 +0200
> @@ -6212,6 +6212,25 @@ c_parser_braced_init (c_parser *parser,
> {
> last_init_list_comma = c_parser_peek_token (parser)->location;
> c_parser_consume_token (parser);
> + /* CPP_EMBED should be always in between two CPP_COMMA
> + tokens. */
> + while (c_parser_next_token_is (parser, CPP_EMBED))
> + {
> + c_token *embed = c_parser_peek_token (parser);
> + c_parser_consume_token (parser);
> + c_expr embed_val;
> + embed_val.value = embed->value;
> + embed_val.original_code = RAW_DATA_CST;
> + embed_val.original_type = integer_type_node;
> + set_c_expr_source_range (&embed_val, embed->get_range ());
> + embed_val.m_decimal = 0;
> + process_init_element (embed->location, embed_val, false,
> + &braced_init_obstack);
> + gcc_checking_assert (c_parser_next_token_is (parser,
> + CPP_COMMA));
> + last_init_list_comma = c_parser_peek_token (parser)->location;
> + c_parser_consume_token (parser);
> + }
> }
> else
> break;
> --- gcc/c/c-typeck.cc.jj 2024-06-14 19:45:07.455803708 +0200
> +++ gcc/c/c-typeck.cc 2024-07-05 12:48:05.357558694 +0200
> @@ -8747,12 +8747,13 @@ digest_init (location_t init_loc, tree t
> if (!maybe_const)
> arith_const_expr = false;
> else if (!INTEGRAL_TYPE_P (TREE_TYPE (inside_init))
> - && TREE_CODE (TREE_TYPE (inside_init)) != REAL_TYPE
> - && TREE_CODE (TREE_TYPE (inside_init)) != COMPLEX_TYPE)
> + && TREE_CODE (TREE_TYPE (inside_init)) != REAL_TYPE
> + && TREE_CODE (TREE_TYPE (inside_init)) != COMPLEX_TYPE)
> arith_const_expr = false;
> else if (TREE_CODE (inside_init) != INTEGER_CST
> - && TREE_CODE (inside_init) != REAL_CST
> - && TREE_CODE (inside_init) != COMPLEX_CST)
> + && TREE_CODE (inside_init) != REAL_CST
> + && TREE_CODE (inside_init) != COMPLEX_CST
> + && TREE_CODE (inside_init) != RAW_DATA_CST)
> arith_const_expr = false;
> else if (TREE_OVERFLOW (inside_init))
> arith_const_expr = false;
> @@ -9013,6 +9014,22 @@ digest_init (location_t init_loc, tree t
> ? ic_init_const
> : ic_init), null_pointer_constant,
> NULL_TREE, NULL_TREE, 0);
> + if (TREE_CODE (inside_init) == RAW_DATA_CST
> + && c_inhibit_evaluation_warnings == 0
> + && warn_overflow
> + && !TYPE_UNSIGNED (type)
> + && TYPE_PRECISION (type) == CHAR_BIT)
> + for (unsigned int i = 0;
> + i < (unsigned) RAW_DATA_LENGTH (inside_init); ++i)
> + if (((const signed char *) RAW_DATA_POINTER (inside_init))[i] < 0)
> + warning_at (init_loc, OPT_Wconversion,
> + "conversion from %qT to %qT changes value from "
> + "%qd to %qd",
> + integer_type_node, type,
> + ((const unsigned char *)
> + RAW_DATA_POINTER (inside_init))[i],
> + ((const signed char *)
> + RAW_DATA_POINTER (inside_init))[i]);
> return inside_init;
> }
>
> @@ -10124,6 +10141,28 @@ set_init_label (location_t loc, tree fie
> while (field != NULL_TREE);
> }
>
> +/* Helper function for add_pending_init. Find inorder successor of P
> + in AVL tree. */
> +static struct init_node *
> +init_node_successor (struct init_node *p)
> +{
> + struct init_node *r;
> + if (p->right)
> + {
> + r = p->right;
> + while (r->left)
> + r = r->left;
> + return r;
> + }
> + r = p->parent;
> + while (r && p == r->right)
> + {
> + p = r;
> + r = r->parent;
> + }
> + return r;
> +}
> +
> /* Add a new initializer to the tree of pending initializers. PURPOSE
> identifies the initializer, either array index or field in a structure.
> VALUE is the value of that index or field. If ORIGTYPE is not
> @@ -10151,9 +10190,179 @@ add_pending_init (location_t loc, tree p
> if (tree_int_cst_lt (purpose, p->purpose))
> q = &p->left;
> else if (tree_int_cst_lt (p->purpose, purpose))
> - q = &p->right;
> + {
> + if (TREE_CODE (p->value) != RAW_DATA_CST
> + || (p->right
> + && tree_int_cst_le (p->right->purpose, purpose)))
> + q = &p->right;
> + else
> + {
> + widest_int pp = wi::to_widest (p->purpose);
> + widest_int pw = wi::to_widest (purpose);
> + if (pp + RAW_DATA_LENGTH (p->value) <= pw)
> + q = &p->right;
> + else
> + {
> + /* Override which should split the old RAW_DATA_CST
> + into 2 or 3 pieces. */
> + if (!implicit && warn_override_init)
> + warning_init (loc, OPT_Woverride_init,
> + "initialized field overwritten");
> + unsigned HOST_WIDE_INT start = (pw - pp).to_uhwi ();
> + unsigned HOST_WIDE_INT len = 1;
> + if (TREE_CODE (value) == RAW_DATA_CST)
> + len = RAW_DATA_LENGTH (value);
> + unsigned HOST_WIDE_INT end = 0;
> + unsigned plen = RAW_DATA_LENGTH (p->value);
> + gcc_checking_assert (start < plen && start);
> + if (plen - start > len)
> + end = plen - start - len;
> + tree v = p->value;
> + tree origtype = p->origtype;
> + if (start == 1)
> + p->value = build_int_cst (TREE_TYPE (v),
> + *(const unsigned char *)
> + RAW_DATA_POINTER (v));
> + else
> + {
> + p->value = v;
> + if (end > 1)
> + v = copy_node (v);
> + RAW_DATA_LENGTH (p->value) = start;
> + }
> + if (end)
> + {
> + tree epurpose
> + = size_binop (PLUS_EXPR, purpose,
> + bitsize_int (len));
> + if (end > 1)
> + {
> + RAW_DATA_LENGTH (v) -= plen - end;
> + RAW_DATA_POINTER (v) += plen - end;
> + }
> + else
> + v = build_int_cst (TREE_TYPE (v),
> + ((const unsigned char *)
> + RAW_DATA_POINTER (v))[plen
> + - end]);
> + add_pending_init (loc, epurpose, v, origtype,
> + implicit, braced_init_obstack);
> + }
> + q = &constructor_pending_elts;
> + continue;
> + }
> + }
> + }
> else
> {
> + if (TREE_CODE (p->value) == RAW_DATA_CST
> + && (RAW_DATA_LENGTH (p->value)
> + > (TREE_CODE (value) == RAW_DATA_CST
> + ? RAW_DATA_LENGTH (value) : 1)))
> + {
> + /* Override which should split the old RAW_DATA_CST
> + into 2 pieces. */
> + if (!implicit && warn_override_init)
> + warning_init (loc, OPT_Woverride_init,
> + "initialized field overwritten");
> + unsigned HOST_WIDE_INT len = 1;
> + if (TREE_CODE (value) == RAW_DATA_CST)
> + len = RAW_DATA_LENGTH (value);
> + if ((unsigned) RAW_DATA_LENGTH (p->value) > len + 1)
> + {
> + RAW_DATA_LENGTH (p->value) -= len;
> + RAW_DATA_POINTER (p->value) += len;
> + }
> + else
> + {
> + unsigned int l = RAW_DATA_LENGTH (p->value) - 1;
> + p->value
> + = build_int_cst (TREE_TYPE (p->value),
> + ((const unsigned char *)
> + RAW_DATA_POINTER (p->value))[l]);
> + }
> + p->purpose = size_binop (PLUS_EXPR, p->purpose,
> + bitsize_int (len));
> + continue;
> + }
> + if (TREE_CODE (value) == RAW_DATA_CST)
> + {
> + handle_raw_data:
> + /* RAW_DATA_CST value might overlap various further
> + prior initval entries. Find out how many. */
> + unsigned cnt = 0;
> + widest_int w
> + = wi::to_widest (purpose) + RAW_DATA_LENGTH (value);
> + struct init_node *r = p, *last = NULL;
> + bool override_init = warn_override_init;
> + while ((r = init_node_successor (r))
> + && wi::to_widest (r->purpose) < w)
> + {
> + ++cnt;
> + if (TREE_SIDE_EFFECTS (r->value))
> + warning_init (loc, OPT_Woverride_init_side_effects,
> + "initialized field with side-effects "
> + "overwritten");
> + else if (override_init)
> + {
> + warning_init (loc, OPT_Woverride_init,
> + "initialized field overwritten");
> + override_init = false;
> + }
> + last = r;
> + }
> + if (cnt)
> + {
> + if (TREE_CODE (last->value) == RAW_DATA_CST
> + && (wi::to_widest (last->purpose)
> + + RAW_DATA_LENGTH (last->value) > w))
> + {
> + /* The last overlapping prior initval overlaps
> + only partially. Shrink it and decrease cnt. */
> + unsigned int l = (wi::to_widest (last->purpose)
> + + RAW_DATA_LENGTH (last->value)
> + - w).to_uhwi ();
> + --cnt;
> + RAW_DATA_LENGTH (last->value) -= l;
> + RAW_DATA_POINTER (last->value) += l;
> + if (RAW_DATA_LENGTH (last->value) == 1)
> + {
> + const unsigned char *s
> + = ((const unsigned char *)
> + RAW_DATA_POINTER (last->value));
> + last->value
> + = build_int_cst (TREE_TYPE (last->value), *s);
> + }
> + last->purpose
> + = size_binop (PLUS_EXPR, last->purpose,
> + bitsize_int (l));
> + }
> + /* Instead of deleting cnt nodes from the AVL tree
> + and rebalancing, peel off the last cnt bytes from the
> + RAW_DATA_CST. Overriding thousands of previously
> + initialized array elements with #embed needs to work,
> + but doesn't need to be super efficient. */
> + gcc_checking_assert ((unsigned) RAW_DATA_LENGTH (value)
> + > cnt);
> + RAW_DATA_LENGTH (value) -= cnt;
> + const unsigned char *s
> + = ((const unsigned char *) RAW_DATA_POINTER (value)
> + + RAW_DATA_LENGTH (value));
> + unsigned int o = RAW_DATA_LENGTH (value);
> + for (r = p; cnt--; ++o, ++s)
> + {
> + r = init_node_successor (r);
> + r->purpose = size_binop (PLUS_EXPR, purpose,
> + bitsize_int (o));
> + r->value = build_int_cst (TREE_TYPE (value), *s);
> + r->origtype = origtype;
> + }
> + if (RAW_DATA_LENGTH (value) == 1)
> + value = build_int_cst (TREE_TYPE (value),
> + *((const unsigned char *)
> + RAW_DATA_POINTER (value)));
> + }
> + }
> if (!implicit)
> {
> if (TREE_SIDE_EFFECTS (p->value))
> @@ -10169,6 +10378,23 @@ add_pending_init (location_t loc, tree p
> return;
> }
> }
> + if (TREE_CODE (value) == RAW_DATA_CST && p)
> + {
> + struct init_node *r;
> + if (q == &p->left)
> + r = p;
> + else
> + r = init_node_successor (p);
> + if (r && wi::to_widest (r->purpose) < (wi::to_widest (purpose)
> + + RAW_DATA_LENGTH (value)))
> + {
> + /* Overlap with at least one prior initval in the range but
> + not at the start. */
> + p = r;
> + p->purpose = purpose;
> + goto handle_raw_data;
> + }
> + }
> }
> else
> {
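Most of the override handling in add_pending_init above comes down to dropping a prefix from a byte run. It bumps RAW_DATA_POINTER, shrinks RAW_DATA_LENGTH, and moves the purpose index forward by the same amount. Below is a minimal plain-C sketch of just that step. It uses a hypothetical `struct raw_range` in place of the init_node/RAW_DATA_CST pair. The patch additionally turns a run that shrinks to one byte into an INTEGER_CST, and the sketch leaves that out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending initializer holding a RAW_DATA_CST:
   LEN bytes at DATA initialize array elements START .. START + LEN - 1.  */
struct raw_range
{
  size_t start;
  const unsigned char *data;
  size_t len;
};

/* Drop the first N bytes of *R because a later initializer overrides
   them.  Requires N < LEN so that at least one byte survives.  This
   mirrors RAW_DATA_POINTER += n, RAW_DATA_LENGTH -= n and purpose += n;
   no bytes are copied.  */
static void
trim_front (struct raw_range *r, size_t n)
{
  assert (n < r->len);
  r->data += n;
  r->len -= n;
  r->start += n;
}
```

The same helper covers the partial-tail case (an earlier run that starts inside the new RAW_DATA_CST's range). Only the amount trimmed differs.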
> @@ -10397,8 +10623,8 @@ set_nonincremental_init (struct obstack
> {
> if (TYPE_DOMAIN (constructor_type))
> constructor_unfilled_index
> - = convert (bitsizetype,
> - TYPE_MIN_VALUE (TYPE_DOMAIN (constructor_type)));
> + = convert (bitsizetype,
> + TYPE_MIN_VALUE (TYPE_DOMAIN (constructor_type)));
> else
> constructor_unfilled_index = bitsize_zero_node;
> }
> @@ -10612,12 +10838,13 @@ output_init_element (location_t loc, tre
> if (!maybe_const)
> arith_const_expr = false;
> else if (!INTEGRAL_TYPE_P (TREE_TYPE (value))
> - && TREE_CODE (TREE_TYPE (value)) != REAL_TYPE
> - && TREE_CODE (TREE_TYPE (value)) != COMPLEX_TYPE)
> + && TREE_CODE (TREE_TYPE (value)) != REAL_TYPE
> + && TREE_CODE (TREE_TYPE (value)) != COMPLEX_TYPE)
> arith_const_expr = false;
> else if (TREE_CODE (value) != INTEGER_CST
> - && TREE_CODE (value) != REAL_CST
> - && TREE_CODE (value) != COMPLEX_CST)
> + && TREE_CODE (value) != REAL_CST
> + && TREE_CODE (value) != COMPLEX_CST
> + && TREE_CODE (value) != RAW_DATA_CST)
> arith_const_expr = false;
> else if (TREE_OVERFLOW (value))
> arith_const_expr = false;
> @@ -10784,9 +11011,14 @@ output_init_element (location_t loc, tre
>
> /* Advance the variable that indicates sequential elements output. */
> if (TREE_CODE (constructor_type) == ARRAY_TYPE)
> - constructor_unfilled_index
> - = size_binop_loc (input_location, PLUS_EXPR, constructor_unfilled_index,
> - bitsize_one_node);
> + {
> + tree inc = bitsize_one_node;
> + if (value && TREE_CODE (value) == RAW_DATA_CST)
> + inc = bitsize_int (RAW_DATA_LENGTH (value));
> + constructor_unfilled_index
> + = size_binop_loc (input_location, PLUS_EXPR,
> + constructor_unfilled_index, inc);
> + }
> else if (TREE_CODE (constructor_type) == RECORD_TYPE)
> {
> constructor_unfilled_fields
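Once RAW_DATA_CST values can appear in the constructor, the sequential-output bookkeeping must skip every element such a value covers, not just one. Here is a sketch of that rule. A hypothetical `struct init_value` stands in for the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical initializer value: a scalar (is_raw == 0), or a raw run
   of LEN bytes like a RAW_DATA_CST.  */
struct init_value
{
  int is_raw;
  size_t len;
};

/* Next unfilled array index after VALUE has been output at UNFILLED:
   one slot for a scalar, LEN slots for a raw run.  */
static size_t
advance_unfilled (size_t unfilled, const struct init_value *value)
{
  size_t inc = 1;
  if (value != NULL && value->is_raw)
    inc = value->len;
  return unfilled + inc;
}
```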
> @@ -10795,8 +11027,8 @@ output_init_element (location_t loc, tre
> /* Skip any nameless bit fields. */
> while (constructor_unfilled_fields != NULL_TREE
> && DECL_UNNAMED_BIT_FIELD (constructor_unfilled_fields))
> - constructor_unfilled_fields =
> - DECL_CHAIN (constructor_unfilled_fields);
> + constructor_unfilled_fields
> + = DECL_CHAIN (constructor_unfilled_fields);
> }
> else if (TREE_CODE (constructor_type) == UNION_TYPE)
> constructor_unfilled_fields = NULL_TREE;
> @@ -11042,6 +11274,23 @@ initialize_elementwise_p (tree type, tre
> return false;
> }
>
> +/* Helper function for process_init_element. Split off the first element
> + of a RAW_DATA_CST and save the rest to *RAW_DATA. */
> +
> +static inline tree
> +maybe_split_raw_data (tree value, tree *raw_data)
> +{
> + if (value == NULL_TREE || TREE_CODE (value) != RAW_DATA_CST)
> + return value;
> + *raw_data = value;
> + value = build_int_cst (integer_type_node,
> + *(const unsigned char *)
> + RAW_DATA_POINTER (*raw_data));
> + ++RAW_DATA_POINTER (*raw_data);
> + --RAW_DATA_LENGTH (*raw_data);
> + return value;
> +}
> +
> /* Add one non-braced element to the current constructor level.
> This adjusts the current position within the constructor's type.
> This may also start or terminate implicit levels
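maybe_split_raw_data peels the first byte off a RAW_DATA_CST and leaves the rest in place for a later retry. The sketch below applies the same idea to a hypothetical non-owning byte view. It assumes the bytes are read as unsigned char, so the values stay in the 0..255 range that #embed produces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAW_DATA_CST: a view of LEN bytes.  */
struct raw_view
{
  const unsigned char *ptr;
  size_t len;
};

/* Return the first byte as an int (the patch builds an INTEGER_CST of
   integer_type_node) and leave the remaining bytes in *REST, updating
   the view in place.  */
static int
split_first_byte (struct raw_view *rest)
{
  assert (rest->len >= 1);
  int first = rest->ptr[0];
  ++rest->ptr;
  --rest->len;
  return first;
}
```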
> @@ -11064,7 +11313,9 @@ process_init_element (location_t loc, st
> = (orig_value != NULL_TREE && TREE_CODE (orig_value) == STRING_CST);
> bool strict_string = value.original_code == STRING_CST;
> bool was_designated = designator_depth != 0;
> + tree raw_data = NULL_TREE;
>
> +retry:
> designator_depth = 0;
> designator_erroneous = 0;
>
> @@ -11232,6 +11483,7 @@ process_init_element (location_t loc, st
> continue;
> }
>
> + value.value = maybe_split_raw_data (value.value, &raw_data);
> if (value.value)
> {
> push_member_name (constructor_fields);
> @@ -11320,6 +11572,7 @@ process_init_element (location_t loc, st
> continue;
> }
>
> + value.value = maybe_split_raw_data (value.value, &raw_data);
> if (value.value)
> {
> push_member_name (constructor_fields);
> @@ -11368,26 +11621,66 @@ process_init_element (location_t loc, st
> break;
> }
>
> - /* Now output the actual element. */
> - if (value.value)
> + if (value.value
> + && TREE_CODE (value.value) == RAW_DATA_CST
> + && RAW_DATA_LENGTH (value.value) > 1
> + && (TREE_CODE (elttype) == INTEGER_TYPE
> + || TREE_CODE (elttype) == BITINT_TYPE)
> + && TYPE_PRECISION (elttype) == CHAR_BIT
> + && (constructor_max_index == NULL_TREE
> + || tree_int_cst_lt (constructor_index,
> + constructor_max_index)))
> {
> + unsigned int len = RAW_DATA_LENGTH (value.value);
> + if (constructor_max_index)
> + {
> + widest_int w = wi::to_widest (constructor_max_index);
> + w -= wi::to_widest (constructor_index);
> + w += 1;
> + if (w < len)
> + len = w.to_uhwi ();
> + }
> + if (len < (unsigned) RAW_DATA_LENGTH (value.value))
> + {
> + raw_data = copy_node (value.value);
> + RAW_DATA_LENGTH (raw_data) -= len;
> + RAW_DATA_POINTER (raw_data) += len;
> + RAW_DATA_LENGTH (value.value) = len;
> + }
> + TREE_TYPE (value.value) = elttype;
> push_array_bounds (tree_to_uhwi (constructor_index));
> output_init_element (loc, value.value, value.original_type,
> - strict_string, elttype,
> - constructor_index, true, implicit,
> - braced_init_obstack);
> + false, elttype, constructor_index, true,
> + implicit, braced_init_obstack);
> RESTORE_SPELLING_DEPTH (constructor_depth);
> + constructor_index
> + = size_binop_loc (input_location, PLUS_EXPR,
> + constructor_index, bitsize_int (len));
> }
> + else
> + {
> + value.value = maybe_split_raw_data (value.value, &raw_data);
> + /* Now output the actual element. */
> + if (value.value)
> + {
> + push_array_bounds (tree_to_uhwi (constructor_index));
> + output_init_element (loc, value.value, value.original_type,
> + strict_string, elttype,
> + constructor_index, true, implicit,
> + braced_init_obstack);
> + RESTORE_SPELLING_DEPTH (constructor_depth);
> + }
>
> - constructor_index
> - = size_binop_loc (input_location, PLUS_EXPR,
> - constructor_index, bitsize_one_node);
> -
> - if (!value.value)
> - /* If we are doing the bookkeeping for an element that was
> - directly output as a constructor, we must update
> - constructor_unfilled_index. */
> - constructor_unfilled_index = constructor_index;
> + constructor_index
> + = size_binop_loc (input_location, PLUS_EXPR,
> + constructor_index, bitsize_one_node);
> +
> + if (!value.value)
> + /* If we are doing the bookkeeping for an element that was
> + directly output as a constructor, we must update
> + constructor_unfilled_index. */
> + constructor_unfilled_index = constructor_index;
> + }
> }
> else if (gnu_vector_type_p (constructor_type))
> {
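The new array branch takes the bulk path under three conditions:

- the value is a multi-byte RAW_DATA_CST;
- the element type is an 8-bit INTEGER_TYPE or BITINT_TYPE;
- constructor_index is below constructor_max_index.

On that path it clamps the run to the room left in the array and splits the rest off into raw_data. Below is a sketch of the clamping arithmetic. It uses plain size_t indices and a hypothetical has_max flag for arrays without a known bound.

```c
#include <assert.h>
#include <stddef.h>

/* How many of RAW_LEN bytes, starting at array index INDEX, fit before
   the last valid index MAX_INDEX.  HAS_MAX == 0 models an array of
   unknown bound, where the whole run is taken.  The caller would split
   the remaining raw_len - result bytes off for a later retry.  */
static size_t
clamp_raw_len (size_t raw_len, size_t index, int has_max, size_t max_index)
{
  if (has_max)
    {
      assert (index < max_index);             /* same precondition as the patch */
      size_t room = max_index - index + 1;    /* elements index .. max_index */
      if (room < raw_len)
        return room;
    }
  return raw_len;
}
```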
> @@ -11402,6 +11695,7 @@ process_init_element (location_t loc, st
> break;
> }
>
> + value.value = maybe_split_raw_data (value.value, &raw_data);
> /* Now output the actual element. */
> if (value.value)
> {
> @@ -11435,6 +11729,7 @@ process_init_element (location_t loc, st
> }
> else
> {
> + value.value = maybe_split_raw_data (value.value, &raw_data);
> if (value.value)
> output_init_element (loc, value.value, value.original_type,
> strict_string, constructor_type,
> @@ -11506,6 +11801,14 @@ process_init_element (location_t loc, st
> }
>
> constructor_range_stack = 0;
> +
> + if (raw_data && RAW_DATA_LENGTH (raw_data))
> + {
> + gcc_assert (!string_flag && !was_designated);
> + value.value = raw_data;
> + raw_data = NULL_TREE;
> + goto retry;
> + }
> }
>
> /* Build a complete asm-statement, whose components are a CV_QUALIFIER
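With the retry: label and the leftover check at the end of process_init_element, a single RAW_DATA_CST can feed several consecutive subobjects. Scalar members take one byte each via maybe_split_raw_data, while a suitable char array takes a whole chunk. The model below is much simplified and uses a hypothetical `struct target`. It is written as an explicit loop instead of goto, and it ignores nested constructor levels and designators.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target layout.  */
struct target
{
  int a;
  unsigned char buf[4];
  int c;
};

/* Distribute LEN bytes over the members of *T in declaration order,
   looping while bytes remain (standing in for "goto retry").  Scalars
   take one byte each; the char array takes as many bytes as fit.
   Returns the number of bytes consumed; any excess would be an excess
   initializer in real C.  */
static size_t
distribute_bytes (struct target *t, const unsigned char *p, size_t len)
{
  size_t used = 0;
  int field = 0;
  while (used < len && field < 3)
    {
      switch (field)
        {
        case 0:
          t->a = p[used++];
          field = 1;
          break;
        case 1:
          {
            size_t n = len - used;
            if (n > sizeof t->buf)
              n = sizeof t->buf;
            memcpy (t->buf, p + used, n);
            used += n;
            field = 2;
            break;
          }
        case 2:
          t->c = p[used++];
          field = 3;
          break;
        }
    }
  return used;
}
```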
> --- gcc/tree.def.jj 2024-06-05 19:09:54.045617019 +0200
> +++ gcc/tree.def 2024-07-05 10:10:48.372613006 +0200
> @@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> +/* Contents are RAW_DATA_LENGTH and the actual content
> + of the raw data, plus RAW_DATA_OWNER, which if non-NULL is the owner of
> + the data (e.g. STRING_CST); if it is NULL, the data is owned by libcpp.
> + TREE_TYPE is the type of each of the RAW_DATA_LENGTH elements. */
> +DEFTREECODE (RAW_DATA_CST, "raw_data_cst", tcc_constant, 0)
> +
> /* Declarations. All references to names are represented as ..._DECL
> nodes. The decls in one binding context are chained through the
> TREE_CHAIN field. Each DECL has a DECL_NAME field which contains
> --- gcc/c-family/c-lex.cc.jj 2024-02-22 19:29:51.226074838 +0100
> +++ gcc/c-family/c-lex.cc 2024-07-04 14:58:33.568465437 +0200
> @@ -781,6 +781,13 @@ c_lex_with_flags (tree *value, location_
> *value = build_string (tok->val.str.len, (const char *)tok->val.str.text);
> break;
>
> + case CPP_EMBED:
> + *value = make_node (RAW_DATA_CST);
> + TREE_TYPE (*value) = integer_type_node;
> + RAW_DATA_LENGTH (*value) = tok->val.str.len;
> + RAW_DATA_POINTER (*value) = (const char *) tok->val.str.text;
> + break;
> +
> /* This token should not be visible outside cpplib. */
> case CPP_MACRO_ARG:
> gcc_unreachable ();
> @@ -800,7 +807,7 @@ c_lex_with_flags (tree *value, location_
> add_flags |= PREV_FALLTHROUGH;
> goto retry_after_at;
> }
> - goto retry;
> + goto retry;
>
> default:
> *value = NULL_TREE;
> --- gcc/tree-core.h.jj 2024-07-01 11:28:23.408228952 +0200
> +++ gcc/tree-core.h 2024-07-03 19:41:28.821880055 +0200
> @@ -1516,6 +1516,13 @@ struct GTY(()) tree_string {
> char str[1];
> };
>
> +struct GTY(()) tree_raw_data {
> + struct tree_typed typed;
> + tree owner;
> + const char *GTY ((skip(""))) str;
> + int length;
> +};
> +
> struct GTY(()) tree_complex {
> struct tree_typed typed;
> tree real;
> @@ -2106,6 +2113,7 @@ union GTY ((ptr_alias (union lang_tree_n
> struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
> struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
> struct tree_string GTY ((tag ("TS_STRING"))) string;
> + struct tree_raw_data GTY ((tag ("TS_RAW_DATA_CST"))) raw_data_cst;
> struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
> struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
> struct tree_decl_minimal GTY((tag ("TS_DECL_MINIMAL"))) decl_minimal;
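Because str is marked GTY ((skip(""))), the garbage collector does not follow it. The bytes must therefore be kept alive by RAW_DATA_OWNER or by libcpp. Since the node is only a view, the patch splits a run (e.g. in process_init_element) with copy_node plus pointer/length adjustments, without copying any bytes. Below is a sketch of that split on a hypothetical plain-C mirror of the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of struct tree_raw_data: a view of LENGTH bytes
   at STR.  OWNER, when non-NULL, stands for whatever keeps the bytes
   alive; NULL models libcpp-owned data.  */
struct raw_data
{
  const void *owner;
  const char *str;
  int length;
};

/* Split *V at OFFSET (0 < OFFSET < LENGTH): *V keeps the first OFFSET
   bytes and the returned view covers the rest.  Both views share STR's
   storage and the same OWNER; nothing is copied.  */
static struct raw_data
split_view (struct raw_data *v, int offset)
{
  assert (offset > 0 && offset < v->length);
  struct raw_data tail = *v;   /* shallow copy, like copy_node */
  tail.str += offset;
  tail.length -= offset;
  v->length = offset;
  return tail;
}
```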
> --- gcc/gimple-fold.cc.jj 2024-06-05 15:42:28.178706605 +0200
> +++ gcc/gimple-fold.cc 2024-07-06 11:14:41.202981865 +0200
> @@ -8000,7 +8000,7 @@ fold_array_ctor_reference (tree type, tr
> unsigned ctor_idx;
> tree val = get_array_ctor_element_at_index (ctor, access_index,
> &ctor_idx);
> - if (!val && ctor_idx >= CONSTRUCTOR_NELTS (ctor))
> + if (!val && ctor_idx >= CONSTRUCTOR_NELTS (ctor))
> return build_zero_cst (type);
>
> /* native-encode adjacent ctor elements. */
> @@ -8027,10 +8027,27 @@ fold_array_ctor_reference (tree type, tr
> {
> if (bufoff + elt_sz > sizeof (buf))
> elt_sz = sizeof (buf) - bufoff;
> - int len = native_encode_expr (val, buf + bufoff, elt_sz,
> + int len;
> + if (TREE_CODE (val) == RAW_DATA_CST)
> + {
> + gcc_assert (inner_offset == 0);
> + if (!elt->index || TREE_CODE (elt->index) != INTEGER_CST)
> + return NULL_TREE;
> + inner_offset = (access_index
> + - wi::to_offset (elt->index)).to_uhwi ();
> + len = MIN (sizeof (buf) - bufoff,
> + (unsigned) (RAW_DATA_LENGTH (val) - inner_offset));
> + memcpy (buf + bufoff, RAW_DATA_POINTER (val) + inner_offset,
> + len);
> + access_index += len - 1;
> + }
> + else
> + {
> + len = native_encode_expr (val, buf + bufoff, elt_sz,
> inner_offset / BITS_PER_UNIT);
> - if (len != (int) elt_sz - inner_offset / BITS_PER_UNIT)
> - return NULL_TREE;
> + if (len != (int) elt_sz - inner_offset / BITS_PER_UNIT)
> + return NULL_TREE;
> + }
> inner_offset = 0;
> bufoff += len;
>
> @@ -8072,8 +8089,23 @@ fold_array_ctor_reference (tree type, tr
> return native_interpret_expr (type, buf, size / BITS_PER_UNIT);
> }
>
> - if (tree val = get_array_ctor_element_at_index (ctor, access_index))
> + unsigned ctor_idx;
> + if (tree val = get_array_ctor_element_at_index (ctor, access_index,
> + &ctor_idx))
> {
> + if (TREE_CODE (val) == RAW_DATA_CST)
> + {
> + if (size != BITS_PER_UNIT || elt_sz != 1 || inner_offset != 0)
> + return NULL_TREE;
> + constructor_elt *elt = CONSTRUCTOR_ELT (ctor, ctor_idx);
> + if (elt->index == NULL_TREE || TREE_CODE (elt->index) != INTEGER_CST)
> + return NULL_TREE;
> + *suboff += access_index.to_uhwi () * BITS_PER_UNIT;
> + unsigned o = (access_index - wi::to_offset (elt->index)).to_uhwi ();
> + return build_int_cst (TREE_TYPE (val),
> + ((const unsigned char *)
> + RAW_DATA_POINTER (val))[o]);
> + }
> if (!size && TREE_CODE (val) != CONSTRUCTOR)
> {
> /* For the final reference to the entire accessed element
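In fold_array_ctor_reference a RAW_DATA_CST element stands for many array elements. Both new paths therefore first compute the offset of access_index within the element, access_index - elt->index. The single-element path is taken only for one-byte reads with elt_sz == 1, and it returns that byte. The native-encode path copies as much of the run as fits in buf with memcpy. Below is a sketch of both using a hypothetical `struct raw_elt`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constructor element holding a raw run: DATA[0..LEN)
   initializes array elements INDEX .. INDEX + LEN - 1.  */
struct raw_elt
{
  size_t index;
  const unsigned char *data;
  size_t len;
};

/* Fold a one-byte read of element ACCESS_INDEX, which must lie in ELT.  */
static int
fold_raw_byte (const struct raw_elt *elt, size_t access_index)
{
  assert (access_index >= elt->index && access_index - elt->index < elt->len);
  return elt->data[access_index - elt->index];
}

/* Copy bytes starting at ACCESS_INDEX into BUF, at most BUFSZ of them;
   returns how many were copied.  */
static size_t
copy_raw_bytes (const struct raw_elt *elt, size_t access_index,
                unsigned char *buf, size_t bufsz)
{
  assert (access_index >= elt->index && access_index - elt->index < elt->len);
  size_t off = access_index - elt->index;
  size_t n = elt->len - off;
  if (n > bufsz)
    n = bufsz;
  memcpy (buf, elt->data + off, n);
  return n;
}
```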
> --- gcc/treestruct.def.jj 2024-01-03 11:51:38.761630845 +0100
> +++ gcc/treestruct.def 2024-07-03 17:06:57.539794162 +0200
> @@ -39,6 +39,7 @@ DEFTREESTRUCT(TS_REAL_CST, "real cst")
> DEFTREESTRUCT(TS_FIXED_CST, "fixed cst")
> DEFTREESTRUCT(TS_VECTOR, "vector")
> DEFTREESTRUCT(TS_STRING, "string")
> +DEFTREESTRUCT(TS_RAW_DATA_CST, "raw data cst")
> DEFTREESTRUCT(TS_COMPLEX, "complex")
> DEFTREESTRUCT(TS_IDENTIFIER, "identifier")
> DEFTREESTRUCT(TS_DECL_MINIMAL, "decl minimal")
>
>
> Jakub
>