On Sat, Jul 06, 2024 at 02:45:45PM +0200, Richard Biener wrote:
> > Anyway, thoughts on this before I spend too much time on it?
>
> Why do we have an "element type"?  Would
>
> int a[] = {
> #embed "cc1plus"
> };
>
> be valid?

Yes, that is valid. The way #embed is defined for C is that it behaves essentially as if a huge sequence of integer literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
appeared in its place, so it can appear anywhere in the IL where the grammar allows something like that. So even
void foo (...);
void bar ()
{
  foo (
#embed "cc1plus"
  );
  int i = 1 + (
#embed "cc1plus"
  ) + 2;
}
etc. is valid.

I chose to greatly simplify things by not emitting CPP_EMBED for the boundary numbers of the sequence, because otherwise one needs to deal with significantly more special cases. One can have
const unsigned char a[] = {
  13 + 25 *
#embed "cc1plus"
  / 2, 0 };
for example, or even something expected to be used in C often like
const unsigned char b[] = {
  [64] =
#embed "cc1plus"
};
and the advantage of restricting CPP_EMBED to the inner sequence elements is that we know for sure it is preceded by CPP_COMMA and followed by one too. If we e.g. used CPP_EMBED even for a single element sequence, it could appear anywhere a CPP_NUMBER can appear in the grammar, which is basically everywhere.

Right now the patch, when lexing CPP_EMBED, turns it into a RAW_DATA_CST with integer_type_node type, which reflects that from the preprocessor's point of view it is a sequence of int literals, and then when parsing an initializer peels bytes off it as needed; see e.g. the c-c++-common/cpp/embed-19.c test in the patch, where some of the sequence elements initialize some fields in a struct, others an unsigned char array field and others some other fields again. To simplify things, it only keeps the RAW_DATA_CST around in the initializer of ARRAY_TYPE CONSTRUCTORs if they have INTEGER_TYPE elements with CHAR_BIT precision, so
int a[] = {
#embed "cc1plus"
};
is peeled off into a huge sequence of INTEGER_CST CONSTRUCTOR_ELTs.

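To make the boundary handling described above a bit more concrete, here is a minimal sketch in plain C. It is not GCC code: struct embed_split and split_embed are hypothetical names made up for illustration, and the sketch ignores any minimum length below which no raw-data token would be emitted at all. It only shows the idea that the first and last bytes stay ordinary numbers while the inner bytes form one run that is always surrounded by commas.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only, not part of the patch.  */
struct embed_split
{
  int has_first, has_last;     /* whether the boundary numbers exist */
  int first, last;             /* boundary bytes, as plain int values */
  const unsigned char *inner;  /* inner run, NULL if there is none */
  size_t inner_len;            /* number of bytes in the inner run */
};

/* Split LEN bytes at DATA into a leading number, an inner raw run and
   a trailing number.  The inner run refers to DATA, nothing is copied.  */
static struct embed_split
split_embed (const unsigned char *data, size_t len)
{
  struct embed_split s = { 0, 0, 0, 0, NULL, 0 };
  if (len == 0)
    return s;
  s.has_first = 1;
  s.first = data[0];
  if (len == 1)
    return s;
  s.has_last = 1;
  s.last = data[len - 1];
  if (len > 2)
    {
      s.inner = data + 1;
      s.inner_len = len - 2;
    }
  return s;
}
```

With such a split, only the inner run needs special treatment in the parser, and it can only ever be seen between two commas.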
In theory, if this is something that appears often enough in real-world code, we might use RAW_DATA_CST even for that case: basically allocate a backing STRING_CST 4 times as big and, based on target endianness and reverse storage order, extend it from one buffer into another one. I'd prefer to do that only if we really see that people actually want it, because it will be more work.

> I suppose #embed itself is just "embedding" the target(?)
> representation and the file encodes that in bytes as if laid out in
> memory?

It is designed pretty much as the values you get by fread into an unsigned char array.

> Does anything in the #embed spec require actually reading the contents
> of the embedded file?  For the above a fstat() would be enough to
> deduce the size of a[].

For regular files, fstat would be good enough; for non-regular files one really has to read them into memory. But I think there are so many cases where we actually need to read and inspect the values at compile time that having it always in memory (as implemented in the patch set) doesn't hurt. E.g. we need to read the first and last byte of the sequence, and any time one does something like
constexpr unsigned char a[] = {
#embed "cc1plus"
};
constexpr int b = a[6832];
during constant expression evaluation, we really need to read the value and interpret it; similarly, during optimizations we often do that as well. ICF hashes the data to decide what is the same, ...

Sure, having it all in memory will mean > 2GB embeds in 32-bit compilers will be tough, but in 64-bit compilers it should work just fine, while I think e.g. right now you can't have an initialized > 4GB array without gaps, because CONSTRUCTOR_ELTS is a vector and that uses unsigned int length.

What I think is important is that we, if at all possible, keep it in memory once and refer to the libcpp buffer holding the file, and don't copy stuff over and over; that is one of the reasons why compiling that #embed "cc1plus" right now without the optimizations (i.e.
as the
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
261M sequence) just eats more than 26GB and 5 minutes (I stopped it after that). E.g. STRING_CST is inappropriate because it owns the data (the data sits in its payload) and currently is only valid as the whole initializer of the array, not just part of it.

> When preprocessing only I suppose #embed
> isn't "resolved", right?

The series as posted will with -E preprocess it into something like
118,
# 10 "embed-10.c"
#embed "." __gnu__::__base64__( \
"b2lkCmZvbyAodW5zaWduZWQgY2hhciAqcCkKewp9CgppbnQKbWFpbiAoKQp7CiAgdW5zaWduZWQg" \
"Y2hhciBhW10gPSB7CiAgICAjZW1iZWQgX19GSUxFX18KICB9OwogIGZvbyAoYSk7Cn0=")
# 10 "embed-10.c"
,10
(so that it is pedantically valid but can be decoded back cheaply). Another option is to emit that
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253,...,0
but then we don't handle megabytes of data well and gigabytes of it are out of the question, or to keep the original #embed in there (that is what clang does with some new -dE option), but that isn't really preprocessing, because one has to copy the preprocessed file and all the embed files as well.

> I would say we should by default just record a reference to the file
> on disk, so RAW_DATA_CST should have a pointer to the backing store
> and actual reading of the data should be done on-demand only
> (like if required by constexpr or if we desire to constant fold).
> IIRC gas supports embedding data as well.

Indeed, gas has the .incbin directive, but I'd say we could use it only if we know it is a regular file and it will be immediately assembled. If one does -save-temps or -S, I think we'd better make the assembler file self-contained. The RAW_DATA_CST in the patch has a tree owner (meant e.g.
for the PCH case where we need to copy the data into a STRING_CST or something similar that owns the data and is PCH saved/restored), so perhaps the original owner could be some new tree that has the needed libcpp details in it (filename for a regular file, and start and end of the libcpp buffer + offset), so one could reconstruct the .incbin from it if the driver tells us we aren't saving the assembler file for later use.

For the cases where .incbin can't be used, currently GCC emits:
	.string "ELF\002\001\001\003"
	.string ""
	.string ""
	.string ""
	.string ""
	.string ""
	.string ""
	.string ""
	.string "\002"
	.string ">"
	.string "\001"
(and that is what clang with #embed also emits). It might be worth investigating whether gas could introduce some new directive to make binary data generally more compact (e.g. whether base64 encoding could be beneficial). Because
	.string "(\035\214\034\347_u\244\rz|~\002\253h\267\271\203v\244\266\372\001\353\363\026\346\365\305\211\005\220\372\215h\267\211{\022\257\277'\0256\215G\2013c.~\244\206\360\226|_\226\223\034\177j\232u\300,\003\3273kh\267q\221\302\326\3153\3772\202,\003\327\346\207\3662giJ3\202,\003\327\305\271\234@%v~\2446-\034\257\310\207\302\326=\256h\267\016\237h\267Q\201\023\257\016\313\302\326q\032\\*\205(u\244\237\023t\244\344Vt\244\247\335\243k\007\256\302\326,th\267}\221h\267\317O\034\257\377\373v\244\227\202a\221$\236\3772\263\326X\221\215Mz\244\216\227\034\257F\213\302\326G\316\302\326\033\277\302\326\177\220h\267\023\263\302\326X\236v\244\034Zt\244\003>\177[\0135\022\257\226ph\267|\377\3033Ox\022\257\214\307\340`\356\235\3772M>\245\013\321*\003\327=\377\3033"
etc. isn't compact at all; many bytes take 4 characters each. base64 would be 4 characters per 3 bytes no matter what the value is.

> When reading in data we might want to support reading only required
> pieces and possibly have the data compressed.
>
> For LTO I fear we need to embed the actual data as we cannot be sure
> the referenced file can be resolved at LTO link time?

Yes, I think that, like in the PCH header case, we need to add a backing STRING_CST for it or something similar (and then not use .incbin at all).

> That is, it would be really nice if we can avoid reading in embedded
> files and leave that to the assembler.

I really think reading isn't that big a problem; the problem is too many walks over the data, copying it over and over, and especially spending hundreds of bytes of compiler memory per byte of it.

Anyway, here is an updated patch where I implemented the native_encode_initializer/fold/fold_ctor_reference reads from the RAW_DATA_CST data (without sufficient test coverage for that for now).

If we keep the RAW_DATA_CST, there is one question. Currently in the patch it is simply a CONSTRUCTOR with say
[0] = 127, [1] = RAW_DATA_CST, [100000] = 0
or similar, where the RAW_DATA_LENGTH of the ctor value implies how many elements the sequence has. Another option would be to use RANGE_EXPR for that, so
[0] = 127, [1 ... 99999] = RAW_DATA_CST, [100000] = 0
Neither is clean, because generally RANGE_EXPR means the same value is repeated many times, while we want a range filled with subsequent bytes from the raw data memory. So, RAW_DATA_CST is an exceptional thing in either case, and given that, avoiding the RANGE_EXPR looked simpler. Unless we want to introduce some RANGE_EXPR variant that goes with RAW_DATA_CST.

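As a rough illustration of the first representation (an element whose length is implied by the raw data it carries), here is a minimal standalone sketch in plain C. It does not use GCC's tree API; struct elt and lookup are hypothetical names for this example only. The point is just that reading array element I from such a constructor means finding the element that covers I and, for a raw-data element, subtracting that element's starting index to get the byte offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one constructor element, not GCC code.  */
struct elt
{
  size_t index;              /* index of the first array element covered */
  size_t length;             /* 1 for a plain value, N for a raw-data run */
  int value;                 /* the value, used when RAW is NULL */
  const unsigned char *raw;  /* the bytes, used when non-NULL */
};

/* Store the value of array element I in *OUT and return 1, or return 0
   if no element in ELTS[0..N-1] covers index I.  */
static int
lookup (const struct elt *elts, size_t n, size_t i, int *out)
{
  for (size_t k = 0; k < n; k++)
    if (i >= elts[k].index && i - elts[k].index < elts[k].length)
      {
	/* For a raw-data run, the byte offset is I minus the run's
	   starting index.  */
	*out = (elts[k].raw
		? elts[k].raw[i - elts[k].index]
		: elts[k].value);
	return 1;
      }
  return 0;
}
```

A linear scan is used here only to keep the sketch short; the subtraction of the element's own index is the part that corresponds to the discussion above.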
--- libcpp/files.cc.jj 2024-07-03 14:52:12.231817485 +0200 +++ libcpp/files.cc 2024-07-03 15:44:39.248913032 +0200 @@ -1241,7 +1241,10 @@ finish_embed (cpp_reader *pfile, _cpp_fi limit = params->limit; size_t embed_tokens = 0; - if (CPP_OPTION (pfile, directives_only) && limit >= 64) + if ((CPP_OPTION (pfile, directives_only) + || !CPP_OPTION (pfile, cplusplus)) + && CPP_OPTION (pfile, lang) != CLK_ASM + && limit >= 64) embed_tokens = ((limit - 2) / INT_MAX) + (((limit - 2) % INT_MAX) != 0); size_t max = INTTYPE_MAXIMUM (size_t) / sizeof (cpp_token); --- gcc/varasm.cc.jj 2024-05-07 18:10:10.674871087 +0200 +++ gcc/varasm.cc 2024-07-04 14:58:33.570465411 +0200 @@ -4875,6 +4875,7 @@ initializer_constant_valid_p_1 (tree val case FIXED_CST: case STRING_CST: case COMPLEX_CST: + case RAW_DATA_CST: return null_pointer_node; case ADDR_EXPR: @@ -5468,6 +5469,9 @@ array_size_for_constructor (tree val) { if (TREE_CODE (index) == RANGE_EXPR) index = TREE_OPERAND (index, 1); + if (value && TREE_CODE (value) == RAW_DATA_CST) + index = size_binop (PLUS_EXPR, index, + size_int (RAW_DATA_LENGTH (value) - 1)); if (max_index == NULL_TREE || tree_int_cst_lt (max_index, index)) max_index = index; } @@ -5659,6 +5663,12 @@ output_constructor_regular_field (oc_loc /* Output the element's initial value. 
*/ if (local->val == NULL_TREE) assemble_zeros (fieldsize); + else if (local->val && TREE_CODE (local->val) == RAW_DATA_CST) + { + fieldsize *= RAW_DATA_LENGTH (local->val); + assemble_string (RAW_DATA_POINTER (local->val), + RAW_DATA_LENGTH (local->val)); + } else fieldsize = output_constant (local->val, fieldsize, align2, local->reverse, false); --- gcc/tree.h.jj 2024-06-05 19:09:54.046617006 +0200 +++ gcc/tree.h 2024-07-03 19:41:04.453201043 +0200 @@ -1165,6 +1165,14 @@ extern void omp_clause_range_check_faile #define TREE_STRING_POINTER(NODE) \ ((const char *)(STRING_CST_CHECK (NODE)->string.str)) +/* In a RAW_DATA_CST */ +#define RAW_DATA_LENGTH(NODE) \ + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.length) +#define RAW_DATA_POINTER(NODE) \ + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.str) +#define RAW_DATA_OWNER(NODE) \ + (RAW_DATA_CST_CHECK (NODE)->raw_data_cst.owner) + /* In a COMPLEX_CST node. */ #define TREE_REALPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.real) #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag) --- gcc/expr.cc.jj 2024-07-01 11:28:22.704237981 +0200 +++ gcc/expr.cc 2024-07-05 17:05:52.929836616 +0200 @@ -7144,6 +7144,12 @@ categorize_ctor_elements_1 (const_tree c init_elts += mult * TREE_STRING_LENGTH (value); break; + case RAW_DATA_CST: + nz_elts += mult * RAW_DATA_LENGTH (value); + unique_nz_elts += RAW_DATA_LENGTH (value); + init_elts += mult * RAW_DATA_LENGTH (value); + break; + case COMPLEX_CST: if (!initializer_zerop (TREE_REALPART (value))) { @@ -11788,7 +11794,8 @@ expand_expr_real_1 (tree exp, rtx target field, value) if (tree_int_cst_equal (field, index)) { - if (!TREE_SIDE_EFFECTS (value)) + if (!TREE_SIDE_EFFECTS (value) + && TREE_CODE (value) != RAW_DATA_CST) return expand_expr (fold (value), target, tmode, modifier); break; } @@ -11830,7 +11837,8 @@ expand_expr_real_1 (tree exp, rtx target field, value) if (tree_int_cst_equal (field, index)) { - if (TREE_SIDE_EFFECTS (value)) + if (TREE_SIDE_EFFECTS (value) + || 
TREE_CODE (value) == RAW_DATA_CST) break; if (TREE_CODE (value) == CONSTRUCTOR) @@ -11847,8 +11855,8 @@ expand_expr_real_1 (tree exp, rtx target break; } - return - expand_expr (fold (value), target, tmode, modifier); + return expand_expr (fold (value), target, tmode, + modifier); } } else if (TREE_CODE (init) == STRING_CST) --- gcc/tree-pretty-print.cc.jj 2024-06-14 19:45:09.446777591 +0200 +++ gcc/tree-pretty-print.cc 2024-07-04 14:58:33.571465397 +0200 @@ -2519,6 +2519,28 @@ dump_generic_node (pretty_printer *pp, t } break; + case RAW_DATA_CST: + for (unsigned i = 0; i < (unsigned) RAW_DATA_LENGTH (node); ++i) + { + if (TYPE_UNSIGNED (TREE_TYPE (node)) + || TYPE_PRECISION (TREE_TYPE (node)) > CHAR_BIT) + pp_decimal_int (pp, ((const unsigned char *) + RAW_DATA_POINTER (node))[i]); + else + pp_decimal_int (pp, ((const signed char *) + RAW_DATA_POINTER (node))[i]); + if (i == RAW_DATA_LENGTH (node) - 1U) + break; + else if (i == 9 && RAW_DATA_LENGTH (node) > 20) + { + pp_string (pp, ", ..., "); + i = RAW_DATA_LENGTH (node) - 11; + } + else + pp_string (pp, ", "); + } + break; + case FUNCTION_TYPE: case METHOD_TYPE: dump_generic_node (pp, TREE_TYPE (node), spc, flags, false); --- gcc/tree.cc.jj 2024-07-01 11:28:23.495227837 +0200 +++ gcc/tree.cc 2024-07-04 14:58:33.563465503 +0200 @@ -513,6 +513,7 @@ tree_node_structure_for_code (enum tree_ case STRING_CST: return TS_STRING; case VECTOR_CST: return TS_VECTOR; case VOID_CST: return TS_TYPED; + case RAW_DATA_CST: return TS_RAW_DATA_CST; /* tcc_exceptional cases. 
*/ case BLOCK: return TS_BLOCK; @@ -571,6 +572,7 @@ initialize_tree_contains_struct (void) case TS_FIXED_CST: case TS_VECTOR: case TS_STRING: + case TS_RAW_DATA_CST: case TS_COMPLEX: case TS_SSA_NAME: case TS_CONSTRUCTOR: @@ -1026,6 +1028,7 @@ tree_code_size (enum tree_code code) case REAL_CST: return sizeof (tree_real_cst); case FIXED_CST: return sizeof (tree_fixed_cst); case COMPLEX_CST: return sizeof (tree_complex); + case RAW_DATA_CST: return sizeof (tree_raw_data); case VECTOR_CST: gcc_unreachable (); case STRING_CST: gcc_unreachable (); default: @@ -10467,6 +10470,15 @@ initializer_zerop (const_tree init, bool *nonzero = true; return false; + case RAW_DATA_CST: + for (unsigned int i = 0; i < (unsigned int) RAW_DATA_LENGTH (init); ++i) + if (RAW_DATA_POINTER (init)[i]) + { + *nonzero = true; + return false; + } + return true; + case CONSTRUCTOR: { if (TREE_CLOBBER_P (init)) --- gcc/testsuite/c-c++-common/cpp/embed-19.c.jj 2024-07-05 11:30:09.333874817 +0200 +++ gcc/testsuite/c-c++-common/cpp/embed-19.c 2024-07-05 11:35:19.825724327 +0200 @@ -0,0 +1,24 @@ +/* { dg-do run } */ +/* { dg-options "" } */ +/* { dg-additional-options "-std=c23" { target c } } */ + +unsigned char a[] = { +#embed __FILE__ +}; +struct S { unsigned char h[(sizeof (a) - 7) / 2]; short int i; unsigned char j[sizeof (a) - 7 - (sizeof (a) - 7) / 2]; }; +struct T { int a, b, c; struct S d; long long e; double f; long long g; }; +struct T b = { +#embed __FILE__ +}; + +int +main () +{ + if (b.a != a[0] || b.b != a[1] || b.c != a[2] + || __builtin_memcmp (b.d.h, a + 3, sizeof (b.d.h)) + || b.d.i != a[3 + sizeof (b.d.h)] + || __builtin_memcmp (b.d.j, a + 4 + sizeof (b.d.h), sizeof (b.d.j)) + || b.e != a[sizeof (a) - 3] || b.f != a[sizeof (a) - 2] + || b.g != a[sizeof (a) - 1]) + __builtin_abort (); +} --- gcc/testsuite/gcc.dg/cpp/embed-8.c.jj 2024-07-05 13:37:25.289157048 +0200 +++ gcc/testsuite/gcc.dg/cpp/embed-8.c 2024-07-05 13:39:15.232694163 +0200 @@ -0,0 +1,7 @@ +/* This is a comment with 
some UTF-8 non-ASCII characters: áéíóú. */ +/* { dg-do compile } */ +/* { dg-options "-std=c23 -Wconversion" } */ + +signed char a[] = { +#embed __FILE__ /* { dg-warning "conversion from 'int' to 'signed char' changes value from '\[12]\[0-9]\[0-9]' to '-\[0-9]\[0-9]*'" } */ +}; --- gcc/testsuite/gcc.dg/cpp/embed-7.c.jj 2024-07-05 13:27:28.580097964 +0200 +++ gcc/testsuite/gcc.dg/cpp/embed-7.c 2024-07-05 13:36:04.728228965 +0200 @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c23 -Woverride-init" } */ + +unsigned char a[] = { +#embed __FILE__ +}; +unsigned char b[] = { + [26] = +#embed __FILE__ +}; +unsigned char c[] = { +#embed __FILE__ suffix (,) + [sizeof (a) / 4] = 0, /* { dg-warning "initialized field overwritten" } */ + [sizeof (a) / 2] = 1, /* { dg-warning "initialized field overwritten" } */ + [1] = 2, /* { dg-warning "initialized field overwritten" } */ + [sizeof (a) - 2] = 3 /* { dg-warning "initialized field overwritten" } */ +}; +unsigned char d[] = { + [1] = 4, + [26] = 5, + [sizeof (a) / 4] = 6, + [sizeof (a) / 2] = 7, + [sizeof (a) - 2] = 8, +#embed __FILE__ prefix ([0] = ) /* { dg-warning "initialized field overwritten" } */ +}; +unsigned char e[] = { +#embed __FILE__ suffix (,) + [2] = 9, /* { dg-warning "initialized field overwritten" } */ + [sizeof (a) - 3] = 10 /* { dg-warning "initialized field overwritten" } */ +}; +unsigned char f[] = { + [23] = 11, + [sizeof (a) / 4 - 1] = 12, +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 - 1] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */ +#embed __FILE__ limit (130) prefix ([sizeof (a) / 4 - 2] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */ +#embed __FILE__ prefix ([sizeof (a) / 4 + 10] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */ +#embed __FILE__ limit (128) prefix ([sizeof (a) + sizeof (a) / 4 - 30] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */ +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 
+ 96] = ) suffix (,) /* { dg-warning "initialized field overwritten" } */ +}; --- gcc/testsuite/gcc.dg/cpp/embed-9.c.jj 2024-07-05 13:54:06.976828053 +0200 +++ gcc/testsuite/gcc.dg/cpp/embed-9.c 2024-07-05 13:53:54.994987508 +0200 @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-std=c23" } */ + +struct __attribute__((designated_init)) S { + int a, b, c, d; + unsigned char e[128]; +}; + +struct S s = { .a = 1, .b = +#embed __FILE__ limit(128) /* { dg-warning "positional initialization of field in 'struct' declared with 'designated_init' attribute" } */ +}; /* { dg-message "near initialization" "" { target *-*-* } .-1 } */ --- gcc/testsuite/gcc.dg/cpp/embed-6.c.jj 2024-07-05 13:25:08.339965010 +0200 +++ gcc/testsuite/gcc.dg/cpp/embed-6.c 2024-07-05 13:24:03.036834399 +0200 @@ -0,0 +1,82 @@ +/* { dg-do run } */ +/* { dg-options "-std=c23" } */ + +unsigned char a[] = { +#embed __FILE__ +}; +unsigned char b[] = { + [26] = +#embed __FILE__ +}; +unsigned char c[] = { +#embed __FILE__ suffix (,) + [sizeof (a) / 4] = 0, + [sizeof (a) / 2] = 1, + [1] = 2, + [sizeof (a) - 2] = 3 +}; +unsigned char d[] = { + [1] = 4, + [26] = 5, + [sizeof (a) / 4] = 6, + [sizeof (a) / 2] = 7, + [sizeof (a) - 2] = 8, +#embed __FILE__ prefix ([0] = ) +}; +unsigned char e[] = { +#embed __FILE__ suffix (,) + [2] = 9, + [sizeof (a) - 3] = 10 +}; +unsigned char f[] = { + [23] = 11, + [sizeof (a) / 4 - 1] = 12, +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 - 1] = ) suffix (,) +#embed __FILE__ limit (130) prefix ([sizeof (a) / 4 - 2] = ) suffix (,) +#embed __FILE__ prefix ([sizeof (a) / 4 + 10] = ) suffix (,) +#embed __FILE__ limit (128) prefix ([sizeof (a) + sizeof (a) / 4 - 30] = ) suffix (,) +#embed __FILE__ limit (128) prefix ([sizeof (a) / 4 + 96] = ) suffix (,) +}; +unsigned char z[sizeof (a) / 4] = { +}; + +int +main () +{ + if (sizeof (b) != sizeof (a) + 26 + || __builtin_memcmp (a, b + 26, sizeof (a))) + __builtin_abort (); + if (sizeof (c) != sizeof (a) + || a[0] != c[0] + 
|| c[1] != 2 + || __builtin_memcmp (a + 2, c + 2, sizeof (a) / 4 - 2) + || c[sizeof (a) / 4] != 0 + || __builtin_memcmp (a + sizeof (a) / 4 + 1, c + sizeof (a) / 4 + 1, sizeof (a) / 2 - sizeof (a) / 4 - 1) + || c[sizeof (a) / 2] != 1 + || __builtin_memcmp (a + sizeof (a) / 2 + 1, c + sizeof (a) / 2 + 1, sizeof (a) - sizeof (a) / 2 - 3) + || c[sizeof (a) - 2] != 3 + || a[sizeof (a) - 1] != c[sizeof (a) - 1]) + __builtin_abort (); + if (sizeof (d) != sizeof (a) + || __builtin_memcmp (a, d, sizeof (a))) + __builtin_abort (); + if (sizeof (e) != sizeof (a) + || a[0] != e[0] + || a[1] != e[1] + || e[2] != 9 + || __builtin_memcmp (a + 3, e + 3, sizeof (a) - 6) + || e[sizeof (a) - 3] != 10 + || a[sizeof (a) - 2] != e[sizeof (a) - 2] + || a[sizeof (a) - 1] != e[sizeof (a) - 1]) + __builtin_abort (); + if (sizeof (f) != sizeof (a) + sizeof (a) / 4 - 30 + 128 + || __builtin_memcmp (z, f, 23) + || f[23] != 11 + || __builtin_memcmp (z, f + 24, sizeof (a) / 4 - 2 - 24) + || __builtin_memcmp (f + sizeof (a) / 4 - 2, a, 12) + || __builtin_memcmp (f + sizeof (a) / 4 + 10, a, 86) + || __builtin_memcmp (f + sizeof (a) / 4 + 96, a, 128) + || __builtin_memcmp (f + sizeof (a) / 4 + 96 + 128, a + 86 + 128, sizeof (a) - 86 - 128 - 40) + || __builtin_memcmp (f + sizeof (a) + sizeof (a) / 4 - 30, a, 128)) + __builtin_abort (); +} --- gcc/fold-const.cc.jj 2024-06-05 15:42:28.144707055 +0200 +++ gcc/fold-const.cc 2024-07-06 10:46:59.487035697 +0200 @@ -8405,6 +8405,48 @@ native_encode_initializer (tree init, un } curpos = pos; + if (val && TREE_CODE (val) == RAW_DATA_CST) + { + if (count) + return 0; + if (off == -1 + || (curpos >= off + && (curpos + RAW_DATA_LENGTH (val) + <= (HOST_WIDE_INT) off + len))) + { + if (ptr) + memcpy (ptr + (curpos - o), RAW_DATA_POINTER (val), + RAW_DATA_LENGTH (val)); + if (mask) + memset (mask + curpos, 0, RAW_DATA_LENGTH (val)); + } + else if (curpos + RAW_DATA_LENGTH (val) > off + && curpos < (HOST_WIDE_INT) off + len) + { + /* Partial overlap. 
*/ + unsigned char *p = NULL; + int no = 0; + int l; + gcc_assert (mask == NULL); + if (curpos >= off) + { + if (ptr) + p = ptr + curpos - off; + l = MIN ((HOST_WIDE_INT) off + len - curpos, + RAW_DATA_LENGTH (val)); + } + else + { + p = ptr; + no = off - curpos; + l = len; + } + if (p) + memcpy (p, RAW_DATA_POINTER (val) + no, l); + } + curpos += RAW_DATA_LENGTH (val); + val = NULL_TREE; + } if (val) do { @@ -13768,6 +13810,9 @@ get_array_ctor_element_at_index (tree ct else first_p = false; + if (TREE_CODE (cval) == RAW_DATA_CST) + max_index += RAW_DATA_LENGTH (cval) - 1; + /* Do we have match? */ if (wi::cmp (access_index, index, index_sgn) >= 0) { @@ -13867,10 +13912,26 @@ fold (tree expr) && TREE_CODE (op0) == CONSTRUCTOR && ! type_contains_placeholder_p (TREE_TYPE (op0))) { - tree val = get_array_ctor_element_at_index (op0, - wi::to_offset (op1)); + unsigned int idx; + tree val + = get_array_ctor_element_at_index (op0, wi::to_offset (op1), + &idx); if (val) - return val; + { + if (TREE_CODE (val) != RAW_DATA_CST) + return val; + if (CONSTRUCTOR_ELT (op0, idx)->index == NULL_TREE + || (TREE_CODE (CONSTRUCTOR_ELT (op0, idx)->index) + != INTEGER_CST)) + return t; + offset_int o + = (wi::to_offset (op1) + - wi::to_offset (CONSTRUCTOR_ELT (op0, idx)->index)); + gcc_checking_assert (o < RAW_DATA_LENGTH (val)); + return build_int_cst (TREE_TYPE (val), + ((const unsigned char *) + RAW_DATA_POINTER (val))[o.to_uhwi ()]); + } } return t; --- gcc/c/c-parser.cc.jj 2024-07-01 11:28:21.840249061 +0200 +++ gcc/c/c-parser.cc 2024-07-04 14:58:33.568465437 +0200 @@ -6212,6 +6212,25 @@ c_parser_braced_init (c_parser *parser, { last_init_list_comma = c_parser_peek_token (parser)->location; c_parser_consume_token (parser); + /* CPP_EMBED should be always in between two CPP_COMMA + tokens. 
*/ + while (c_parser_next_token_is (parser, CPP_EMBED)) + { + c_token *embed = c_parser_peek_token (parser); + c_parser_consume_token (parser); + c_expr embed_val; + embed_val.value = embed->value; + embed_val.original_code = RAW_DATA_CST; + embed_val.original_type = integer_type_node; + set_c_expr_source_range (&embed_val, embed->get_range ()); + embed_val.m_decimal = 0; + process_init_element (embed->location, embed_val, false, + &braced_init_obstack); + gcc_checking_assert (c_parser_next_token_is (parser, + CPP_COMMA)); + last_init_list_comma = c_parser_peek_token (parser)->location; + c_parser_consume_token (parser); + } } else break; --- gcc/c/c-typeck.cc.jj 2024-06-14 19:45:07.455803708 +0200 +++ gcc/c/c-typeck.cc 2024-07-05 12:48:05.357558694 +0200 @@ -8747,12 +8747,13 @@ digest_init (location_t init_loc, tree t if (!maybe_const) arith_const_expr = false; else if (!INTEGRAL_TYPE_P (TREE_TYPE (inside_init)) - && TREE_CODE (TREE_TYPE (inside_init)) != REAL_TYPE - && TREE_CODE (TREE_TYPE (inside_init)) != COMPLEX_TYPE) + && TREE_CODE (TREE_TYPE (inside_init)) != REAL_TYPE + && TREE_CODE (TREE_TYPE (inside_init)) != COMPLEX_TYPE) arith_const_expr = false; else if (TREE_CODE (inside_init) != INTEGER_CST - && TREE_CODE (inside_init) != REAL_CST - && TREE_CODE (inside_init) != COMPLEX_CST) + && TREE_CODE (inside_init) != REAL_CST + && TREE_CODE (inside_init) != COMPLEX_CST + && TREE_CODE (inside_init) != RAW_DATA_CST) arith_const_expr = false; else if (TREE_OVERFLOW (inside_init)) arith_const_expr = false; @@ -9013,6 +9014,22 @@ digest_init (location_t init_loc, tree t ? 
ic_init_const : ic_init), null_pointer_constant, NULL_TREE, NULL_TREE, 0); + if (TREE_CODE (inside_init) == RAW_DATA_CST + && c_inhibit_evaluation_warnings == 0 + && warn_overflow + && !TYPE_UNSIGNED (type) + && TYPE_PRECISION (type) == CHAR_BIT) + for (unsigned int i = 0; + i < (unsigned) RAW_DATA_LENGTH (inside_init); ++i) + if (((const signed char *) RAW_DATA_POINTER (inside_init))[i] < 0) + warning_at (init_loc, OPT_Wconversion, + "conversion from %qT to %qT changes value from " + "%qd to %qd", + integer_type_node, type, + ((const unsigned char *) + RAW_DATA_POINTER (inside_init))[i], + ((const signed char *) + RAW_DATA_POINTER (inside_init))[i]); return inside_init; } @@ -10124,6 +10141,28 @@ set_init_label (location_t loc, tree fie while (field != NULL_TREE); } +/* Helper function for add_pending_init. Find inorder successor of P + in AVL tree. */ +static struct init_node * +init_node_successor (struct init_node *p) +{ + struct init_node *r; + if (p->right) + { + r = p->right; + while (r->left) + r = r->left; + return r; + } + r = p->parent; + while (r && p == r->right) + { + p = r; + r = r->parent; + } + return r; +} + /* Add a new initializer to the tree of pending initializers. PURPOSE identifies the initializer, either array index or field in a structure. VALUE is the value of that index or field. If ORIGTYPE is not @@ -10151,9 +10190,179 @@ add_pending_init (location_t loc, tree p if (tree_int_cst_lt (purpose, p->purpose)) q = &p->left; else if (tree_int_cst_lt (p->purpose, purpose)) - q = &p->right; + { + if (TREE_CODE (p->value) != RAW_DATA_CST + || (p->right + && tree_int_cst_le (p->right->purpose, purpose))) + q = &p->right; + else + { + widest_int pp = wi::to_widest (p->purpose); + widest_int pw = wi::to_widest (purpose); + if (pp + RAW_DATA_LENGTH (p->value) <= pw) + q = &p->right; + else + { + /* Override which should split the old RAW_DATA_CST + into 2 or 3 pieces. 
*/ + if (!implicit && warn_override_init) + warning_init (loc, OPT_Woverride_init, + "initialized field overwritten"); + unsigned HOST_WIDE_INT start = (pw - pp).to_uhwi (); + unsigned HOST_WIDE_INT len = 1; + if (TREE_CODE (value) == RAW_DATA_CST) + len = RAW_DATA_LENGTH (value); + unsigned HOST_WIDE_INT end = 0; + unsigned plen = RAW_DATA_LENGTH (p->value); + gcc_checking_assert (start < plen && start); + if (plen - start > len) + end = plen - start - len; + tree v = p->value; + tree origtype = p->origtype; + if (start == 1) + p->value = build_int_cst (TREE_TYPE (v), + *(const unsigned char *) + RAW_DATA_POINTER (v)); + else + { + p->value = v; + if (end > 1) + v = copy_node (v); + RAW_DATA_LENGTH (p->value) = start; + } + if (end) + { + tree epurpose + = size_binop (PLUS_EXPR, purpose, + bitsize_int (len)); + if (end > 1) + { + RAW_DATA_LENGTH (v) -= plen - end; + RAW_DATA_POINTER (v) += plen - end; + } + else + v = build_int_cst (TREE_TYPE (v), + ((const unsigned char *) + RAW_DATA_POINTER (v))[plen + - end]); + add_pending_init (loc, epurpose, v, origtype, + implicit, braced_init_obstack); + } + q = &constructor_pending_elts; + continue; + } + } + } else { + if (TREE_CODE (p->value) == RAW_DATA_CST + && (RAW_DATA_LENGTH (p->value) + > (TREE_CODE (value) == RAW_DATA_CST + ? RAW_DATA_LENGTH (value) : 1))) + { + /* Override which should split the old RAW_DATA_CST + into 2 pieces. 
*/
+                  if (!implicit && warn_override_init)
+                    warning_init (loc, OPT_Woverride_init,
+                                  "initialized field overwritten");
+                  unsigned HOST_WIDE_INT len = 1;
+                  if (TREE_CODE (value) == RAW_DATA_CST)
+                    len = RAW_DATA_LENGTH (value);
+                  if ((unsigned) RAW_DATA_LENGTH (p->value) > len + 1)
+                    {
+                      RAW_DATA_LENGTH (p->value) -= len;
+                      RAW_DATA_POINTER (p->value) += len;
+                    }
+                  else
+                    {
+                      unsigned int l = RAW_DATA_LENGTH (p->value) - 1;
+                      p->value
+                        = build_int_cst (TREE_TYPE (p->value),
+                                         ((const unsigned char *)
+                                          RAW_DATA_POINTER (p->value))[l]);
+                    }
+                  p->purpose = size_binop (PLUS_EXPR, p->purpose,
+                                           bitsize_int (len));
+                  continue;
+                }
+              if (TREE_CODE (value) == RAW_DATA_CST)
+                {
+                handle_raw_data:
+                  /* RAW_DATA_CST value might overlap various further
+                     prior initval entries.  Find out how many.  */
+                  unsigned cnt = 0;
+                  widest_int w
+                    = wi::to_widest (purpose) + RAW_DATA_LENGTH (value);
+                  struct init_node *r = p, *last = NULL;
+                  bool override_init = warn_override_init;
+                  while ((r = init_node_successor (r))
+                         && wi::to_widest (r->purpose) < w)
+                    {
+                      ++cnt;
+                      if (TREE_SIDE_EFFECTS (r->value))
+                        warning_init (loc, OPT_Woverride_init_side_effects,
+                                      "initialized field with side-effects "
+                                      "overwritten");
+                      else if (override_init)
+                        {
+                          warning_init (loc, OPT_Woverride_init,
+                                        "initialized field overwritten");
+                          override_init = false;
+                        }
+                      last = r;
+                    }
+                  if (cnt)
+                    {
+                      if (TREE_CODE (last->value) == RAW_DATA_CST
+                          && (wi::to_widest (last->purpose)
+                              + RAW_DATA_LENGTH (last->value) > w))
+                        {
+                          /* The last overlapping prior initval overlaps
+                             only partially.  Shrink it and decrease cnt.  */
+                          unsigned int l = (wi::to_widest (last->purpose)
+                                            + RAW_DATA_LENGTH (last->value)
+                                            - w).to_uhwi ();
+                          --cnt;
+                          RAW_DATA_LENGTH (last->value) -= l;
+                          RAW_DATA_POINTER (last->value) += l;
+                          if (RAW_DATA_LENGTH (last->value) == 1)
+                            {
+                              const unsigned char *s
+                                = ((const unsigned char *)
+                                   RAW_DATA_POINTER (last->value));
+                              last->value
+                                = build_int_cst (TREE_TYPE (last->value), *s);
+                            }
+                          last->purpose
+                            = size_binop (PLUS_EXPR, last->purpose,
+                                          bitsize_int (l));
+                        }
+                      /* Instead of deleting cnt nodes from the AVL tree
+                         and rebalancing, peel of last cnt bytes from the
+                         RAW_DATA_CST.  Overriding thousands of previously
+                         initialized array elements with #embed needs to work,
+                         but doesn't need to be super efficient.  */
+                      gcc_checking_assert ((unsigned) RAW_DATA_LENGTH (value)
+                                           > cnt);
+                      RAW_DATA_LENGTH (value) -= cnt;
+                      const unsigned char *s
+                        = ((const unsigned char *) RAW_DATA_POINTER (value)
+                           + RAW_DATA_LENGTH (value));
+                      unsigned int o = RAW_DATA_LENGTH (value);
+                      for (r = p; cnt--; ++o, ++s)
+                        {
+                          r = init_node_successor (r);
+                          r->purpose = size_binop (PLUS_EXPR, purpose,
+                                                   bitsize_int (o));
+                          r->value = build_int_cst (TREE_TYPE (value), *s);
+                          r->origtype = origtype;
+                        }
+                      if (RAW_DATA_LENGTH (value) == 1)
+                        value = build_int_cst (TREE_TYPE (value),
+                                               *((const unsigned char *)
+                                                 RAW_DATA_POINTER (value)));
+                    }
+                }
               if (!implicit)
                 {
                   if (TREE_SIDE_EFFECTS (p->value))
@@ -10169,6 +10378,23 @@ add_pending_init (location_t loc, tree p
               return;
             }
         }
+      if (TREE_CODE (value) == RAW_DATA_CST && p)
+        {
+          struct init_node *r;
+          if (q == &p->left)
+            r = p;
+          else
+            r = init_node_successor (p);
+          if (r && wi::to_widest (r->purpose) < (wi::to_widest (purpose)
+                                                 + RAW_DATA_LENGTH (value)))
+            {
+              /* Overlap with at least one prior initval in the range but
+                 not at the start.  */
+              p = r;
+              p->purpose = purpose;
+              goto handle_raw_data;
+            }
+        }
     }
   else
     {
@@ -10397,8 +10623,8 @@ set_nonincremental_init (struct obstack
     {
       if (TYPE_DOMAIN (constructor_type))
         constructor_unfilled_index
-          = convert (bitsizetype,
-                     TYPE_MIN_VALUE (TYPE_DOMAIN (constructor_type)));
+          = convert (bitsizetype,
+                     TYPE_MIN_VALUE (TYPE_DOMAIN (constructor_type)));
       else
         constructor_unfilled_index = bitsize_zero_node;
     }
@@ -10612,12 +10838,13 @@ output_init_element (location_t loc, tre
   if (!maybe_const)
     arith_const_expr = false;
   else if (!INTEGRAL_TYPE_P (TREE_TYPE (value))
-           && TREE_CODE (TREE_TYPE (value)) != REAL_TYPE
-           && TREE_CODE (TREE_TYPE (value)) != COMPLEX_TYPE)
+           && TREE_CODE (TREE_TYPE (value)) != REAL_TYPE
+           && TREE_CODE (TREE_TYPE (value)) != COMPLEX_TYPE)
     arith_const_expr = false;
   else if (TREE_CODE (value) != INTEGER_CST
-           && TREE_CODE (value) != REAL_CST
-           && TREE_CODE (value) != COMPLEX_CST)
+           && TREE_CODE (value) != REAL_CST
+           && TREE_CODE (value) != COMPLEX_CST
+           && TREE_CODE (value) != RAW_DATA_CST)
     arith_const_expr = false;
   else if (TREE_OVERFLOW (value))
     arith_const_expr = false;
@@ -10784,9 +11011,14 @@ output_init_element (location_t loc, tre

   /* Advance the variable that indicates sequential elements output.  */
   if (TREE_CODE (constructor_type) == ARRAY_TYPE)
-    constructor_unfilled_index
-      = size_binop_loc (input_location, PLUS_EXPR, constructor_unfilled_index,
-                        bitsize_one_node);
+    {
+      tree inc = bitsize_one_node;
+      if (value && TREE_CODE (value) == RAW_DATA_CST)
+        inc = bitsize_int (RAW_DATA_LENGTH (value));
+      constructor_unfilled_index
+        = size_binop_loc (input_location, PLUS_EXPR,
+                          constructor_unfilled_index, inc);
+    }
   else if (TREE_CODE (constructor_type) == RECORD_TYPE)
     {
       constructor_unfilled_fields
@@ -10795,8 +11027,8 @@ output_init_element (location_t loc, tre
       /* Skip any nameless bit fields.  */
       while (constructor_unfilled_fields != NULL_TREE
              && DECL_UNNAMED_BIT_FIELD (constructor_unfilled_fields))
-        constructor_unfilled_fields =
-          DECL_CHAIN (constructor_unfilled_fields);
+        constructor_unfilled_fields
+          = DECL_CHAIN (constructor_unfilled_fields);
     }
   else if (TREE_CODE (constructor_type) == UNION_TYPE)
     constructor_unfilled_fields = NULL_TREE;
@@ -11042,6 +11274,23 @@ initialize_elementwise_p (tree type, tre
   return false;
 }

+/* Helper function for process_init_element.  Split first element of
+   RAW_DATA_CST and save the rest to *RAW_DATA.  */
+
+static inline tree
+maybe_split_raw_data (tree value, tree *raw_data)
+{
+  if (value == NULL_TREE || TREE_CODE (value) != RAW_DATA_CST)
+    return value;
+  *raw_data = value;
+  value = build_int_cst (integer_type_node,
+                         *(const unsigned char *)
+                         RAW_DATA_POINTER (*raw_data));
+  ++RAW_DATA_POINTER (*raw_data);
+  --RAW_DATA_LENGTH (*raw_data);
+  return value;
+}
+
 /* Add one non-braced element to the current constructor level.
    This adjusts the current position within the constructor's type.
    This may also start or terminate implicit levels
@@ -11064,7 +11313,9 @@ process_init_element (location_t loc, st
     = (orig_value != NULL_TREE && TREE_CODE (orig_value) == STRING_CST);
   bool strict_string = value.original_code == STRING_CST;
   bool was_designated = designator_depth != 0;
+  tree raw_data = NULL_TREE;

+retry:
   designator_depth = 0;
   designator_erroneous = 0;

@@ -11232,6 +11483,7 @@ process_init_element (location_t loc, st
               continue;
             }

+          value.value = maybe_split_raw_data (value.value, &raw_data);
           if (value.value)
             {
               push_member_name (constructor_fields);
@@ -11320,6 +11572,7 @@ process_init_element (location_t loc, st
               continue;
             }

+          value.value = maybe_split_raw_data (value.value, &raw_data);
           if (value.value)
             {
               push_member_name (constructor_fields);
@@ -11368,26 +11621,66 @@ process_init_element (location_t loc, st
               break;
             }

-          /* Now output the actual element.  */
-          if (value.value)
+          if (value.value
+              && TREE_CODE (value.value) == RAW_DATA_CST
+              && RAW_DATA_LENGTH (value.value) > 1
+              && (TREE_CODE (elttype) == INTEGER_TYPE
+                  || TREE_CODE (elttype) == BITINT_TYPE)
+              && TYPE_PRECISION (elttype) == CHAR_BIT
+              && (constructor_max_index == NULL_TREE
+                  || tree_int_cst_lt (constructor_index,
+                                      constructor_max_index)))
             {
+              unsigned int len = RAW_DATA_LENGTH (value.value);
+              if (constructor_max_index)
+                {
+                  widest_int w = wi::to_widest (constructor_max_index);
+                  w -= wi::to_widest (constructor_index);
+                  w += 1;
+                  if (w < len)
+                    len = w.to_uhwi ();
+                }
+              if (len < (unsigned) RAW_DATA_LENGTH (value.value))
+                {
+                  raw_data = copy_node (value.value);
+                  RAW_DATA_LENGTH (raw_data) -= len;
+                  RAW_DATA_POINTER (raw_data) += len;
+                  RAW_DATA_LENGTH (value.value) = len;
+                }
+              TREE_TYPE (value.value) = elttype;
               push_array_bounds (tree_to_uhwi (constructor_index));
               output_init_element (loc, value.value, value.original_type,
-                                   strict_string, elttype,
-                                   constructor_index, true, implicit,
-                                   braced_init_obstack);
+                                   false, elttype, constructor_index, true,
+                                   implicit, braced_init_obstack);
               RESTORE_SPELLING_DEPTH (constructor_depth);
+              constructor_index
+                = size_binop_loc (input_location, PLUS_EXPR,
+                                  constructor_index, bitsize_int (len));
             }
+          else
+            {
+              value.value = maybe_split_raw_data (value.value, &raw_data);
+              /* Now output the actual element.  */
+              if (value.value)
+                {
+                  push_array_bounds (tree_to_uhwi (constructor_index));
+                  output_init_element (loc, value.value, value.original_type,
+                                       strict_string, elttype,
+                                       constructor_index, true, implicit,
+                                       braced_init_obstack);
+                  RESTORE_SPELLING_DEPTH (constructor_depth);
+                }

-          constructor_index
-            = size_binop_loc (input_location, PLUS_EXPR,
-                              constructor_index, bitsize_one_node);
-
-          if (!value.value)
-            /* If we are doing the bookkeeping for an element that was
-               directly output as a constructor, we must update
-               constructor_unfilled_index.  */
-            constructor_unfilled_index = constructor_index;
+              constructor_index
+                = size_binop_loc (input_location, PLUS_EXPR,
+                                  constructor_index, bitsize_one_node);
+
+              if (!value.value)
+                /* If we are doing the bookkeeping for an element that was
+                   directly output as a constructor, we must update
+                   constructor_unfilled_index.  */
+                constructor_unfilled_index = constructor_index;
+            }
         }
       else if (gnu_vector_type_p (constructor_type))
         {
@@ -11402,6 +11695,7 @@ process_init_element (location_t loc, st
               break;
             }

+          value.value = maybe_split_raw_data (value.value, &raw_data);
           /* Now output the actual element.  */
           if (value.value)
             {
@@ -11435,6 +11729,7 @@ process_init_element (location_t loc, st
         }
       else
         {
+          value.value = maybe_split_raw_data (value.value, &raw_data);
           if (value.value)
             output_init_element (loc, value.value, value.original_type,
                                  strict_string, constructor_type,
@@ -11506,6 +11801,14 @@ process_init_element (location_t loc, st
     }

   constructor_range_stack = 0;
+
+  if (raw_data && RAW_DATA_LENGTH (raw_data))
+    {
+      gcc_assert (!string_flag && !was_designated);
+      value.value = raw_data;
+      raw_data = NULL_TREE;
+      goto retry;
+    }
 }

 /* Build a complete asm-statement, whose components are a CV_QUALIFIER
--- gcc/tree.def.jj	2024-06-05 19:09:54.045617019 +0200
+++ gcc/tree.def	2024-07-05 10:10:48.372613006 +0200
@@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
 /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
 DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)

+/* Contents are RAW_DATA_LENGTH and the actual content
+   of the raw data, plus RAW_DATA_OWNER if non-NULL for owner of the
+   data (e.g. STRING_CST), if it is NULL, the data is owned by libcpp.
+   TREE_TYPE is the type of each of the RAW_DATA_LENGTH elements.  */
+DEFTREECODE (RAW_DATA_CST, "raw_data_cst", tcc_constant, 0)
+
 /* Declarations.  All references to names are represented as ..._DECL
    nodes.  The decls in one binding context are chained through the
    TREE_CHAIN field.  Each DECL has a DECL_NAME field which contains
--- gcc/c-family/c-lex.cc.jj	2024-02-22 19:29:51.226074838 +0100
+++ gcc/c-family/c-lex.cc	2024-07-04 14:58:33.568465437 +0200
@@ -781,6 +781,13 @@ c_lex_with_flags (tree *value, location_
       *value = build_string (tok->val.str.len, (const char *)tok->val.str.text);
       break;

+    case CPP_EMBED:
+      *value = make_node (RAW_DATA_CST);
+      TREE_TYPE (*value) = integer_type_node;
+      RAW_DATA_LENGTH (*value) = tok->val.str.len;
+      RAW_DATA_POINTER (*value) = (const char *) tok->val.str.text;
+      break;
+
       /* This token should not be visible outside cpplib.  */
     case CPP_MACRO_ARG:
       gcc_unreachable ();
@@ -800,7 +807,7 @@ c_lex_with_flags (tree *value, location_
           add_flags |= PREV_FALLTHROUGH;
           goto retry_after_at;
         }
-      goto retry;
+      goto retry;

     default:
       *value = NULL_TREE;
--- gcc/tree-core.h.jj	2024-07-01 11:28:23.408228952 +0200
+++ gcc/tree-core.h	2024-07-03 19:41:28.821880055 +0200
@@ -1516,6 +1516,13 @@ struct GTY(()) tree_string {
   char str[1];
 };

+struct GTY(()) tree_raw_data {
+  struct tree_typed typed;
+  tree owner;
+  const char *GTY ((skip(""))) str;
+  int length;
+};
+
 struct GTY(()) tree_complex {
   struct tree_typed typed;
   tree real;
@@ -2106,6 +2113,7 @@ union GTY ((ptr_alias (union lang_tree_n
   struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
   struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
   struct tree_string GTY ((tag ("TS_STRING"))) string;
+  struct tree_raw_data GTY ((tag ("TS_RAW_DATA_CST"))) raw_data_cst;
   struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
   struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
   struct tree_decl_minimal GTY((tag ("TS_DECL_MINIMAL"))) decl_minimal;
--- gcc/gimple-fold.cc.jj	2024-06-05 15:42:28.178706605 +0200
+++ gcc/gimple-fold.cc	2024-07-06 11:14:41.202981865 +0200
@@ -8000,7 +8000,7 @@ fold_array_ctor_reference (tree type, tr
       unsigned ctor_idx;
       tree val = get_array_ctor_element_at_index (ctor, access_index,
                                                   &ctor_idx);
-      if (!val && ctor_idx >= CONSTRUCTOR_NELTS (ctor))
+      if (!val && ctor_idx >= CONSTRUCTOR_NELTS (ctor))
         return build_zero_cst (type);

       /* native-encode adjacent ctor elements.  */
@@ -8027,10 +8027,27 @@ fold_array_ctor_reference (tree type, tr
             {
               if (bufoff + elt_sz > sizeof (buf))
                 elt_sz = sizeof (buf) - bufoff;
-              int len = native_encode_expr (val, buf + bufoff, elt_sz,
+              int len;
+              if (TREE_CODE (val) == RAW_DATA_CST)
+                {
+                  gcc_assert (inner_offset == 0);
+                  if (!elt->index || TREE_CODE (elt->index) != INTEGER_CST)
+                    return NULL_TREE;
+                  inner_offset = (access_index
+                                  - wi::to_offset (elt->index)).to_uhwi ();
+                  len = MIN (sizeof (buf) - bufoff,
+                             (unsigned) (RAW_DATA_LENGTH (val) - inner_offset));
+                  memcpy (buf + bufoff, RAW_DATA_POINTER (val) + inner_offset,
+                          len);
+                  access_index += len - 1;
+                }
+              else
+                {
+                  len = native_encode_expr (val, buf + bufoff, elt_sz,
                                             inner_offset / BITS_PER_UNIT);
-              if (len != (int) elt_sz - inner_offset / BITS_PER_UNIT)
-                return NULL_TREE;
+                  if (len != (int) elt_sz - inner_offset / BITS_PER_UNIT)
+                    return NULL_TREE;
+                }
               inner_offset = 0;
               bufoff += len;

@@ -8072,8 +8089,23 @@ fold_array_ctor_reference (tree type, tr
       return native_interpret_expr (type, buf, size / BITS_PER_UNIT);
     }

-  if (tree val = get_array_ctor_element_at_index (ctor, access_index))
+  unsigned ctor_idx;
+  if (tree val = get_array_ctor_element_at_index (ctor, access_index,
+                                                  &ctor_idx))
     {
+      if (TREE_CODE (val) == RAW_DATA_CST)
+        {
+          if (size != BITS_PER_UNIT || elt_sz != 1 || inner_offset != 0)
+            return NULL_TREE;
+          constructor_elt *elt = CONSTRUCTOR_ELT (ctor, ctor_idx);
+          if (elt->index == NULL_TREE || TREE_CODE (elt->index) != INTEGER_CST)
+            return NULL_TREE;
+          *suboff += access_index.to_uhwi () * BITS_PER_UNIT;
+          unsigned o = (access_index - wi::to_offset (elt->index)).to_uhwi ();
+          return build_int_cst (TREE_TYPE (val),
+                                ((const unsigned char *)
+                                 RAW_DATA_POINTER (val))[o]);
+        }
       if (!size && TREE_CODE (val) != CONSTRUCTOR)
         {
           /* For the final reference to the entire accessed element
--- gcc/treestruct.def.jj	2024-01-03 11:51:38.761630845 +0100
+++ gcc/treestruct.def	2024-07-03 17:06:57.539794162 +0200
@@ -39,6 +39,7 @@ DEFTREESTRUCT(TS_REAL_CST, "real cst")
 DEFTREESTRUCT(TS_FIXED_CST, "fixed cst")
 DEFTREESTRUCT(TS_VECTOR, "vector")
 DEFTREESTRUCT(TS_STRING, "string")
+DEFTREESTRUCT(TS_RAW_DATA_CST, "raw data cst")
 DEFTREESTRUCT(TS_COMPLEX, "complex")
 DEFTREESTRUCT(TS_IDENTIFIER, "identifier")
 DEFTREESTRUCT(TS_DECL_MINIMAL, "decl minimal")

	Jakub
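
P.S. For anyone reading the process_init_element changes without a GCC tree at hand, the two ways the patch consumes a RAW_DATA_CST can be modeled outside the compiler: maybe_split_raw_data peels one byte off the front, and the char-array path takes as many leading bytes as still fit and leaves the rest for the retry loop. The following is only a rough sketch under that reading of the patch. struct raw_view, peel_first and take_chunk are hypothetical names that stand in for the tree node and its RAW_DATA_POINTER/RAW_DATA_LENGTH accessors; none of them exist in GCC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a RAW_DATA_CST: a non-owning view of the
   bytes that libcpp (or a STRING_CST owner) keeps alive.  */
struct raw_view
{
  const unsigned char *ptr;
  size_t len;
};

/* Model of maybe_split_raw_data: return the first byte as an int and
   advance *RAW past it.  The caller must pass a non-empty view.  */
static int
peel_first (struct raw_view *raw)
{
  assert (raw->len > 0);
  int v = *raw->ptr;
  ++raw->ptr;
  --raw->len;
  return v;
}

/* Model of the char-array path: hand out at most ROOM leading bytes
   (the remaining array elements) and leave the remainder in *RAW so
   that the caller can retry with it.  */
static struct raw_view
take_chunk (struct raw_view *raw, size_t room)
{
  struct raw_view head = *raw;
  if (room < head.len)
    head.len = room;
  raw->ptr += head.len;
  raw->len -= head.len;
  return head;
}
```

With an 8-byte view and a struct made of an int field, an unsigned char[4] field and another int field, one would call peel_first, then take_chunk with room 4, then peel_first again, which leaves two bytes in the view for whatever initializer comes next. Both helpers only adjust a pointer and a length, so nothing is copied, which is the property the patch relies on when it keeps RAW_DATA_CST around instead of expanding it into INTEGER_CSTs.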