On Wed, Jun 11, 2025, 9:24 AM Andrew MacLeod <amacl...@redhat.com> wrote:
> > On 6/11/25 11:02, Andrew MacLeod wrote: > > > > On 6/10/25 17:05, Richard Biener wrote: > >> > >> > >>> Am 10.06.2025 um 22:18 schrieb Andrew MacLeod <amacl...@redhat.com>: > >>> > >>> > >>> > >>> I had a question asked of me, and now I'm passing the buck. > >>> > >>> extern void *memcpy(void *, const void *, unsigned int); > >>> extern int memcmp(const void *, const void *, unsigned int); > >>> typedef unsigned long bits32; > >>> typedef unsigned char byte; > >>> > >>> static const byte orig[10] = { > >>> 'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' }; > >>> > >>> static byte test[10]; > >>> > >>> int > >>> verify (void) > >>> { > >>> return 0 == memcmp (test, orig, 10 * sizeof (orig[0])); > >>> } > >>> > >>> int > >>> benchmark (void) > >>> { > >>> memcpy (test, orig, 10 * sizeof (orig[0])); > >>> return 0; > >>> } > >>> > >>> > >>> Target is arm-none-eabi, and when compiled with -Os > >>> > >>> After the gimple lowering, the verify routine remains the same, but > >>> the benchmark () routine is transformed from a memcpy and becomes: > >>> > >>> > >>> ;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718, > >>> cgraph_uid=4, symbol_order=3) > >>> > >>> int benchmark () > >>> { > >>> int D.4726; > >>> > >>> MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM > >>> <unsigned char[10]> [(char * {ref-all})&orig]; > >>> D.4726 = 0; > >>> goto <D.4727>; > >>> <D.4727>: > >>> return D.4726; > >>> } > >>> > >>> > >>> It appears that forwprop is then transforming the statement to > >>> <bb 2> : > >>> MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL"; > >>> return 0; > >>> > >>> And in the final output, there are now 2 copies of the original > >>> character data: > >>> > >>> orig: > >>> .ascii "J2OZF50FYL" > >>> .space 2 > >>> .LC0: > >>> .ascii "J2OZF50FYL" > >>> .bss > >>> > >>> > >>> and I presume that new string is a copy of the orig text that > >>> forwprop has created for some reason. > >>> > >>> Whats going on, and is there a way to disable this? Either at the > >>> lowering stage or in forwprop? At -Os, they are not thrilled that > >>> a bunch more redundant text is being generated in the object file. > >>> This is a reduced testcase to demonstrate a much larger problem. > >>> > >> The hope is the static var can be elided and the read might be just a > >> small part. In this case heuristics are misfiring I guess. You’d > >> have to track down where exactly in folding we are replacing the RHS > >> of an aggregate copy. I can’t recall off my head. > >> > >> Richard > > > > heres my traceback where the "magic" happens > > > > #0 fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00, > > poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0) > > at /gcc/master/gcc/gcc/gimple-fold.cc:9955 > > #1 0x0000000001200074 in fold_const_aggregate_ref_1 > > (t=0x7fffe9f46de8, valueize=0x0) at > > /gcc/master/gcc/gcc/gimple-fold.cc:10134 > > #2 0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8) > > at /gcc/master/gcc/gcc/gimple-fold.cc:10213 > > #3 0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8) > > at /gcc/master/gcc/gcc/gimple-fold.cc:325 > > #4 0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at > > /gcc/master/gcc/gcc/gimple-fold.cc:473 > > #5 0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410, > > inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>, > > dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648 > > > > ctor is a STRING_CST tree and has the string in it : "J2OZF50FYL" > > > > The fold routine gets to : > > > > /* We found the field with exact match. */ > > if (type > > && useless_type_conversion_p (type, TREE_TYPE (ctor)) > > && known_eq (poly_offset, 0U)) > > return canonicalize_constructor_val (unshare_expr (ctor), from_decl); > > > > I would hazard a guess that it is the "unshare_expr (ctor)" that is > > causing the duplication of the string? I presume we have a good > > reason for doing this? Perhaps that is a bad thing at -Os? I don't > > relally remember all the unsharing details :-) > > > > From this point, the presumed duplication of the string is returned > > and there are no other gates before fold_stmt_1 calls > > gimple_assign_set_rhs_from_tree (gsi, new_rhs); > > with the newly copied and returned string. > > > > > > I guess an alternate line of questioning is why on x86 we do not turn > > the second functions call: > > > > memcpy (test, orig, 10 * sizeof (orig[0])); > > > > into > > > > MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned > > char[10]> [(char * {ref-all})&orig]; > > > > like arm-none-eabi does. It seems that this lowering is triggering > > the fold and string duplication. > > > > Andrew > > > The difference in lowering between my x86 build and the arm build is in > gimple_fold_call: > > if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)) > { > if (gimple_fold_builtin (gsi)) > changed = true; > } > > on x86, it fails the gimple_call_builtin_p() call, whereas on arm it > does not, and proceeds to fold the builtin > > the decl for memcpy on my x86 box has a built_in_class of NOT_BUILT_IN, > whereas on the arm build, it is set to BUILT_IN_NORMAL, which then > proceeds to do the fold. > > Where is that determined? I don't see much in the config directories or > other obvious places as to where it is decided that it is a builtin > function or not? > Most likely this is due to last/size argument to memcpy. For x86_64-linux, it would be unsigned long. Thanks, Andrew > Andrew > >