Hi,

Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Oct 31, 2022 at 11:14 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> > Hi,
>> >
>> > We know that for struct variable assignment, memory copy may be used.
>> > And for memcpy, we may load and store more bytes as possible at one time.
>> > While it may be not best here:
>> > 1. Before/after struct variable assignment, the variable may be operated.
>> >    And it is hard for some optimizations to leap over memcpy.  Then some
>> >    struct operations may be sub-optimal.  Like the issue in PR65421.
>> > 2. The size of struct is constant mostly, the memcpy would be expanded.
>> >    Using small size to load/store and executing in parallel may not be
>> >    slower than using large size to load/store.  (Sure, more registers
>> >    may be used for smaller bytes.)
>> >
>> > In PR65421, for source code as below:
>> > ////////t.c
>> > #define FN 4
>> > typedef struct { double a[FN]; } A;
>> >
>> > A foo (const A *a) { return *a; }
>> > A bar (const A a) { return a; }
>>
>> So the first question in my mind is can we do better at the gimple
>> phase?  For the second case in particular can't we just "return a"
>> rather than copying a into <retval> then returning <retval>?  This feels
>> a lot like the return value optimization from C++.  I'm not sure if it
>> applies to the first case or not, it's been a long time since I looked
>> at NRV optimizations, but it might be worth poking around in there a bit
>> (tree-nrv.cc).
>>
>> But even so, these kinds of things are still bound to happen, so it's
>> probably worth thinking about if we can do better in RTL as well.
>>
>> The first thing that comes to my mind is to annotate memcpy calls that
>> are structure assignments.  The idea here is that we may want to expand
>> a memcpy differently in those cases.  Changing how we expand an opaque
>> memcpy call is unlikely to be beneficial in most cases.
>> But changing how we expand a structure copy may be beneficial by
>> exposing the underlying field values.  This would roughly correspond
>> to your method #1.
>>
>> Or instead of changing how we expand, teach the optimizers about these
>> annotated memcpy calls -- they're just a copy of each field.  That's
>> how CSE and the propagators could treat them.  After some point we'd
>> lower them in the usual ways, but at least early in the RTL pipeline we
>> could keep them as annotated memcpy calls.  This roughly corresponds to
>> your second suggestion.
>
> In the end it depends on the access patterns so some analysis like SRA
> performs would be nice.  The issue with expanding memcpy on GIMPLE
> is that we currently cannot express 'rep; movb;' or other target specific
> sequences from the cpymem like optabs on GIMPLE and recovering those
> from piecewise copies on RTL is going to be difficult.

Actually, it is a special memcpy: it is generated while expanding a
struct assignment (expand_assignment/store_expr/emit_block_move).
We may introduce a function block_move_for_record for struct types,
and this function could be a target hook that generates specific
sequences.  For example:
  r125:DF=[r112:DI+0x20]
  r126:DF=[r112:DI+0x28]
  [r112:DI]=r125:DF
  [r112:DI+0x8]=r126:DF

After expanding, the following passes (cse/prop/dse/..) could optimize
the insn sequences, e.g.
  "[r112:DI+0x20]=f1;r125:DF=[r112:DI+0x20];
   [r112:DI]=r125:DF;r129:DF=[r112:DI]"
  ==> "r129:DF=f1"

And if the small read/write insns still occur in late passes
(e.g. combine), we would recover the insns to a better sequence:
  r125:DF=[r112:DI+0x20];r126:DF=[r112:DI+0x28]
  ==>
  r155:V2DI=[r112:DI+0x20];

Any comments?  Thanks!

BR,
Jeff(Jiufu)

>
>>
>> jeff
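
For reference, the difference between the two expansions can be roughly
pictured at the source level.  This is only a sketch for illustration,
not the proposed patch: the helper names copy_block and copy_fieldwise
are hypothetical, and the compiler is free to turn either form into the
other.  The point is just that the field-wise form exposes each double
as a separate load/store, which is what later passes could forward or
eliminate, while the block form hides the fields behind one opaque copy.

```c
#include <assert.h>
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Block copy: the struct is moved as one opaque chunk of bytes,
   similar in spirit to what emit_block_move produces today.
   (Hypothetical helper, for illustration only.)  */
static A
copy_block (const A *src)
{
  A dst;
  memcpy (&dst, src, sizeof dst);
  return dst;
}

/* Field-wise copy: one load and one store per element, similar in
   spirit to the DF-mode insn sequence a block_move_for_record hook
   might emit.  (Hypothetical helper, for illustration only.)  */
static A
copy_fieldwise (const A *src)
{
  A dst;
  for (int i = 0; i < FN; i++)
    dst.a[i] = src->a[i];
  return dst;
}
```

Both helpers give the same result; only the shape of the copy that the
optimizers get to see differs.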