Jeff Law <jeffreya...@gmail.com> writes:

> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
>> Hi,
>>
>> We know that for struct variable assignment, memory copy may be used.
>> And for memcpy, we may load and store as many bytes as possible at one time.
>> But this may not be best here:
>> 1. Before/after struct variable assignment, the variable may be operated on.
>> And it is hard for some optimizations to leap over memcpy.  Then some struct
>> operations may be sub-optimal.  Like the issue in PR65421.
>> 2. The size of a struct is mostly constant, so the memcpy would be expanded.
>> Using small sizes to load/store and executing in parallel may not be slower
>> than using large sizes to load/store.  (Sure, more registers may be used for
>> smaller sizes.)
>>
>>
>> In PR65421, for source code as below:
>> ////////t.c
>> #define FN 4
>> typedef struct { double a[FN]; } A;
>>
>> A foo (const A *a) { return *a; }
>> A bar (const A a) { return a; }
>
> So the first question in my mind is can we do better at the gimple
> phase?  For the second case in particular can't we just "return a"
> rather than copying a into <retval> then returning <retval>?  This
> feels a lot like the return value optimization from C++.  I'm not sure
> if it applies to the first case or not, it's been a long time since I
> looked at NRV optimizations, but it might be worth poking around in
> there a bit (tree-nrv.cc).
Thanks for pointing out this idea!

Currently the optimized gimple looks like:
  D.3334 = a;
  return D.3334;

and
  D.3336 = *a_2(D);
  return D.3336;

It may be better to have:
"return a;" and "return *a;"
-----------------

If the code looks like:
typedef struct { double a[3]; long l;} A; //mix types
A foo (const A a) { return a; }
A bar (const A *a) { return *a; }

The current optimized gimple looks like:
  <retval> = a;
  return <retval>;
and
  <retval> = *a_2(D);
  return <retval>;

"return a;" and "return *a;" may work here too.
>
>
> But even so, these kinds of things are still bound to happen, so it's
> probably worth thinking about if we can do better in RTL as well. 
>
Yeap, thanks!
>
> The first thing that comes to my mind is to annotate memcpy calls that
> are structure assignments.  The idea here is that we may want to
> expand a memcpy differently in those cases.   Changing how we expand
> an opaque memcpy call is unlikely to be beneficial in most cases.  But
> changing how we expand a structure copy may be beneficial by exposing
> the underlying field values.   This would roughly correspond to your
> method #1.
Right.  For a general memcpy, we would read/write as many bytes as
possible at one time.  Reading/writing small fields may only be
beneficial for structure assignment.
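
To make the two shapes concrete, here is a minimal C sketch of the same
struct copy written as an opaque block copy and as a per-field copy.  The
helper names copy_block and copy_fields are made up for illustration; they
are not part of the patch or of GCC, and this is only meant to show the
source-level idea, not what the expander actually emits.

```c
#include <string.h>

#define FN 4
typedef struct { double a[FN]; } A;

/* Block copy: the whole object is moved by one memcpy.  The individual
   field values are not visible at this level.  */
void copy_block (A *dst, const A *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Field-wise copy: each element is a separate load and store, so the
   values of the individual fields are exposed.  */
void copy_fields (A *dst, const A *src)
{
  for (int i = 0; i < FN; i++)
    dst->a[i] = src->a[i];
}
```

Both helpers leave dst with the same contents; the difference is only in
how much of the struct's layout is visible in the copy itself.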

>
> Or instead of changing how we expand, teach the optimizers about these
> annotated memcpy calls -- they're just a copy of each field.
> That's how CSE and the propagators could treat them. After some point
> we'd lower them in the usual ways, but at least early in the RTL
> pipeline we could keep them as annotated memcpy calls.  This roughly
> corresponds to your second suggestion.
Thanks for your insights about this idea!  The annotated memcpy would be
used for early optimizations, and it would be treated as a general memcpy
in later passes.


Thanks again for your very helpful comments and suggestions!

BR,
Jeff(Jiufu)

>
>
> jeff
