Re: The performance data for two different implementation of new security feature -ftrivial-auto-var-init

Richard Sandiford via Gcc-patches Mon, 18 Jan 2021 05:09:12 -0800

Qing Zhao <qing.z...@oracle.com> writes:
>>>> D will keep all initialized aggregates as aggregates and live which
>>>> means stack will be allocated for it.  With A the usual optimizations
>>>> to reduce stack usage can be applied.
>>> 
>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>> has a large stack increase
>>> due to implementation D, by examining the IR immediately before the RTL
>>> expansion phase
>>> (image.cpp.244t.optimized).  I found that we have the following
>>> additional statements for the array elements:
>>> 
>>> void  pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>> * normal)
>>> {
>>> …
>>> double p3[3];
>>> double p2[3];
>>> double p1[3];
>>> float colour3[5];
>>> float colour2[5];
>>> float colour1[5];
>>> …
>>>  # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>> p1 = .DEFERRED_INIT (p1, 2);
>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>> # DEBUG p1$0 => D#12
>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>> # DEBUG p1$1 => D#11
>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>> # DEBUG p1$2 => D#10
>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>> p2 = .DEFERRED_INIT (p2, 2);
>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>> # DEBUG p2$0 => D#9
>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>> # DEBUG p2$1 => D#8
>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>> # DEBUG p2$2 => D#7
>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>> p3 = .DEFERRED_INIT (p3, 2);
>>> ….
>>> }
>>> 
>>> I guess that the above “MEM <double>….. = …” statements are the ones that
>>> make the difference. Which phase introduced them?
>> 
>> Looks like SRA. But you can just dump all and grep for the first occurrence. 
>
> Yes, it looks like SRA is the one:
>
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> image.cpp.035t.esra:  MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);


I realise no-one was suggesting otherwise, but FWIW: SRA could easily
be extended to handle .DEFERRED_INIT if that's the main source of
excess stack usage.  A single .DEFERRED_INIT of an aggregate can
be split into .DEFERRED_INITs of individual components.
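
As a rough source-level picture of that split (a hypothetical sketch only,
not how SRA or .DEFERRED_INIT is implemented in GCC: FILL_BYTE,
sum_aggregate and sum_split are made-up names, and memset merely stands in
for the compiler-inserted initialization):

```c
#include <string.h>

/* Hypothetical fill value; the real pattern is chosen by the compiler. */
#define FILL_BYTE 0xFE

/* Whole-aggregate form: p1 is initialized as a single object, modelling
   "p1 = .DEFERRED_INIT (p1, ...)" applied to the entire array.  */
double sum_aggregate(const double *src)
{
    double p1[3];
    memset(p1, FILL_BYTE, sizeof p1);   /* one init for the aggregate */
    for (int i = 0; i < 3; i++)
        p1[i] = src[i];
    return p1[0] + p1[1] + p1[2];
}

/* Split form: each component is a separate scalar with its own init,
   modelling one .DEFERRED_INIT per scalarized component (p1$0, p1$1, p1$2).  */
double sum_split(const double *src)
{
    double p1_0, p1_1, p1_2;
    memset(&p1_0, FILL_BYTE, sizeof p1_0);   /* init of component 0 */
    memset(&p1_1, FILL_BYTE, sizeof p1_1);   /* init of component 1 */
    memset(&p1_2, FILL_BYTE, sizeof p1_2);   /* init of component 2 */
    p1_0 = src[0];
    p1_1 = src[1];
    p1_2 = src[2];
    return p1_0 + p1_1 + p1_2;
}
```

Both functions compute the same result.  The point of the second form is
only that, once the initialization is expressed per component, each
component can be treated as an independent scalar rather than as part of
an aggregate that has to stay live on the stack.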

In other words, the investigation you're doing looks like the right way
of deciding which passes are worth extending to handle .DEFERRED_INIT.

Thanks,
Richard
