Qing Zhao <qing.z...@oracle.com> writes:
>>>> D will keep all initialized aggregates as aggregates and live which
>>>> means stack will be allocated for it. With A the usual optimizations
>>>> to reduce stack usage can be applied.
>>>
>>> I checked the routine “poverties::bump_map” in 511.povray_r since it
>>> has a lot stack increase
>>> due to implementation D, by examine the IR immediate before RTL
>>> expansion phase.
>>> (image.cpp.244t.optimized), I found that we have the following
>>> additional statements for the array elements:
>>>
>>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>> * normal)
>>> {
>>> …
>>> double p3[3];
>>> double p2[3];
>>> double p1[3];
>>> float colour3[5];
>>> float colour2[5];
>>> float colour1[5];
>>> …
>>> # DEBUG BEGIN_STMT
>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>> # DEBUG BEGIN_STMT
>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>> p1 = .DEFERRED_INIT (p1, 2);
>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>> # DEBUG p1$0 => D#12
>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>> # DEBUG p1$1 => D#11
>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>> # DEBUG p1$2 => D#10
>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>> p2 = .DEFERRED_INIT (p2, 2);
>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>> # DEBUG p2$0 => D#9
>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>> # DEBUG p2$1 => D#8
>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>> # DEBUG p2$2 => D#7
>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>> p3 = .DEFERRED_INIT (p3, 2);
>>> ….
>>> }
>>>
>>> I guess that the above “MEM <double>….. = …” are the ones that make the
>>> differences. Which phase introduced them?
>>
>> Looks like SRA. But you can just dump all and grep for the first occurrence.
>
> Yes, looks like that SRA is the one:
>
> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D);
> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);

I realise no-one was suggesting otherwise, but FWIW: SRA could easily
be extended to handle .DEFERRED_INIT if that's the main source of
excess stack usage.  A single .DEFERRED_INIT of an aggregate can
be split into .DEFERRED_INITs of individual components.

In other words, the investigation you're doing looks like the right way
of deciding which passes are worth extending to handle .DEFERRED_INIT.

Thanks,
Richard
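
To illustrate the split described above, here is a rough source-level
model in C.  It is only a sketch: it is not GIMPLE and not what SRA
would emit.  The names init_whole, init_split and pattern_double, and
the fill byte PATTERN_BYTE, are made up for the example.  One memset
over the whole array stands in for a single aggregate .DEFERRED_INIT,
and three scalar initialisations stand in for the per-component form.

```c
#include <string.h>

/* Hypothetical fill byte for the example; not necessarily what GCC uses. */
enum { PATTERN_BYTE = 0xFE };

/* Hypothetical helper: models a .DEFERRED_INIT of one scalar component. */
static double pattern_double(void)
{
    double d;
    memset(&d, PATTERN_BYTE, sizeof d);
    return d;
}

/* Whole-aggregate form: one initialisation covers the entire array,
   so the array itself is what gets initialised and then used. */
void init_whole(double out[3])
{
    double p[3];
    memset(p, PATTERN_BYTE, sizeof p);   /* models p = .DEFERRED_INIT (p, ...) */
    p[0] = 1.0;                          /* a later real store to one element */
    memcpy(out, p, sizeof p);
}

/* Split form: each component is initialised on its own, so no local
   array is needed and each component is an independent scalar. */
void init_split(double out[3])
{
    double p0 = pattern_double();        /* models the split-out component 0 */
    double p1 = pattern_double();        /* models the split-out component 1 */
    double p2 = pattern_double();        /* models the split-out component 2 */
    p0 = 1.0;                            /* same later store as above */
    out[0] = p0;
    out[1] = p1;
    out[2] = p2;
}
```

Both functions leave the same bytes in out.  The difference is that the
split form never refers to the array as a whole, which is the property
that would let the usual scalar optimizations apply to each component.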