Hi Richard,

I have adjusted the SRA phase to split calls to DEFERRED_INIT per your suggestion.
And now the routine “bump_map” in 511.povray looks like the following:

  ...
  # DEBUG BEGIN_STMT
  xcoor = 0.0;
  ycoor = 0.0;
  # DEBUG BEGIN_STMT
  index = .DEFERRED_INIT (index, 2);
  index2 = .DEFERRED_INIT (index2, 2);
  index3 = .DEFERRED_INIT (index3, 2);
  # DEBUG BEGIN_STMT
  colour1 = .DEFERRED_INIT (colour1, 2);
  colour2 = .DEFERRED_INIT (colour2, 2);
  colour3 = .DEFERRED_INIT (colour3, 2);
  # DEBUG BEGIN_STMT
  p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);
  # DEBUG p1$0 => p1$0_181
  p1$1_184 = .DEFERRED_INIT (p1$1_182(D), 2);
  # DEBUG p1$1 => p1$1_184
  p1$2_172 = .DEFERRED_INIT (p1$2_185(D), 2);
  # DEBUG p1$2 => p1$2_172
  p2$0_177 = .DEFERRED_INIT (p2$0_173(D), 2);
  # DEBUG p2$0 => p2$0_177
  p2$1_135 = .DEFERRED_INIT (p2$1_178(D), 2);
  # DEBUG p2$1 => p2$1_135
  p2$2_137 = .DEFERRED_INIT (p2$2_136(D), 2);
  # DEBUG p2$2 => p2$2_137
  p3$0_377 = .DEFERRED_INIT (p3$0_376(D), 2);
  # DEBUG p3$0 => p3$0_377
  p3$1_379 = .DEFERRED_INIT (p3$1_378(D), 2);
  # DEBUG p3$1 => p3$1_379
  p3$2_381 = .DEFERRED_INIT (p3$2_380(D), 2);
  # DEBUG p3$2 => p3$2_381

In the above, the calls for p1, p2 and p3 have all been split into calls to DEFERRED_INIT on the individual components of p1, p2 and p3.

With this change, the stack usage numbers reported by -fstack-usage (in bytes) for approach A, the old approach D, and the new approach D with the splitting in SRA are:

  Approach A    Approach D-old    Approach D-new
     272             624               368

From the above, we can see that splitting the call to DEFERRED_INIT in SRA reduces the stack usage increase dramatically: the overhead relative to A drops from 352 bytes (624 - 272) to 96 bytes (368 - 272).

However, it looks like the stack size for D is still bigger than for A. I checked the IR again and found that alias analysis might be responsible for this (by comparing image.cpp.026t.ealias for A and D):

(The differences come from the call: colour1 = .DEFERRED_INIT (colour1, 2); )

******Approach A:

  Points_to analysis:

  Constraints:
  …
  colour1 = &NULL
  …
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  ...
  callarg(53) = &colour1
  ...
  _53 = colour1

  Points_to sets:
  …
  colour1 = { NULL ESCAPED NONLOCAL } same as _53
  ...
  CALLUSED(48) = { NULL ESCAPED NONLOCAL index colour1 }
  CALLCLOBBERED(49) = { NULL ESCAPED NONLOCAL index colour1 } same as CALLUSED(48)
  ...
  callarg(53) = { NULL ESCAPED NONLOCAL colour1 }

******Approach D:

  Points_to analysis:

  Constraints:
  …
  callarg(19) = colour1
  callarg(19) = &NONLOCAL
  colour1 = callarg(19) + UNKNOWN
  colour1 = &NONLOCAL
  …
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  colour1 = &NONLOCAL
  …
  callarg(74) = &colour1
  callarg(74) = callarg(74) + UNKNOWN
  callarg(74) = *callarg(74) + UNKNOWN
  …
  _53 = colour1
  _54 = _53
  _55 = _54 + UNKNOWN
  _55 = &NONLOCAL
  _56 = colour1
  _57 = _56
  _58 = _57 + UNKNOWN
  _58 = &NONLOCAL
  _59 = _55 + UNKNOWN
  _59 = _58 + UNKNOWN
  _60 = colour1
  _61 = _60
  _62 = _61 + UNKNOWN
  _62 = &NONLOCAL
  _63 = _59 + UNKNOWN
  _63 = _62 + UNKNOWN
  _64 = _63 + UNKNOWN
  ..

  Points_to sets:
  …
  colour1 = { ESCAPED NONLOCAL } same as callarg(19)
  …
  CALLUSED(69) = { ESCAPED NONLOCAL index colour1 }
  CALLCLOBBERED(70) = { ESCAPED NONLOCAL index colour1 } same as CALLUSED(69)
  callarg(71) = { ESCAPED NONLOCAL }
  callarg(72) = { ESCAPED NONLOCAL }
  callarg(73) = { ESCAPED NONLOCAL }
  callarg(74) = { ESCAPED NONLOCAL colour1 }

My question: is it possible to adjust alias analysis to resolve this issue?

Thanks.

Qing

> On Jan 18, 2021, at 10:12 AM, Qing Zhao via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>
>>>>> I checked the routine “pov::bump_map” in 511.povray_r since it
>>>>> has a large stack increase
>>>>> due to implementation D, by examining the IR immediately before the RTL
>>>>> expansion phase
>>>>> (image.cpp.244t.optimized), I found that we have the following
>>>>> additional statements for the array elements:
>>>>>
>>>>> void pov::bump_map (double * EPoint, struct TNORMAL * Tnormal, double
>>>>> * normal)
>>>>> {
>>>>> …
>>>>> double p3[3];
>>>>> double p2[3];
>>>>> double p1[3];
>>>>> float colour3[5];
>>>>> float colour2[5];
>>>>> float colour1[5];
>>>>> …
>>>>> # DEBUG BEGIN_STMT
>>>>> colour1 = .DEFERRED_INIT (colour1, 2);
>>>>> colour2 = .DEFERRED_INIT (colour2, 2);
>>>>> colour3 = .DEFERRED_INIT (colour3, 2);
>>>>> # DEBUG BEGIN_STMT
>>>>> MEM <double> [(double[3] *)&p1] = p1$0_144(D);
>>>>> MEM <double> [(double[3] *)&p1 + 8B] = p1$1_135(D);
>>>>> MEM <double> [(double[3] *)&p1 + 16B] = p1$2_138(D);
>>>>> p1 = .DEFERRED_INIT (p1, 2);
>>>>> # DEBUG D#12 => MEM <double> [(double[3] *)&p1]
>>>>> # DEBUG p1$0 => D#12
>>>>> # DEBUG D#11 => MEM <double> [(double[3] *)&p1 + 8B]
>>>>> # DEBUG p1$1 => D#11
>>>>> # DEBUG D#10 => MEM <double> [(double[3] *)&p1 + 16B]
>>>>> # DEBUG p1$2 => D#10
>>>>> MEM <double> [(double[3] *)&p2] = p2$0_109(D);
>>>>> MEM <double> [(double[3] *)&p2 + 8B] = p2$1_111(D);
>>>>> MEM <double> [(double[3] *)&p2 + 16B] = p2$2_254(D);
>>>>> p2 = .DEFERRED_INIT (p2, 2);
>>>>> # DEBUG D#9 => MEM <double> [(double[3] *)&p2]
>>>>> # DEBUG p2$0 => D#9
>>>>> # DEBUG D#8 => MEM <double> [(double[3] *)&p2 + 8B]
>>>>> # DEBUG p2$1 => D#8
>>>>> # DEBUG D#7 => MEM <double> [(double[3] *)&p2 + 16B]
>>>>> # DEBUG p2$2 => D#7
>>>>> MEM <double> [(double[3] *)&p3] = p3$0_256(D);
>>>>> MEM <double> [(double[3] *)&p3 + 8B] = p3$1_258(D);
>>>>> MEM <double> [(double[3] *)&p3 + 16B] = p3$2_260(D);
>>>>> p3 = .DEFERRED_INIT (p3, 2);
>>>>> ….
>>>>> }
>>>>>
>>>>> I guess that the above “MEM <double>….. = …” are the ones that make the
>>>>> differences. Which phase introduced them?
>>>>
>>>> Looks like SRA. But you can just dump all and grep for the first
>>>> occurrence.
>>>
>>> Yes, it looks like SRA is the one:
>>>
>>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1] = p1$0_195(D);
>>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 8B] = p1$1_182(D);
>>> image.cpp.035t.esra: MEM <double> [(double[3] *)&p1 + 16B] = p1$2_185(D);
>>
>> I realise no-one was suggesting otherwise, but FWIW: SRA could easily
>> be extended to handle .DEFERRED_INIT if that's the main source of
>> excess stack usage. A single .DEFERRED_INIT of an aggregate can
>> be split into .DEFERRED_INITs of individual components.
>
> Thanks a lot for the suggestion,
> I will study the code of SRA to see how to do this and then see whether this
> can resolve the issue.
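As a rough source-level analogy for the split described above (a sketch only, not how GCC's SRA is implemented): initializing a whole double[3] as one memory object and initializing each scalarized component (p1$0, p1$1, p1$2) separately produce identical values, which is why the per-component form can stand in for the aggregate one once SRA has replaced p1 with scalars. The snippet assumes the init type "2" in the dumps requests zero initialization, and the function names init_aggregate, init_components and forms_agree are hypothetical.

```c
#include <string.h>

/* Hypothetical illustration; assumes init type 2 means zero init. */

/* Aggregate form: one initialization covers the whole object, roughly
   like "p1 = .DEFERRED_INIT (p1, 2);" on a memory-resident double[3]. */
void init_aggregate(double p1[3])
{
  /* On IEEE 754 targets, all-zero bits represent +0.0. */
  memset(p1, 0, 3 * sizeof p1[0]);
}

/* Split form: each scalar replacement gets its own initialization,
   roughly like "p1$0_181 = .DEFERRED_INIT (p1$0_195(D), 2);" etc. */
void init_components(double *p1_0, double *p1_1, double *p1_2)
{
  *p1_0 = 0.0;
  *p1_1 = 0.0;
  *p1_2 = 0.0;
}

/* Returns 1 when both forms leave bit-identical values behind. */
int forms_agree(void)
{
  double agg[3] = {1.0, 2.0, 3.0};  /* nonzero so the init is observable */
  double c0 = 1.0, c1 = 2.0, c2 = 3.0;
  double split[3];

  init_aggregate(agg);
  init_components(&c0, &c1, &c2);
  split[0] = c0;
  split[1] = c1;
  split[2] = c2;
  return memcmp(agg, split, sizeof agg) == 0;
}
```

The semantic equivalence is the easy part; the stack savings in the numbers above come from the split form no longer requiring p1, p2 and p3 to live in memory, which this source-level sketch does not model.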