Hi, On Mon, Apr 17, 2017 at 09:35:18PM +0200, Martin Jambor wrote: > On Thu, Apr 13, 2017 at 08:48:38PM +0200, Richard Biener wrote: > > On April 13, 2017 7:41:29 PM GMT+02:00, Martin Jambor <mjam...@suse.cz> > > wrote:
... > > >Therefore, I have adapted the patch to allow char arrays for const > > >decls only and verified that it scalarizes a const-pool array of chars > > >on Aarch64. The (otherwise yet untested) result is below. > > > > > >What do you think? > > > > Why special case char arrays? I'd simply disallow total scalarization of > > non-const arrays completely. > > Well, currently we will get element-wise copy propagation for things > like "int rgb[3];" (possibly embeded in a small struct). If I remove > it, someone will complain. Maybe the correct limit is SI mode size or > BITS_PER_WORD/2 (so that int arrays qualify on x86_64-linux), though. > I just wanted to be conservative at this point in the release cycle. > To gather some data for this, I have looked at SRA statistics gathered for SPEC 2006 benchmarks. Complete tables are below, let me comment on the diffs, though. The most interesting column is probably the third one (the second one with numbers). This is the difference between my patch that special-cases char arrays and disallowing any non-const-pool array total scalarization: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| -| 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | -| 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | +| 403.gcc - esra | 132 | 4 | 370 | 1274 | 13 | +| 403.gcc - sra | 43 | 8 | 140 | 459 | 12 | -| 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | -| 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | +| 447.dealII - esra | 12897 | 7790 | 16329 | 29046 | 1838 | +| 447.dealII - sra | 1079 | 553 | 2262 | 5479 | 388 | -| 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | +| 450.soplex - sra | 42 | 0 | 81 | 263 | 2 | -| 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | -| 453.povray - sra | 76 | 6 | 217 | 786 | 57 | +| 453.povray - esra | 264 | 8 | 806 | 3422 | 11 | +| 453.povray - sra | 76 | 5 | 215 | 786 | 57 | -| 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | +| 465.tonto - esra | 4024 | 0 | 8233 | 21351 | 0 | For reference, this is the difference between (2 week old) trunk and my proposed patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| -| 401.bzip2 - sra | 3 | 2 | 22 | 62 | 0 | +| 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | -| 403.gcc - esra | 138 | 13 | 458 | 1274 | 20 | -| 403.gcc - sra | 42 | 12 | 141 | 457 | 13 | +| 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | +| 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | -| 447.dealII - esra | 12899 | 7830 | 16387 | 29046 | 1840 | +| 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | -| 453.povray - esra | 266 | 11 | 822 | 3422 | 13 | +| 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | -| 454.calculix - sra | 17 | 2 | 81 | 305 | 5 | +| 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | -| 465.tonto - sra | 98 | 36 | 4652 | 374 | 37 | +| 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | -| 483.xalancbmk - esra | 17844 | 10648 | 19824 | 39699 | 1404 | +| 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | The following are the non-char non-const-pool arrays we totally scalarize now in individual benchmarks with some notes: GCC - struct output_state in diagnostic.c contains "int diagnostic_count[6]". I briefly inspected of a few affected functions, in all we did perform copy propagation and got rid of a useless aggregate intermediary. But that copy propagation was done integer-wise which is only 32bit. - ereal_negate in real.c has local "short unsigned int[6]" this actually looks harmful and that the size limit should not allow shorts either. ereal_ldexp looks like having the same thing (but is much longer). - struct realvaluetype in in build_real_from_int_cst in has int[3] in it, resulting in a copy propagation with PHI nodes. dealII - struct TableIndices is basically encapsulated int[2]. Total scalarization facilitates constant propagation of zeros quite a few times. I have not checked the optimized dump to see if ti matters though. - struct _ValueType is int[8[ followed by a single char field. Scalarization also only facilitates propagation of a zero initializer... using 32bit stores. struct CellData is the same thing. soplex - UnitVector of four pointers. Total scalarization does not change what normal SRA does. povray - struct BCYL_INT has two double[2] arrays. Facilitates quite a lot of zero initializer propagation of the first array and propagation of undefinedness of the second array. - struct BBOX is similar in structure but total scalarization is probably unnecessary. - TempPixelD.26340 contains float[5] and seems to facilitate nice copy propagation. tonto - struct array1_integer has a one element array of three long integers called stride, lbound and ubound. Total scalarization facilitates simple copy propagation of the form local = *p1; *p2 = local; - ditto for struct array2_real which has descriptors of two dimensions. - only 5 cases overall but in ipa-sra so they might be inlined to lots of places. So, I still think we should keep total scalarization of arrays with WORD-sized elements and perhaps of int-sized elements too. Martin All data: No patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 2 | 22 | 62 | 0 | | 403.gcc - esra | 138 | 13 | 458 | 1274 | 20 | | 403.gcc - sra | 42 | 12 | 141 | 457 | 13 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12899 | 7830 | 16387 | 29046 | 1840 | | 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | | 453.povray - esra | 266 | 11 | 822 | 3422 | 13 | | 453.povray - sra | 76 | 6 | 217 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 17 | 2 | 81 | 305 | 5 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | | 465.tonto - sra | 98 | 36 | 4652 | 374 | 37 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10648 | 19824 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 | With my patch: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | | 403.gcc - esra | 138 | 12 | 458 | 1274 | 20 | | 403.gcc - sra | 42 | 11 | 141 | 457 | 13 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12899 | 7826 | 16347 | 29046 | 1840 | | 447.dealII - sra | 1083 | 567 | 2298 | 5479 | 392 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 2 | 87 | 263 | 2 | | 453.povray - esra | 265 | 10 | 812 | 3422 | 12 | | 453.povray - sra | 76 | 6 | 217 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4029 | 5 | 8269 | 21351 | 5 | | 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 | Disallowing any non-constant-pool arrays: | Benchmark - pass | scalarized aggregates | Totally scalarized agregates | # scalar replacements | Modified exprs | Deleted stmts | |-----------------------+-----------------------+------------------------------+-----------------------+----------------+---------------| | 400.perlbench - esra | 7 | 4 | 54 | 104 | 4 | | 400.perlbench - sra | 0 | 0 | 0 | 0 | 0 | | 401.bzip2 - esra | 3 | 0 | 9 | 66 | 0 | | 401.bzip2 - sra | 3 | 0 | 22 | 62 | 0 | | 403.gcc - esra | 132 | 4 | 370 | 1274 | 13 | | 403.gcc - sra | 43 | 8 | 140 | 459 | 12 | | 410.bwaves - esra | 0 | 0 | 0 | 0 | 0 | | 410.bwaves - sra | 0 | 0 | 0 | 0 | 0 | | 416.gamess - esra | 48 | 0 | 246 | 1250 | 0 | | 416.gamess - sra | 46 | 0 | 498 | 2208 | 0 | | 429.mcf - esra | 0 | 0 | 0 | 0 | 0 | | 429.mcf - sra | 0 | 0 | 0 | 0 | 0 | | 433.milc - esra | 18 | 3 | 33 | 103 | 4 | | 433.milc - sra | 6 | 0 | 18 | 87 | 0 | | 434.zeusmp - esra | 3 | 0 | 7 | 87 | 0 | | 434.zeusmp - sra | 0 | 0 | 0 | 0 | 0 | | 435.gromacs - esra | 150 | 22 | 378 | 1048 | 23 | | 435.gromacs - sra | 48 | 0 | 88 | 458 | 0 | | 436.cactusADM - esra | 58 | 30 | 117 | 538 | 30 | | 436.cactusADM - sra | 0 | 0 | 0 | 0 | 0 | | 437.leslie3d - esra | 91 | 0 | 412 | 2370 | 0 | | 437.leslie3d - sra | 1 | 0 | 1 | 5 | 0 | | 444.namd - esra | 175 | 11 | 182 | 676 | 11 | | 444.namd - sra | 4 | 4 | 12 | 0 | 4 | | 445.gobmk - esra | 5 | 1 | 10 | 33 | 1 | | 445.gobmk - sra | 1 | 0 | 3 | 13 | 0 | | 447.dealII - esra | 12897 | 7790 | 16329 | 29046 | 1838 | | 447.dealII - sra | 1079 | 553 | 2262 | 5479 | 388 | | 450.soplex - esra | 64 | 34 | 68 | 128 | 6 | | 450.soplex - sra | 42 | 0 | 81 | 263 | 2 | | 453.povray - esra | 264 | 8 | 806 | 3422 | 11 | | 453.povray - sra | 76 | 5 | 215 | 786 | 57 | | 454.calculix - esra | 19 | 0 | 126 | 836 | 0 | | 454.calculix - sra | 15 | 0 | 71 | 305 | 0 | | 456.hmmer - esra | 0 | 0 | 0 | 0 | 0 | | 456.hmmer - sra | 4 | 0 | 5 | 13 | 0 | | 458.sjeng - esra | 7 | 6 | 38 | 3 | 12 | | 458.sjeng - sra | 3 | 0 | 15 | 33 | 2 | | 459.GemsFDTD - esra | 217 | 1 | 484 | 1805 | 1 | | 459.GemsFDTD - sra | 0 | 0 | 0 | 0 | 0 | | 462.libquantum - esra | 17 | 6 | 44 | 126 | 6 | | 462.libquantum - sra | 6 | 0 | 6 | 11 | 0 | | 464.h264ref - esra | 16 | 0 | 92 | 393 | 0 | | 464.h264ref - sra | 6 | 0 | 12 | 84 | 0 | | 465.tonto - esra | 4024 | 0 | 8233 | 21351 | 0 | | 465.tonto - sra | 62 | 0 | 166 | 374 | 0 | | 470.lbm - esra | 0 | 0 | 0 | 0 | 0 | | 470.lbm - sra | 0 | 0 | 0 | 0 | 0 | | 471.omnetpp - esra | 208 | 152 | 232 | 560 | 24 | | 471.omnetpp - sra | 34 | 17 | 49 | 200 | 12 | | 473.astar - esra | 4 | 3 | 12 | 25 | 3 | | 473.astar - sra | 24 | 19 | 68 | 118 | 19 | | 481.wrf - esra | 2113 | 11 | 2560 | 5778 | 0 | | 481.wrf - sra | 16 | 1 | 40 | 57 | 1 | | 482.sphinx3 - esra | 0 | 0 | 0 | 0 | 0 | | 482.sphinx3 - sra | 0 | 0 | 0 | 0 | 0 | | 483.xalancbmk - esra | 17844 | 10642 | 19764 | 39699 | 1404 | | 483.xalancbmk - sra | 862 | 426 | 1634 | 3814 | 249 |