On Fri, Nov 3, 2017 at 5:38 PM, Martin Jambor <mjam...@suse.cz> wrote:
> Hi,
>
> On Thu, Oct 26, 2017 at 02:43:02PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Martin Jambor <mjam...@suse.cz> wrote:
>> >
>> > Nevertheless, I still intend to experiment with the limit, I sent out
>> > this RFC exactly so that I don't spend a lot of time benchmarking
>> > something that is eventually not deemed acceptable on principle.
>>
>> I think the limit should be on the number of generated copies and not
>> the overall size of the structure...  If the struct were composed of
>> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
>
> I have added another parameter to also limit the number of generated
> element copies.  I have kept the size limit so that we don't even
> attempt to count them for large structures.
>
>> Given that load bandwidth is usually higher than store bandwidth it
>> might make sense to do the store combining in our copying sequence,
>> like for the 8 byte entry case use something like
>>
>>   movq 0(%eax), %xmm0
>>   movhps 8(%eax), %xmm0 // or vpinsert
>>   mov[au]ps %xmm0, 0(%ebx)
>
> I would be concerned about the cost of GPR->XMM moves when the value
> being stored is in a GPR, especially with generic tuning which (with
> -O2) is the main thing I am targeting here.  Wouldn't we actually pass
> it through the stack with all the associated penalties?
>
> Also, while such store combining might work for ImageMagick, if a
> programmer did:
>
>    region1->x = x1;
>    region2->x = x2;
>    region1->y = 0;
>    region2->y = 20;
>    ...
>    SetPixelCacheNexusPixels(cache_info, ReadMode, region1, ...)
>
> The transformation would not work unless it could prove region1 and
> region2 are not the same thing.
>
>> As said a general concern was you not copying padding.  If you
>> put this into an even more common place you surely will break
>> stuff, no?
>
> I don't understand, what even more common place do you mean?
>
> I have also been testing the patch on a bunch of other architectures,
> and those have tests in their testsuites that check that padding is
> copied.  For example, some tests in gcc.target/aarch64/aapcs64/ check
> whether a structure passed to a function is bitwise identical to the
> original, and those tests fail because of padding.  That is the only
> "breakage" I know about, but I believe that the assumption that padding
> must always be copied is wrong (if it is not, then we need to make SRA
> quite a bit more conservative).
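To make the padding question concrete, here is a minimal sketch that is
not taken from the patch.  It models a field-wise copy at the byte level
for a struct with the same layout as the struct A discussed in this
thread.  The helper names (copy_fields_only, padding_bytes,
bytes_lost_by_fieldwise_copy) are made up for illustration.  It only shows
that bytes outside any field are never transferred by a field-wise copy;
whether anything is allowed to rely on those bytes is the open question.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the struct A discussed in this thread.  */
struct A { short s; long i; long j; };

/* Hypothetical helper: copy only the bytes that belong to a field.
   This models a field-wise copy at the byte level.  */
void copy_fields_only (unsigned char *dst, const unsigned char *src)
{
  memcpy (dst + offsetof (struct A, s), src + offsetof (struct A, s),
          sizeof (short));
  memcpy (dst + offsetof (struct A, i), src + offsetof (struct A, i),
          sizeof (long));
  memcpy (dst + offsetof (struct A, j), src + offsetof (struct A, j),
          sizeof (long));
}

/* Number of bytes of struct A that no field occupies.  */
size_t padding_bytes (void)
{
  return sizeof (struct A) - (sizeof (short) + 2 * sizeof (long));
}

/* Fill a source image with 0xAA, copy it field-wise into a zeroed
   destination and count the bytes that did not make it across.  */
size_t bytes_lost_by_fieldwise_copy (void)
{
  unsigned char src[sizeof (struct A)], dst[sizeof (struct A)];
  size_t lost = 0;

  memset (src, 0xAA, sizeof src);
  memset (dst, 0, sizeof dst);
  copy_fields_only (dst, src);

  for (size_t k = 0; k < sizeof src; k++)
    lost += (src[k] != dst[k]);
  return lost;
}
```

On x86_64-linux, sizeof (struct A) should be 24 with 18 bytes of fields,
so both functions should report 6; a plain memcpy of the whole object
would lose none.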
The main concern here is that GIMPLE is not very well defined for
aggregate copies and that gimple-fold.c happily optimizes
memcpy (&a, &b, sizeof (a)) into a = b;

struct A { short s; long i; long j; };
struct A a, b;
void foo ()
{
  __builtin_memcpy (&a, &b, sizeof (struct A));
}

gets folded to

  MEM[(char * {ref-all})&a] = MEM[(char * {ref-all})&b];
  return;

You can see we're careful about TBAA, but TREE_TYPE (MEM[...]) is
actually 'struct A' (this is not visible above, but it can be verified
by, for example, debugging expand_assignment).

And yes, I've been worried about SRA as well here...  it _does_
have some early outs when seeing VIEW_CONVERT_EXPR but
apparently not for the above.  Testcase that aborts with SRA but
not without:

struct A { short s; long i; long j; };
struct A a, b;
void foo ()
{
  struct A c;
  __builtin_memcpy (&c, &b, sizeof (struct A));
  __builtin_memcpy (&a, &c, sizeof (struct A));
}
int main()
{
  __builtin_memset (&b, 0, sizeof (struct A));
  b.s = 1;
  __builtin_memcpy ((char *)&b+2, &b, 2);
  foo ();
  __builtin_memcpy (&a, (char *)&a+2, 2);
  if (a.s != 1)
    __builtin_abort ();
  return 0;
}

> On Thu, Oct 26, 2017 at 05:09:42PM +0200, Richard Biener wrote:
>> Also if we do the stores in smaller chunks we are more
>> likely hitting the same store-to-load-forwarding issue
>> elsewhere.  Like in case the destination is memcpy'ed
>> away.
>>
>> So the proposed change isn't necessarily a win without
>> a possible similar regression that it tries to fix.
>>
>
> With some encouragement by Honza, I have done some benchmarking anyway
> and I did not see anything of that kind.

The regression would be visible when the aggregate copy is followed by
SLP vectorized code, for example.  Then we'd get a vector load in, say,
v4si mode after four earlier SImode stores -> STLF issue again.  The
copying via xmm registers would have made a perfect forwarding
possibility.

I'm not saying you'll hit this in SPEC, just that it's easy to construct
a case that didn't have an STLF issue but has one after the "fix".
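A sketch of the kind of case described above, assuming GCC's vector
extensions.  struct quad and the function names are hypothetical and
this is not taken from any benchmark.  Whether the compiler really emits
four narrow stores followed by one wide load depends on the optimization
level and the target, so treat it as an illustration of the access
pattern only.

```c
#include <string.h>

/* GCC vector extension: four 32-bit ints, i.e. a v4si value.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical 16-byte aggregate made of four SImode elements.  */
struct quad { int a, b, c, d; };

/* Element-wise copy: four separate 4-byte stores into *dst.  */
void copy_elementwise (struct quad *dst, const struct quad *src)
{
  dst->a = src->a;
  dst->b = src->b;
  dst->c = src->c;
  dst->d = src->d;
}

/* Consumer that reads the copy back with a single 16-byte access,
   as SLP-vectorized code might.  */
int sum_after_copy (const struct quad *src)
{
  struct quad tmp;
  v4si v;

  copy_elementwise (&tmp, src);
  /* One wide load that covers the four narrow stores above.  */
  memcpy (&v, &tmp, sizeof v);
  return v[0] + v[1] + v[2] + v[3];
}
```

If the copy had been done with one 16-byte store instead, the wide load
could be forwarded from that single store.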
So the fix is to _not_ split the stores but only the loads ... unless
you can do sophisticated analysis of the context.

That said, splitting the loads is fine if the CPU can handle enough
loads in flight, etc., but splitting stores is dangerous (and CPU
resources on the store side are usually more limited).

>> Whole-program analysis of accesses might allow
>> marking affected objects.
>
> Attempting to save access patterns before IPA and then tracking them
> and keeping them in sync across inlining and all gimple late passes
> seems like a nightmarish task.  If this approach is indeed rejected I
> might attempt to do the store combining but a WPA analysis seems just
> too complex.

Ok.

> Anyway, here are the numbers.  They were taken on two different
> Zen-based machines.  I am also in the process of measuring at least
> something on a Haswell machine but I started later and the machine is
> quite a bit slower so I will not have the numbers until next week (and
> not all equivalents in any way).  I found out I do not have access to
> any more modern .*Lake Intel CPU.
>
> trunk is pristine trunk revision 254205.  All benchmarks were run
> three times and the median was chosen.
>
> s or strict means the patch with the strictest possible settings to
> speed up ImageMagick, i.e. --param max-size-for-elementwise-copy=32
> --param max-insns-for-elementwise-copy=4.  Also run three times.
>
> x1 is patched trunk with the parameters having the default values I was
> going to propose, i.e. --param max-size-for-elementwise-copy=35
> --param max-insns-for-elementwise-copy=6.  Also run three times.
>
> I then increased the parameters, in search of further missed
> opportunities and to see what will start to regress and how soon.
> x2 is roughly twice that, --param max-size-for-elementwise-copy=67
> --param max-insns-for-elementwise-copy=12.  Run twice, outliers
> manually checked.
>
> x4 is roughly four times x1, namely --param max-size-for-elementwise-copy=143
> --param max-insns-for-elementwise-copy=24.  Run only once.
>
> The times below are of course "non-reportable," for a whole bunch of
> reasons.
>
>
> Zen SPECINT 2006 -O2 generic tuning
> ====================================
>
> Run-time
> --------
>
> | Benchmark      | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
> |----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 400.perlbench  |   237 | 236 | -0.42 | 236 | -0.42 | 238 | +0.42 | 237 | +0.00 |
> | 401.bzip2      |   341 | 342 | +0.29 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 |
> | 403.gcc        |   217 | 217 | +0.00 | 217 | +0.00 | 216 | -0.46 | 217 | +0.00 |
> | 429.mcf        |   224 | 218 | -2.68 | 223 | -0.45 | 221 | -1.34 | 226 | +0.89 |
> | 445.gobmk      |   361 | 361 | +0.00 | 361 | +0.00 | 360 | -0.28 | 363 | +0.55 |
> | 456.hmmer      |   296 | 296 | +0.00 | 296 | +0.00 | 297 | +0.34 | 296 | +0.00 |
> | 458.sjeng      |   453 | 452 | -0.22 | 454 | +0.22 | 454 | +0.22 | 460 | +1.55 |
> | 462.libquantum |   289 | 289 | +0.00 | 291 | +0.69 | 289 | +0.00 | 291 | +0.69 |
> | 464.h264ref    |   391 | 391 | +0.00 | 385 | -1.53 | 385 | -1.53 | 385 | -1.53 |
> | 471.omnetpp    |   269 | 255 | -5.20 | 250 | -7.06 | 247 | -8.18 | 268 | -0.37 |
> | 473.astar      |   320 | 321 | +0.31 | 317 | -0.94 | 320 | +0.00 | 320 | +0.00 |
> | 483.xalancbmk  |   187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | +0.00 |
>
> Although omnetpp looks like a sizeable improvement, I should warn
> that this is one of the few slightly jumpy benchmarks.  However, I
> re-ran it a few more times and it seems like it is jumping around a
> lower value when compiled with the patched compiler.  It might not be
> the 5-8% though.
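The % columns are consistent with the relative change against trunk,
(patched - trunk) / trunk * 100.  That reading is inferred from the
numbers; the mail does not spell it out.  A minimal sketch, with a
hypothetical helper name:

```c
/* Relative change of PATCHED against TRUNK, in percent.  A negative
   value means the patched result is smaller, i.e. a shorter run time
   or a smaller text section.  */
double relative_change_percent (double trunk, double patched)
{
  return (patched - trunk) / trunk * 100.0;
}
```

For 471.omnetpp, trunk 269 and s 255 give roughly -5.20, which matches
the table row.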
>
> Text size
> ---------
>
> | Benchmark      |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
> |----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 400.perlbench  |  875874 |  875954 | +0.01 |  875954 | +0.01 |  876018 | +0.02 |  876146 | +0.03 |
> | 401.bzip2      |   44754 |   44754 | +0.00 |   44754 | +0.00 |   44754 | +0.00 |   44754 | +0.00 |
> | 403.gcc        | 2294466 | 2294930 | +0.02 | 2296098 | +0.07 | 2296306 | +0.08 | 2296466 | +0.09 |
> | 429.mcf        |    8226 |    8226 | +0.00 |    8226 | +0.00 |    8258 | +0.39 |    8258 | +0.39 |
> | 445.gobmk      |  579778 |  579778 | +0.00 |  579826 | +0.01 |  579826 | +0.01 |  580402 | +0.11 |
> | 456.hmmer      |  221058 |  221058 | +0.00 |  221058 | +0.00 |  221058 | +0.00 |  221058 | +0.00 |
> | 458.sjeng      |   93362 |   93362 | +0.00 |   94882 | +1.63 |   94882 | +1.63 |   96066 | +2.90 |
> | 462.libquantum |   28314 |   28314 | +0.00 |   28362 | +0.17 |   28362 | +0.17 |   28362 | +0.17 |
> | 464.h264ref    |  393874 |  393874 | +0.00 |  393922 | +0.01 |  393922 | +0.01 |  394226 | +0.09 |
> | 471.omnetpp    |  430306 |  430306 | +0.00 |  430418 | +0.03 |  430418 | +0.03 |  430418 | +0.03 |
> | 473.astar      |   29362 |   29538 | +0.60 |   29538 | +0.60 |   29554 | +0.65 |   29554 | +0.65 |
> | 483.xalancbmk  | 2361298 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 | 2361506 | +0.01 |
>
>
>
> Zen SPECINT 2006 -Ofast native tuning
> ======================================
>
> Run-time
> --------
>
> | Benchmark      | trunk |   s |     % |  x1 |     % |  x2 |     % |  x4 |     % |
> |----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------|
> | 400.perlbench  |   240 | 239 | -0.42 | 239 | -0.42 | 241 | +0.42 | 238 | -0.83 |
> | 401.bzip2      |   341 | 341 | +0.00 | 341 | +0.00 | 341 | +0.00 | 340 | -0.29 |
> | 403.gcc        |   210 | 208 | -0.95 | 207 | -1.43 | 209 | -0.48 | 208 | -0.95 |
> | 429.mcf        |   225 | 225 | +0.00 | 225 | +0.00 | 228 | +1.33 | 226 | +0.44 |
> | 445.gobmk      |   352 | 352 | +0.00 | 352 | +0.00 | 351 | -0.28 | 352 | +0.00 |
> | 456.hmmer      |   131 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 | 131 | +0.00 |
> | 458.sjeng      |   442 | 442 | +0.00 | 438 | -0.90 | 438 | -0.90 | 437 | -1.13 |
> | 462.libquantum |   291 | 292 | +0.34 | 286 | -1.72 | 287 | -1.37 | 287 | -1.37 |
> | 464.h264ref    |   364 | 365 | +0.27 | 364 | +0.00 | 364 | +0.00 | 363 | -0.27 |
> | 471.omnetpp    |   266 | 266 | +0.00 | 265 | -0.38 | 265 | -0.38 | 265 | -0.38 |
> | 473.astar      |   306 | 307 | +0.33 | 306 | +0.00 | 306 | +0.00 | 309 | +0.98 |
> | 483.xalancbmk  |   177 | 173 | -2.26 | 170 | -3.95 | 170 | -3.95 | 170 | -3.95 |
>
> Text size
> ---------
>
> | Benchmark      |   trunk |  strict |     % |      x1 |     % |      x2 |     % |      x4 |     % |
> |----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------|
> | 400.perlbench  | 1161762 | 1161874 | +0.01 | 1161874 | +0.01 | 1162226 | +0.04 | 1162338 | +0.05 |
> | 401.bzip2      |   80834 |   80834 | +0.00 |   80834 | +0.00 |   80834 | +0.00 |   80834 | +0.00 |
> | 403.gcc        | 3170946 | 3171394 | +0.01 | 3172914 | +0.06 | 3173170 | +0.07 | 3174818 | +0.12 |
> | 429.mcf        |   10418 |   10418 | +0.00 |   10418 | +0.00 |   10450 | +0.31 |   10450 | +0.31 |
> | 445.gobmk      |  779778 |  779778 | +0.00 |  779842 | +0.01 |  779842 | +0.01 |  780418 | +0.08 |
> | 456.hmmer      |  328258 |  328258 | +0.00 |  328258 | +0.00 |  328258 | +0.00 |  328258 | +0.00 |
> | 458.sjeng      |  146386 |  146386 | +0.00 |  148162 | +1.21 |  148162 | +1.21 |  149330 | +2.01 |
> | 462.libquantum |   30666 |   30666 | +0.00 |   30730 | +0.21 |   30730 | +0.21 |   30730 | +0.21 |
> | 464.h264ref    |  737826 |  737826 | +0.00 |  737890 | +0.01 |  737890 | +0.01 |  739186 | +0.18 |
> | 471.omnetpp    |  561570 |  561570 | +0.00 |  561826 | +0.05 |  561826 | +0.05 |  561826 | +0.05 |
> | 473.astar      |   39314 |   39522 | +0.53 |   39522 | +0.53 |   39538 | +0.57 |   39538 | +0.57 |
> | 483.xalancbmk  | 3319682 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 | 3319842 | +0.00 |
>
>
>
> Zen SPECFP 2006 -O2 generic tuning
> ==================================
>
> 
Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | > % | > |---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 410.bwaves | 214 | 213 | -0.47 | 214 | +0.00 | 214 | +0.00 | 214 | > +0.00 | > | 433.milc | 290 | 291 | +0.34 | 290 | +0.00 | 295 | +1.72 | 289 | > -0.34 | > | 434.zeusmp | 182 | 182 | +0.00 | 182 | +0.00 | 184 | +1.10 | 182 | > +0.00 | > | 435.gromacs | 218 | 218 | +0.00 | 217 | -0.46 | 216 | -0.92 | 220 | > +0.92 | > | 436.cactusADM | 350 | 349 | -0.29 | 349 | -0.29 | 343 | -2.00 | 349 | > -0.29 | > | 437.leslie3d | 196 | 195 | -0.51 | 196 | +0.00 | 194 | -1.02 | 196 | > +0.00 | > | 444.namd | 273 | 273 | +0.00 | 273 | +0.00 | 273 | +0.00 | 273 | > +0.00 | > | 447.dealII | 211 | 211 | +0.00 | 210 | -0.47 | 210 | -0.47 | 211 | > +0.00 | > | 450.soplex | 187 | 188 | +0.53 | 188 | +0.53 | 187 | +0.00 | 187 | > +0.00 | > | 453.povray | 119 | 118 | -0.84 | 119 | +0.00 | 119 | +0.00 | 118 | > -0.84 | > | 454.calculix | 534 | 533 | -0.19 | 531 | -0.56 | 531 | -0.56 | 532 | > -0.37 | > | 459.GemsFDTD | 236 | 235 | -0.42 | 235 | -0.42 | 242 | +2.54 | 237 | > +0.42 | > | 465.tonto | 366 | 365 | -0.27 | 365 | -0.27 | 364 | -0.55 | 365 | > -0.27 | > | 470.lbm | 181 | 180 | -0.55 | 180 | -0.55 | 180 | -0.55 | 180 | > -0.55 | > | 481.wrf | 303 | 303 | +0.00 | 302 | -0.33 | 304 | +0.33 | 304 | > +0.33 | > | 482.sphinx3 | 362 | 362 | +0.00 | 360 | -0.55 | 361 | -0.28 | 363 | > +0.28 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | > % | x4 | % | > |---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 410.bwaves | 25954 | 25954 | +0.00 | 25954 | +0.00 | 25954 | > +0.00 | 25954 | +0.00 | > | 433.milc | 87922 | 87922 | +0.00 | 87922 | +0.00 | 88610 | > +0.78 | 89042 | +1.27 | > | 434.zeusmp | 212034 | 212034 | +0.00 | 212034 | +0.00 | 212034 | > +0.00 | 212034 | +0.00 | > | 435.gromacs | 747026 | 747026 | +0.00 | 747026 | +0.00 | 
747026 | > +0.00 | 747026 | +0.00 | > | 436.cactusADM | 526178 | 526178 | +0.00 | 526178 | +0.00 | 526274 | > +0.02 | 526274 | +0.02 | > | 437.leslie3d | 83234 | 83234 | +0.00 | 83234 | +0.00 | 83234 | > +0.00 | 83234 | +0.00 | > | 444.namd | 297234 | 297266 | +0.01 | 297266 | +0.01 | 297266 | > +0.01 | 297266 | +0.01 | > | 447.dealII | 2165282 | 2167650 | +0.11 | 2172290 | +0.32 | 2174034 | > +0.40 | 2174082 | +0.41 | > | 450.soplex | 347122 | 347122 | +0.00 | 347122 | +0.00 | 347122 | > +0.00 | 347122 | +0.00 | > | 453.povray | 800914 | 800962 | +0.01 | 801570 | +0.08 | 802002 | > +0.14 | 803138 | +0.28 | > | 454.calculix | 1342802 | 1342802 | +0.00 | 1342802 | +0.00 | 1342802 | > +0.00 | 1342802 | +0.00 | > | 459.GemsFDTD | 353410 | 354050 | +0.18 | 354050 | +0.18 | 354050 | > +0.18 | 354098 | +0.19 | > | 465.tonto | 3464210 | 3465058 | +0.02 | 3465058 | +0.02 | 3468434 | > +0.12 | 3476594 | +0.36 | > | 470.lbm | 9202 | 9202 | +0.00 | 9202 | +0.00 | 9202 | > +0.00 | 9202 | +0.00 | > | 481.wrf | 3345170 | 3345170 | +0.00 | 3345170 | +0.00 | 3351586 | > +0.19 | 3351586 | +0.19 | > | 482.sphinx3 | 125026 | 125026 | +0.00 | 125026 | +0.00 | 125026 | > +0.00 | 125026 | +0.00 | > > > > Zen SPECFP 2006 -Ofast native tuning > ==================================== > > Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | > % | > |---------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 410.bwaves | 151 | 150 | -0.66 | 151 | +0.00 | 151 | +0.00 | 151 | > +0.00 | > | 433.milc | 197 | 197 | +0.00 | 197 | +0.00 | 194 | -1.52 | 186 | > -5.58 | > | 434.zeusmp | 128 | 128 | +0.00 | 128 | +0.00 | 128 | +0.00 | 128 | > +0.00 | > | 435.gromacs | 181 | 181 | +0.00 | 180 | -0.55 | 180 | -0.55 | 181 | > +0.00 | > | 436.cactusADM | 139 | 139 | +0.00 | 139 | +0.00 | 132 | -5.04 | 139 | > +0.00 | > | 437.leslie3d | 159 | 160 | +0.63 | 160 | +0.63 | 159 | +0.00 | 159 | > +0.00 | > | 444.namd | 256 | 256 | +0.00 | 255 | -0.39 | 255 | -0.39 
| 256 | > +0.00 | > | 447.dealII | 200 | 200 | +0.00 | 199 | -0.50 | 201 | +0.50 | 201 | > +0.50 | > | 450.soplex | 184 | 184 | +0.00 | 185 | +0.54 | 184 | +0.00 | 184 | > +0.00 | > | 453.povray | 124 | 122 | -1.61 | 123 | -0.81 | 124 | +0.00 | 122 | > -1.61 | > | 454.calculix | 192 | 192 | +0.00 | 192 | +0.00 | 193 | +0.52 | 193 | > +0.52 | > | 459.GemsFDTD | 208 | 208 | +0.00 | 208 | +0.00 | 214 | +2.88 | 208 | > +0.00 | > | 465.tonto | 320 | 320 | +0.00 | 320 | +0.00 | 320 | +0.00 | 320 | > +0.00 | > | 470.lbm | 142 | 142 | +0.00 | 142 | +0.00 | 142 | +0.00 | 142 | > +0.00 | > | 481.wrf | 195 | 195 | +0.00 | 195 | +0.00 | 195 | +0.00 | 195 | > +0.00 | > | 482.sphinx3 | 256 | 258 | +0.78 | 256 | +0.00 | 256 | +0.00 | 257 | > +0.39 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | > % | x4 | % | > |---------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 410.bwaves | 27490 | 27490 | +0.00 | 27490 | +0.00 | 27490 | > +0.00 | 27490 | +0.00 | > | 433.milc | 118178 | 118178 | +0.00 | 118178 | +0.00 | 118962 | > +0.66 | 119634 | +1.23 | > | 434.zeusmp | 411106 | 411106 | +0.00 | 411106 | +0.00 | 411106 | > +0.00 | 411106 | +0.00 | > | 435.gromacs | 935970 | 935970 | +0.00 | 935970 | +0.00 | 935970 | > +0.00 | 936162 | +0.02 | > | 436.cactusADM | 750546 | 750546 | +0.00 | 750546 | +0.00 | 750626 | > +0.01 | 750626 | +0.01 | > | 437.leslie3d | 123410 | 123410 | +0.00 | 123410 | +0.00 | 123410 | > +0.00 | 123410 | +0.00 | > | 444.namd | 284082 | 284114 | +0.01 | 284114 | +0.01 | 284114 | > +0.01 | 284114 | +0.01 | > | 447.dealII | 2438610 | 2440946 | +0.10 | 2444978 | +0.26 | 2446882 | > +0.34 | 2446930 | +0.34 | > | 450.soplex | 443218 | 443218 | +0.00 | 443218 | +0.00 | 443218 | > +0.00 | 443218 | +0.00 | > | 453.povray | 1077778 | 1077890 | +0.01 | 1078658 | +0.08 | 1079026 | > +0.12 | 1080370 | +0.24 | > | 454.calculix | 1639138 | 1639138 | +0.00 | 1639138 | +0.00 | 1639474 | > +0.02 | 
1639474 | +0.02 | > | 459.GemsFDTD | 451202 | 451234 | +0.01 | 451234 | +0.01 | 451234 | > +0.01 | 451282 | +0.02 | > | 465.tonto | 4584690 | 4585250 | +0.01 | 4585250 | +0.01 | 4588130 | > +0.08 | 4595442 | +0.23 | > | 470.lbm | 9858 | 9858 | +0.00 | 9858 | +0.00 | 9858 | > +0.00 | 9858 | +0.00 | > | 481.wrf | 4588002 | 4588002 | +0.00 | 4588290 | +0.01 | 4621010 | > +0.72 | 4621922 | +0.74 | > | 482.sphinx3 | 179602 | 179602 | +0.00 | 179602 | +0.00 | 179602 | > +0.00 | 179602 | +0.00 | > > > > Zen SPEC INT 2017 -O2 generic tuning > ==================================== > > Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | > % | > |-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 500.perlbench_r | 529 | 529 | +0.00 | 531 | +0.38 | 530 | +0.19 | 534 | > +0.95 | > | 502.gcc_r | 338 | 333 | -1.48 | 334 | -1.18 | 339 | +0.30 | 339 | > +0.30 | > | 505.mcf_r | 382 | 381 | -0.26 | 382 | +0.00 | 382 | +0.00 | 381 | > -0.26 | > | 520.omnetpp_r | 511 | 503 | -1.57 | 497 | -2.74 | 497 | -2.74 | 497 | > -2.74 | > | 523.xalancbmk_r | 391 | 388 | -0.77 | 389 | -0.51 | 390 | -0.26 | 391 | > +0.00 | > | 525.x264_r | 590 | 590 | +0.00 | 591 | +0.17 | 592 | +0.34 | 593 | > +0.51 | > | 531.deepsjeng_r | 427 | 427 | +0.00 | 427 | +0.00 | 428 | +0.23 | 427 | > +0.00 | > | 541.leela_r | 716 | 716 | +0.00 | 716 | +0.00 | 719 | +0.42 | 719 | > +0.42 | > | 548.exchange2_r | 593 | 593 | +0.00 | 593 | +0.00 | 593 | +0.00 | 593 | > +0.00 | > | 557.xz_r | 452 | 452 | +0.00 | 453 | +0.22 | 454 | +0.44 | 452 | > +0.00 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | > % | x4 | % | > |-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 500.perlbench_r | 1599442 | 1599522 | +0.01 | 1599522 | +0.01 | 1599522 | > +0.01 | 1600082 | +0.04 | > | 502.gcc_r | 6757602 | 6758978 | +0.02 | 6759090 | +0.02 | 6759842 | > +0.03 | 6760306 | +0.04 | > | 505.mcf_r 
| 16098 | 16098 | +0.00 | 16098 | +0.00 | 16098 | > +0.00 | 16306 | +1.29 | > | 520.omnetpp_r | 1262498 | 1262562 | +0.01 | 1264034 | +0.12 | 1264034 | > +0.12 | 1264034 | +0.12 | > | 523.xalancbmk_r | 3989026 | 3989202 | +0.00 | 3989202 | +0.00 | 3989202 | > +0.00 | 3989202 | +0.00 | > | 525.x264_r | 414130 | 414194 | +0.02 | 414194 | +0.02 | 414738 | > +0.15 | 415122 | +0.24 | > | 531.deepsjeng_r | 67426 | 67426 | +0.00 | 67458 | +0.05 | 67458 | > +0.05 | 67458 | +0.05 | > | 541.leela_r | 219378 | 219378 | +0.00 | 219378 | +0.00 | 224082 | > +2.14 | 237026 | +8.04 | > | 548.exchange2_r | 61234 | 61234 | +0.00 | 61234 | +0.00 | 61234 | > +0.00 | 61234 | +0.00 | > | 557.xz_r | 111490 | 111490 | +0.00 | 111490 | +0.00 | 111506 | > +0.01 | 111890 | +0.36 | > > > > Zen SPEC INT 2017 -Ofast native tuning > ====================================== > > Run-time > --------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | > % | > |-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 500.perlbench_r | 525 | 524 | -0.19 | 525 | +0.00 | 525 | +0.00 | 534 | > +1.71 | > | 502.gcc_r | 331 | 329 | -0.60 | 324 | -2.11 | 330 | -0.30 | 324 | > -2.11 | > | 505.mcf_r | 380 | 380 | +0.00 | 381 | +0.26 | 380 | +0.00 | 379 | > -0.26 | > | 520.omnetpp_r | 487 | 486 | -0.21 | 488 | +0.21 | 489 | +0.41 | 488 | > +0.21 | > | 523.xalancbmk_r | 373 | 369 | -1.07 | 367 | -1.61 | 370 | -0.80 | 368 | > -1.34 | > | 525.x264_r | 319 | 319 | +0.00 | 320 | +0.31 | 321 | +0.63 | 322 | > +0.94 | > | 531.deepsjeng_r | 418 | 418 | +0.00 | 418 | +0.00 | 418 | +0.00 | 419 | > +0.24 | > | 541.leela_r | 674 | 674 | +0.00 | 674 | +0.00 | 672 | -0.30 | 672 | > -0.30 | > | 548.exchange2_r | 466 | 466 | +0.00 | 466 | +0.00 | 466 | +0.00 | 466 | > +0.00 | > | 557.xz_r | 443 | 443 | +0.00 | 443 | +0.00 | 449 | +1.35 | 449 | > +1.35 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 | > % | x4 | % | > 
|-----------------+---------+---------+-------+---------+-------+---------+-------+---------+-------| > | 500.perlbench_r | 2122882 | 2122962 | +0.00 | 2122962 | +0.00 | 2122962 | > +0.00 | 2122514 | -0.02 | > | 502.gcc_r | 8566290 | 8567794 | +0.02 | 8569138 | +0.03 | 8570066 | > +0.04 | 8570642 | +0.05 | > | 505.mcf_r | 26770 | 26770 | +0.00 | 26770 | +0.00 | 26770 | > +0.00 | 26962 | +0.72 | > | 520.omnetpp_r | 1713938 | 1713954 | +0.00 | 1714754 | +0.05 | 1714754 | > +0.05 | 1714754 | +0.05 | > | 523.xalancbmk_r | 4881890 | 4882114 | +0.00 | 4882114 | +0.00 | 4882114 | > +0.00 | 4882114 | +0.00 | > | 525.x264_r | 601522 | 601602 | +0.01 | 601602 | +0.01 | 602130 | > +0.10 | 602834 | +0.22 | > | 531.deepsjeng_r | 90306 | 90306 | +0.00 | 90338 | +0.04 | 90338 | > +0.04 | 90338 | +0.04 | > | 541.leela_r | 277634 | 277650 | +0.01 | 277650 | +0.01 | 282386 | > +1.71 | 295778 | +6.54 | > | 548.exchange2_r | 109058 | 109058 | +0.00 | 109058 | +0.00 | 109058 | > +0.00 | 109058 | +0.00 | > | 557.xz_r | 154594 | 154594 | +0.00 | 154594 | +0.00 | 154610 | > +0.01 | 154930 | +0.22 | > > > > Zen SPEC 2017 FP -O2 generic tuning > =================================== > > Run-time > -------- > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 > | % | > |-----------------+-------+-----+--------+-----+--------+-----+--------+-----+--------| > | 503.bwaves_r | 801 | 801 | +0.00 | 801 | +0.00 | 801 | +0.00 | 801 > | +0.00 | > | 507.cactuBSSN_r | 303 | 302 | -0.33 | 299 | -1.32 | 302 | -0.33 | 307 > | +1.32 | > | 508.namd_r | 306 | 306 | +0.00 | 307 | +0.33 | 306 | +0.00 | 306 > | +0.00 | > | 510.parest_r | 558 | 553 | -0.90 | 561 | +0.54 | 554 | -0.72 | 562 > | +0.72 | > | 511.povray_r | 679 | 672 | -1.03 | 673 | -0.88 | 680 | +0.15 | 644 > | -5.15 | > | 519.lbm_r | 240 | 240 | +0.00 | 240 | +0.00 | 240 | +0.00 | 240 > | +0.00 | > | 521.wrf_r | 851 | 827 | -2.82 | 827 | -2.82 | 827 | -2.82 | 828 > | -2.70 | > | 526.blender_r | 376 | 376 | +0.00 | 379 | +0.80 | 377 | +0.27 | 376 > 
| +0.00 | > | 527.cam4_r | 529 | 527 | -0.38 | 533 | +0.76 | 536 | +1.32 | 528 > | -0.19 | > | 538.imagick_r | 646 | 570 | -11.76 | 570 | -11.76 | 569 | -11.92 | 570 > | -11.76 | > | 544.nab_r | 467 | 467 | +0.00 | 467 | +0.00 | 467 | +0.00 | 467 > | +0.00 | > | 549.fotonik3d_r | 413 | 413 | +0.00 | 414 | +0.24 | 415 | +0.48 | 413 > | +0.00 | > | 554.roms_r | 459 | 455 | -0.87 | 456 | -0.65 | 456 | -0.65 | 456 > | -0.65 | > > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 > | % | x4 | % | > |-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------| > | 503.bwaves_r | 32034 | 32034 | +0.00 | 32034 | +0.00 | 32034 > | +0.00 | 32034 | +0.00 | > | 507.cactuBSSN_r | 2951634 | 2951634 | +0.00 | 2951634 | +0.00 | 2951698 > | +0.00 | 2951730 | +0.00 | > | 508.namd_r | 837458 | 837490 | +0.00 | 837490 | +0.00 | 837490 > | +0.00 | 837490 | +0.00 | > | 510.parest_r | 6540866 | 6545618 | +0.07 | 6546754 | +0.09 | 6561426 > | +0.31 | 6569426 | +0.44 | > | 511.povray_r | 803618 | 803666 | +0.01 | 804274 | +0.08 | 804706 > | +0.14 | 805842 | +0.28 | > | 519.lbm_r | 12018 | 12018 | +0.00 | 12018 | +0.00 | 12018 > | +0.00 | 12018 | +0.00 | > | 521.wrf_r | 16292962 | 16296786 | +0.02 | 16296978 | +0.02 | 16302594 > | +0.06 | 16419842 | +0.78 | > | 526.blender_r | 7268224 | 7281264 | +0.18 | 7282608 | +0.20 | 7289168 > | +0.29 | 7295296 | +0.37 | > | 527.cam4_r | 5063666 | 5063922 | +0.01 | 5065010 | +0.03 | 5068114 > | +0.09 | 5072946 | +0.18 | > | 538.imagick_r | 1608178 | 1609282 | +0.07 | 1609282 | +0.07 | 1613458 > | +0.33 | 1613970 | +0.36 | > | 544.nab_r | 156242 | 156242 | +0.00 | 156242 | +0.00 | 156242 > | +0.00 | 156242 | +0.00 | > | 549.fotonik3d_r | 326738 | 326738 | +0.00 | 326738 | +0.00 | 326738 > | +0.00 | 326738 | +0.00 | > | 554.roms_r | 728546 | 728546 | +0.00 | 728546 | +0.00 | 728546 > | +0.00 | 728546 | +0.00 | > > > > Zen SPEC 2017 FP -Ofast native tuning > 
===================================== > > Run-time > -------- > > | Benchmark | trunk | s | % | x1 | % | x2 | % | x4 | > % | > |-----------------+-------+-----+-------+-----+-------+-----+-------+-----+-------| > | 503.bwaves_r | 310 | 310 | +0.00 | 310 | +0.00 | 310 | +0.00 | 309 | > -0.32 | > | 507.cactuBSSN_r | 269 | 266 | -1.12 | 266 | -1.12 | 268 | -0.37 | 270 | > +0.37 | > | 508.namd_r | 270 | 269 | -0.37 | 269 | -0.37 | 268 | -0.74 | 268 | > -0.74 | > | 510.parest_r | 607 | 601 | -0.99 | 599 | -1.32 | 599 | -1.32 | 604 | > -0.49 | > | 511.povray_r | 662 | 664 | +0.30 | 671 | +1.36 | 680 | +2.72 | 675 | > +1.96 | > | 519.lbm_r | 186 | 186 | +0.00 | 186 | +0.00 | 186 | +0.00 | 186 | > +0.00 | > | 521.wrf_r | 550 | 554 | +0.73 | 550 | +0.00 | 550 | +0.00 | 549 | > -0.18 | > | 526.blender_r | 355 | 354 | -0.28 | 355 | +0.00 | 354 | -0.28 | 354 | > -0.28 | > | 527.cam4_r | 434 | 437 | +0.69 | 435 | +0.23 | 437 | +0.69 | 435 | > +0.23 | > | 538.imagick_r | 433 | 420 | -3.00 | 420 | -3.00 | 420 | -3.00 | 419 | > -3.23 | > | 544.nab_r | 424 | 425 | +0.24 | 425 | +0.24 | 425 | +0.24 | 425 | > +0.24 | > | 549.fotonik3d_r | 421 | 422 | +0.24 | 422 | +0.24 | 422 | +0.24 | 422 | > +0.24 | > | 554.roms_r | 360 | 361 | +0.28 | 361 | +0.28 | 361 | +0.28 | 361 | > +0.28 | > > +1.36% for 511.povray_r is the worst regression for the proposed x1 > defaults, by the way. I have not investigated it further, however. 
> > Text size > --------- > > | Benchmark | trunk | strict | % | x1 | % | x2 > | % | x4 | % | > |-----------------+----------+----------+-------+----------+-------+----------+-------+----------+-------| > | 503.bwaves_r | 34562 | 34562 | +0.00 | 34562 | +0.00 | 34562 > | +0.00 | 34562 | +0.00 | > | 507.cactuBSSN_r | 3978402 | 3978402 | +0.00 | 3978402 | +0.00 | 3978514 > | +0.00 | 3978546 | +0.00 | > | 508.namd_r | 869106 | 869154 | +0.01 | 869154 | +0.01 | 869154 > | +0.01 | 869154 | +0.01 | > | 510.parest_r | 7186258 | 7189298 | +0.04 | 7190370 | +0.06 | 7203890 > | +0.25 | 7211202 | +0.35 | > | 511.povray_r | 1063314 | 1063410 | +0.01 | 1064178 | +0.08 | 1064546 > | +0.12 | 1065890 | +0.24 | > | 519.lbm_r | 12178 | 12178 | +0.00 | 12178 | +0.00 | 12178 > | +0.00 | 12178 | +0.00 | > | 521.wrf_r | 19480946 | 19484146 | +0.02 | 19484466 | +0.02 | 19607538 > | +0.65 | 19716178 | +1.21 | > | 526.blender_r | 9708752 | 9719952 | +0.12 | 9722768 | +0.14 | 9730224 > | +0.22 | 9737760 | +0.30 | > | 527.cam4_r | 6217970 | 6218162 | +0.00 | 6219570 | +0.03 | 6223362 > | +0.09 | 6227762 | +0.16 | > | 538.imagick_r | 2255682 | 2256162 | +0.02 | 2256162 | +0.02 | 2261346 > | +0.25 | 2261938 | +0.28 | > | 544.nab_r | 212418 | 212418 | +0.00 | 212418 | +0.00 | 212418 > | +0.00 | 212578 | +0.08 | > | 549.fotonik3d_r | 454738 | 454738 | +0.00 | 454738 | +0.00 | 454738 > | +0.00 | 454738 | +0.00 | > | 554.roms_r | 910978 | 910978 | +0.00 | 910978 | +0.00 | 910978 > | +0.00 | 910978 | +0.00 | > > > I believe the numbers are good and thus I would like to ask-for > re-consideration of the objection and for approval to commit the patch > below. Needless to say, it has passed bootstrap and testing on > x86_64-linux. > > Thanks > > Martin > > > 2017-10-27 Martin Jambor <mjam...@suse.cz> > > PR target/80689 > * tree-sra.h: New file. > * ipa-prop.h: Moved declaration of build_ref_for_offset to > tree-sra.h. > * expr.c: Include params.h and tree-sra.h. 
> (emit_move_elementwise): New function. > (store_expr_with_bounds): Optionally use it. > * ipa-cp.c: Include tree-sra.h. > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New. > (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY): Likewise. > * config/i386/i386.c (ix86_option_override_internal): Set > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35. > * tree-sra.c: Include tree-sra.h. > (scalarizable_type_p): Renamed to > simple_mix_of_records_and_arrays_p, made public, renamed the > second parameter to allow_char_arrays, added count_p parameter. > (extract_min_max_idx_from_array): New function. > (completely_scalarize): Moved bits of the function to > extract_min_max_idx_from_array. > > testsuite/ > * gcc.target/i386/pr80689-1.c: New test. > > Added insns count param limit > --- > gcc/config/i386/i386.c | 4 + > gcc/expr.c | 106 ++++++++++++++++++++++- > gcc/ipa-cp.c | 1 + > gcc/ipa-prop.h | 4 - > gcc/params.def | 12 +++ > gcc/testsuite/gcc.target/i386/pr80689-1.c | 38 +++++++++ > gcc/tree-sra.c | 134 > +++++++++++++++++++++--------- > gcc/tree-sra.h | 34 ++++++++ > 8 files changed, 288 insertions(+), 45 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c > create mode 100644 gcc/tree-sra.h > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c > index 80c8ce7ecb9..0bff2da72dd 100644 > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -4580,6 +4580,10 @@ ix86_option_override_internal (bool main_args_p, > ix86_tune_cost->l2_cache_size, > opts->x_param_values, > opts_set->x_param_values); > + maybe_set_param_value (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY, > + 35, > + opts->x_param_values, > + opts_set->x_param_values); > > /* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */ > if (opts->x_flag_prefetch_loop_arrays < 0 > diff --git a/gcc/expr.c b/gcc/expr.c > index 496d492c9fa..971880b635d 100644 > --- a/gcc/expr.c > +++ b/gcc/expr.c > @@ -61,7 +61,8 @@ along with GCC; see the file COPYING3. 
If not see > #include "tree-chkp.h" > #include "rtl-chkp.h" > #include "ccmp.h" > - > +#include "params.h" > +#include "tree-sra.h" > > /* If this is nonzero, we do not bother generating VOLATILE > around volatile memory references, and we are willing to > @@ -5340,6 +5341,80 @@ emit_storent_insn (rtx to, rtx from) > return maybe_expand_insn (code, 2, ops); > } > > +/* Generate code for copying data of type TYPE at SOURCE plus OFFSET to > TARGET > + plus OFFSET, but do so element-wise and/or field-wise for each record and > + array within TYPE. TYPE must either be a register type or an aggregate > + complying with scalarizable_type_p. > + > + If CALL_PARAM_P is nonzero, this is a store into a call param on the > + stack, and block moves may need to be treated specially. */ > + > +static void > +emit_move_elementwise (tree type, rtx target, rtx source, HOST_WIDE_INT > offset, > + int call_param_p) > +{ > + switch (TREE_CODE (type)) > + { > + case RECORD_TYPE: > + for (tree fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld)) > + if (TREE_CODE (fld) == FIELD_DECL) > + { > + HOST_WIDE_INT fld_offset = offset + int_bit_position (fld); > + tree ft = TREE_TYPE (fld); > + emit_move_elementwise (ft, target, source, fld_offset, > + call_param_p); > + } > + break; > + > + case ARRAY_TYPE: > + { > + tree elem_type = TREE_TYPE (type); > + HOST_WIDE_INT el_size = tree_to_shwi (TYPE_SIZE (elem_type)); > + gcc_assert (el_size > 0); > + > + offset_int idx, max; > + /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. 
> */
> +        if (extract_min_max_idx_from_array (type, &idx, &max))
> +          {
> +            HOST_WIDE_INT el_offset = offset;
> +            for (; idx <= max; ++idx)
> +              {
> +                emit_move_elementwise (elem_type, target, source, el_offset,
> +                                       call_param_p);
> +                el_offset += el_size;
> +              }
> +          }
> +      }
> +      break;
> +    default:
> +      machine_mode mode = TYPE_MODE (type);
> +
> +      rtx ntgt = adjust_address (target, mode, offset / BITS_PER_UNIT);
> +      rtx nsrc = adjust_address (source, mode, offset / BITS_PER_UNIT);
> +
> +      /* TODO: Figure out whether the following is actually necessary. */
> +      if (target == ntgt)
> +        ntgt = copy_rtx (target);
> +      if (source == nsrc)
> +        nsrc = copy_rtx (source);
> +
> +      gcc_assert (mode != VOIDmode);
> +      if (mode != BLKmode)
> +        emit_move_insn (ntgt, nsrc);
> +      else
> +        {
> +          /* For example vector gimple registers can end up here. */
> +          rtx size = expand_expr (TYPE_SIZE_UNIT (type), NULL_RTX,
> +                                  TYPE_MODE (sizetype), EXPAND_NORMAL);
> +          emit_block_move (ntgt, nsrc, size,
> +                           (call_param_p
> +                            ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +        }
> +      break;
> +    }
> +  return;
> +}
> +
>  /* Generate code for computing expression EXP,
>     and storing the value into TARGET.
>
> @@ -5713,9 +5788,32 @@ store_expr_with_bounds (tree exp, rtx target, int call_param_p,
>              emit_group_store (target, temp, TREE_TYPE (exp),
>                                int_size_in_bytes (TREE_TYPE (exp)));
>            else if (GET_MODE (temp) == BLKmode)
> -            emit_block_move (target, temp, expr_size (exp),
> -                             (call_param_p
> -                              ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +            {
> +              /* Copying smallish BLKmode structures with emit_block_move and thus
> +                 by-pieces can result in store-to-load stalls. So copy some simple
> +                 small aggregates element or field-wise.
*/
> +              int count = 0;
> +              if (GET_MODE (target) == BLKmode
> +                  && AGGREGATE_TYPE_P (TREE_TYPE (exp))
> +                  && !TREE_ADDRESSABLE (TREE_TYPE (exp))
> +                  && tree_fits_shwi_p (TYPE_SIZE (TREE_TYPE (exp)))
> +                  && (tree_to_shwi (TYPE_SIZE (TREE_TYPE (exp)))
> +                      <= (PARAM_VALUE (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY)
> +                          * BITS_PER_UNIT))
> +                  && simple_mix_of_records_and_arrays_p (TREE_TYPE (exp), false,
> +                                                         &count)
> +                  && (count <= PARAM_VALUE (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY)))
> +                {
> +                  /* FIXME: Can this happen? What would it mean? */
> +                  gcc_assert (!reverse);
> +                  emit_move_elementwise (TREE_TYPE (exp), target, temp, 0,
> +                                         call_param_p);
> +                }
> +              else
> +                emit_block_move (target, temp, expr_size (exp),
> +                                 (call_param_p
> +                                  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
> +            }
>            /* If we emit a nontemporal store, there is nothing else to do. */
>            else if (nontemporal && emit_storent_insn (target, temp))
>              ;
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index d23c1d8ba3e..30f91e70c22 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -124,6 +124,7 @@ along with GCC; see the file COPYING3.
If not see
>  #include "tree-ssa-ccp.h"
>  #include "stringpool.h"
>  #include "attribs.h"
> +#include "tree-sra.h"
>
>  template <typename valtype> class ipcp_value;
>
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index fa5bed49ee0..2313cc884ed 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -877,10 +877,6 @@ ipa_parm_adjustment *ipa_get_adjustment_candidate (tree **, bool *,
>  void ipa_release_body_info (struct ipa_func_body_info *);
>  tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
>
> -/* From tree-sra.c: */
> -tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
> -                           gimple_stmt_iterator *, bool);
> -
>  /* In ipa-cp.c */
>  void ipa_cp_c_finalize (void);
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 8881f4c403a..9c778f9540a 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1287,6 +1287,18 @@ DEFPARAM (PARAM_VECT_EPILOGUES_NOMASK,
>            "Enable loop epilogue vectorization using smaller vector size.",
>            0, 0, 1)
>
> +DEFPARAM (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY,
> +          "max-size-for-elementwise-copy",
> +          "Maximum size in bytes of a structure or an array to by considered "
> +          "for copying by its individual fields or elements",
> +          0, 0, 512)
> +
> +DEFPARAM (PARAM_MAX_INSNS_FOR_ELEMENTWISE_COPY,
> +          "max-insns-for-elementwise-copy",
> +          "Maximum number of instructions needed to consider copying "
> +          "a structure or an array by its individual fields or elements",
> +          6, 0, 64)
> +
>  /*
>
>  Local variables:
> diff --git a/gcc/testsuite/gcc.target/i386/pr80689-1.c b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> new file mode 100644
> index 00000000000..4156d4fba45
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr80689-1.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef struct st1
> +{
> +  long unsigned int a,b;
> +  long int c,d;
> +}R;
> +
> +typedef struct st2
> +{
> +  int t;
> +  R reg;
> +}N;
> +
> +void Set (const R *region, N *n_info );
> +
> +void
test(N *n_obj ,const long unsigned int a, const long unsigned int b, const long int c,const long int d)
> +{
> +  R reg;
> +
> +  reg.a=a;
> +  reg.b=b;
> +  reg.c=c;
> +  reg.d=d;
> +  Set (&reg, n_obj);
> +
> +}
> +
> +void Set (const R *reg, N *n_obj )
> +{
> +  n_obj->reg=(*reg);
> +}
> +
> +
> +/* { dg-final { scan-assembler-not "%(x|y|z)mm\[0-9\]+" } } */
> +/* { dg-final { scan-assembler-not "movdqu" } } */
> +/* { dg-final { scan-assembler-not "movups" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index bac593951e7..d06463ce21c 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -104,6 +104,7 @@ along with GCC; see the file COPYING3. If not see
>  #include "ipa-fnsummary.h"
>  #include "ipa-utils.h"
>  #include "builtins.h"
> +#include "tree-sra.h"
>
>  /* Enumeration of all aggregate reductions we can do. */
>  enum sra_mode { SRA_MODE_EARLY_IPA, /* early call regularization */
> @@ -952,14 +953,15 @@ create_access (tree expr, gimple *stmt, bool write)
>  }
>
>
> -/* Return true iff TYPE is scalarizable - i.e. a RECORD_TYPE or fixed-length
> -   ARRAY_TYPE with fields that are either of gimple register types (excluding
> -   bit-fields) or (recursively) scalarizable types. CONST_DECL must be true if
> -   we are considering a decl from constant pool. If it is false, char arrays
> -   will be refused. */
> +/* Return true if TYPE consists of RECORD_TYPE or fixed-length ARRAY_TYPE with
> +   fields/elements that are not bit-fields and are either register types or
> +   recursively comply with simple_mix_of_records_and_arrays_p. Furthermore, if
> +   ALLOW_CHAR_ARRAYS is false, the function will return false also if TYPE
> +   contains an array of elements that only have one byte.
*/
>
> -static bool
> -scalarizable_type_p (tree type, bool const_decl)
> +bool
> +simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays,
> +                                    int *count_p)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
>    if (type_contains_placeholder_p (type))
> @@ -976,8 +978,13 @@ scalarizable_type_p (tree type, bool const_decl)
>            if (DECL_BIT_FIELD (fld))
>              return false;
>
> -          if (!is_gimple_reg_type (ft)
> -              && !scalarizable_type_p (ft, const_decl))
> +          if (is_gimple_reg_type (ft))
> +            {
> +              if (count_p)
> +                (*count_p)++;
> +            }
> +          else if (!simple_mix_of_records_and_arrays_p (ft, allow_char_arrays,
> +                                                        count_p))
>              return false;
>          }
>
> @@ -986,7 +993,7 @@ scalarizable_type_p (tree type, bool const_decl)
>    case ARRAY_TYPE:
>      {
>        HOST_WIDE_INT min_elem_size;
> -      if (const_decl)
> +      if (allow_char_arrays)
>          min_elem_size = 0;
>        else
>          min_elem_size = BITS_PER_UNIT;
> @@ -1007,9 +1014,45 @@ scalarizable_type_p (tree type, bool const_decl)
>          return false;
>
>        tree elem = TREE_TYPE (type);
> -      if (!is_gimple_reg_type (elem)
> -          && !scalarizable_type_p (elem, const_decl))
> -        return false;
> +      if (!count_p)
> +        {
> +          if (!is_gimple_reg_type (elem)
> +              && !simple_mix_of_records_and_arrays_p (elem, allow_char_arrays,
> +                                                      NULL))
> +            return false;
> +          else
> +            return true;
> +        }
> +
> +      offset_int min, max;
> +      HOST_WIDE_INT ds;
> +      bool nonzero = extract_min_max_idx_from_array (type, &min, &max);
> +
> +      if (nonzero && (min <= max))
> +        {
> +          offset_int d = max - min + 1;
> +          if (!wi::fits_shwi_p (d))
> +            return false;
> +          ds = d.to_shwi ();
> +          if (ds > INT_MAX)
> +            return false;
> +        }
> +      else
> +        ds = 0;
> +
> +      if (is_gimple_reg_type (elem))
> +        *count_p += (int) ds;
> +      else
> +        {
> +          int elc = 0;
> +          if (!simple_mix_of_records_and_arrays_p (elem, allow_char_arrays,
> +                                                   &elc))
> +            return false;
> +          ds *= elc;
> +          if (ds > INT_MAX)
> +            return false;
> +          *count_p += (unsigned) ds;
> +        }
>        return true;
>      }
>    default:
> @@ -1017,10 +1060,38 @@ scalarizable_type_p (tree
type, bool const_decl)
>    }
>  }
>
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree, tree);
> +static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, bool, tree,
> +                            tree);
> +
> +/* For a given array TYPE, return false if its domain does not have any maximum
> +   value. Otherwise calculate MIN and MAX indices of the first and the last
> +   element. */
> +
> +bool
> +extract_min_max_idx_from_array (tree type, offset_int *min, offset_int *max)
> +{
> +  tree domain = TYPE_DOMAIN (type);
> +  tree minidx = TYPE_MIN_VALUE (domain);
> +  gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> +  tree maxidx = TYPE_MAX_VALUE (domain);
> +  if (!maxidx)
> +    return false;
> +  gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
> +
> +  /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> +     DOMAIN (e.g. signed int, whereas min/max may be size_int). */
> +  *min = wi::to_offset (minidx);
> +  *max = wi::to_offset (maxidx);
> +  if (!TYPE_UNSIGNED (domain))
> +    {
> +      *min = wi::sext (*min, TYPE_PRECISION (domain));
> +      *max = wi::sext (*max, TYPE_PRECISION (domain));
> +    }
> +  return true;
> +}
>
>  /* Create total_scalarization accesses for all scalar fields of a member
> -   of type DECL_TYPE conforming to scalarizable_type_p. BASE
> +   of type DECL_TYPE conforming to simple_mix_of_records_and_arrays_p. BASE
>     must be the top-most VAR_DECL representing the variable; within that,
>     OFFSET locates the member and REF must be the memory reference expression for
>     the member.
*/
> @@ -1047,27 +1118,14 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>        {
>          tree elemtype = TREE_TYPE (decl_type);
>          tree elem_size = TYPE_SIZE (elemtype);
> -        gcc_assert (elem_size && tree_fits_shwi_p (elem_size));
>          HOST_WIDE_INT el_size = tree_to_shwi (elem_size);
>          gcc_assert (el_size > 0);
>
> -        tree minidx = TYPE_MIN_VALUE (TYPE_DOMAIN (decl_type));
> -        gcc_assert (TREE_CODE (minidx) == INTEGER_CST);
> -        tree maxidx = TYPE_MAX_VALUE (TYPE_DOMAIN (decl_type));
> +        offset_int idx, max;
>          /* Skip (some) zero-length arrays; others have MAXIDX == MINIDX - 1. */
> -        if (maxidx)
> +        if (extract_min_max_idx_from_array (decl_type, &idx, &max))
>            {
> -            gcc_assert (TREE_CODE (maxidx) == INTEGER_CST);
>              tree domain = TYPE_DOMAIN (decl_type);
> -            /* MINIDX and MAXIDX are inclusive, and must be interpreted in
> -               DOMAIN (e.g. signed int, whereas min/max may be size_int). */
> -            offset_int idx = wi::to_offset (minidx);
> -            offset_int max = wi::to_offset (maxidx);
> -            if (!TYPE_UNSIGNED (domain))
> -              {
> -                idx = wi::sext (idx, TYPE_PRECISION (domain));
> -                max = wi::sext (max, TYPE_PRECISION (domain));
> -              }
>              for (int el_off = offset; idx <= max; ++idx)
>                {
>                  tree nref = build4 (ARRAY_REF, elemtype,
> @@ -1088,10 +1146,10 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  }
>
>  /* Create total_scalarization accesses for a member of type TYPE, which must
> -   satisfy either is_gimple_reg_type or scalarizable_type_p. BASE must be the
> -   top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member, REVERSE gives its torage order. and REF must be the reference
> -   expression for it. */
> +   satisfy either is_gimple_reg_type or simple_mix_of_records_and_arrays_p.
> +   BASE must be the top-most VAR_DECL representing the variable; within that,
> +   POS and SIZE locate the member, REVERSE gives its torage order. and REF must
> +   be the reference expression for it.
*/
>
>  static void
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
> @@ -1111,7 +1169,8 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size, bool reverse,
>  }
>
>  /* Create a total_scalarization access for VAR as a whole. VAR must be of a
> -   RECORD_TYPE or ARRAY_TYPE conforming to scalarizable_type_p. */
> +   RECORD_TYPE or ARRAY_TYPE conforming to
> +   simple_mix_of_records_and_arrays_p. */
>
>  static void
>  create_total_scalarization_access (tree var)
> @@ -2803,8 +2862,9 @@ analyze_all_variable_accesses (void)
>        {
>          tree var = candidate (i);
>
> -        if (VAR_P (var) && scalarizable_type_p (TREE_TYPE (var),
> -                                                constant_decl_p (var)))
> +        if (VAR_P (var)
> +            && simple_mix_of_records_and_arrays_p (TREE_TYPE (var),
> +                                                   constant_decl_p (var), NULL))
>            {
>              if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                  <= max_scalarization_size)
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> new file mode 100644
> index 00000000000..2857688b21e
> --- /dev/null
> +++ b/gcc/tree-sra.h
> @@ -0,0 +1,34 @@
> +/* tree-sra.h - Run-time parameters.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3. If not see
> +<http://www.gnu.org/licenses/>.
*/
> +
> +#ifndef TREE_SRA_H
> +#define TREE_SRA_H
> +
> +
> +bool simple_mix_of_records_and_arrays_p (tree type, bool allow_char_arrays,
> +                                         int *count_pg);
> +bool extract_min_max_idx_from_array (tree type, offset_int *idx,
> +                                     offset_int *max);
> +tree build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
> +                           bool reverse, tree exp_type,
> +                           gimple_stmt_iterator *gsi, bool insert_after);
> +
> +
> +
> +#endif /* TREE_SRA_H */
> --
> 2.14.2
>
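
To make the interplay of the two limits in the quoted patch concrete, here is a small stand-alone C sketch of the eligibility check. It is not GCC code: `struct type_desc`, `count_scalar_leaves` and `use_elementwise_copy` are names made up for illustration, and the type sizes assume an LP64 target such as x86-64. It only models the idea that an aggregate is copied element-wise when it is small enough in bytes and also needs few enough scalar copies.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-in for a type node.  */
enum kind { KIND_SCALAR, KIND_RECORD, KIND_ARRAY };

struct type_desc
{
  enum kind kind;
  size_t size_bytes;                      /* Size of the whole type.  */
  size_t nmemb;                           /* Fields (record) or elements (array).  */
  const struct type_desc *const *fields;  /* Records only.  */
  const struct type_desc *elem;           /* Arrays only.  */
};

/* Add the number of scalar leaves of T to *COUNT.  Return false if the
   total would not fit into an int.  */
static bool
count_scalar_leaves (const struct type_desc *t, int *count)
{
  assert (t != NULL && count != NULL);
  switch (t->kind)
    {
    case KIND_SCALAR:
      if (*count == INT_MAX)
        return false;
      ++*count;
      return true;

    case KIND_RECORD:
      for (size_t i = 0; i < t->nmemb; i++)
        if (!count_scalar_leaves (t->fields[i], count))
          return false;
      return true;

    case KIND_ARRAY:
      {
        int per_elem = 0;
        if (!count_scalar_leaves (t->elem, &per_elem))
          return false;
        if (per_elem != 0)
          {
            /* Refuse if nmemb * per_elem would overflow the counter.  */
            if (t->nmemb > (size_t) (INT_MAX - *count) / (size_t) per_elem)
              return false;
            *count += (int) t->nmemb * per_elem;
          }
        return true;
      }
    }
  return false;
}

/* Return true if aggregate T is within both the size limit (in bytes)
   and the limit on the number of scalar copies.  */
static bool
use_elementwise_copy (const struct type_desc *t, size_t max_size,
                      int max_insns)
{
  int count = 0;
  if (t->kind == KIND_SCALAR || t->size_bytes > max_size)
    return false;
  return count_scalar_leaves (t, &count) && count <= max_insns;
}

/* Sample types; the sizes assume 8-byte long and 2-byte short.  */
static const struct type_desc type_long = { KIND_SCALAR, 8, 0, NULL, NULL };
static const struct type_desc type_short = { KIND_SCALAR, 2, 0, NULL, NULL };

/* Like struct st1 (R) in the quoted test: four 8-byte fields.  */
static const struct type_desc *const r_fields[] =
  { &type_long, &type_long, &type_long, &type_long };
static const struct type_desc type_R = { KIND_RECORD, 32, 4, r_fields, NULL };

/* A 32-byte array of sixteen 2-byte elements.  */
static const struct type_desc type_short16 =
  { KIND_ARRAY, 32, 16, NULL, &type_short };
```

With the values from the patch (35 bytes on i386, default of 6 for max-insns-for-elementwise-copy), `use_elementwise_copy (&type_R, 35, 6)` accepts the four-field structure from the test, while `use_elementwise_copy (&type_short16, 35, 6)` rejects an equally sized array that would need sixteen separate copies. The size check comes first so that the leaves of large structures are never counted, which is the reason given above for keeping the size limit.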