On Fri, 1 Sep 2023, Filip Kastl wrote: > > That's interesting. Your placement at > > > > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > > NEXT_PASS (pass_phiopt, true /* early_p */); > > + NEXT_PASS (pass_sccp); > > > > and > > > > NEXT_PASS (pass_tsan); > > NEXT_PASS (pass_dse, true /* use DR analysis */); > > NEXT_PASS (pass_dce); > > + NEXT_PASS (pass_sccp); > > > > isn't immediately after the "best" existing pass we have to > > remove dead PHIs which is pass_cd_dce. phiopt might leave > > dead PHIs around and the second instance runs long after the > > last CD-DCE. > > > > So I wonder if your pass just detects unnecessary PHIs we'd have > > removed by other means and what survives until RTL expansion is > > what we should count? > > > > Can you adjust your original early placement to right after > > the cd-dce pass and for the late placement turn the dce pass > > before it into cd-dce and re-do your measurements? > > So I did this > > NEXT_PASS (pass_dse); > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_sccp); > NEXT_PASS (pass_phiopt, true /* early_p */); > NEXT_PASS (pass_tail_recursion); > > and this > > NEXT_PASS (pass_dse, true /* use DR analysis */); > NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); > NEXT_PASS (pass_sccp); > /* Pass group that runs when 1) enabled, 2) there are loops > > and got these results: > > 500.perlbench_r > Started with (1) 30318 > Ended with (1) 26219 > Removed PHI % (1) 13.52002110957187149600 > Started with (2) 39043 > Ended with (2) 38941 > Removed PHI % (2) .26125041620777092000 > > 502.gcc_r > Started with (1) 148361 > Ended with (1) 140464 > Removed PHI % (1) 5.32282742769326170700 > Started with (2) 216209 > Ended with (2) 215367 > Removed PHI % (2) .38943799749316633500 > > 505.mcf_r > Started with (1) 342 > Ended with (1) 304 > Removed PHI % (1) 11.11111111111111111200 > Started with (2) 437 > Ended with (2) 433 > Removed PHI % (2) .91533180778032036700 > > 523.xalancbmk_r > Started with (1) 62995 > Ended with (1) 58289 > Removed PHI % (1) 7.47043416144138423700 > Started with (2) 134026 > Ended with (2) 133193 > Removed PHI % (2) .62152119737961291100 > > 531.deepsjeng_r > Started with (1) 1402 > Ended with (1) 1264 > Removed PHI % (1) 9.84308131241084165500 > Started with (2) 1928 > Ended with (2) 1920 > Removed PHI % (2) .41493775933609958600 > > 541.leela_r > Started with (1) 3398 > Ended with (1) 3060 > Removed PHI % (1) 9.94702766333137139500 > Started with (2) 4473 > Ended with (2) 4453 > Removed PHI % (2) .44712720769058797300 > > 557.xz_r > Started with (1) 47 > Ended with (1) 44 > Removed PHI % (1) 6.38297872340425532000 > Started with (2) 43 > Ended with (2) 43 > Removed PHI % (2) 0 > > These measurements don't differ very much from the previous. It seems to me > that phiopt does output some redundant PHIs but the vast majority of the > eliminated PHIs are generated in earlier passes and cd_dce isn't able to get > rid of them. > > A noteworthy information might be that most of the eliminated PHIs are > actually > trivial PHIs. I consider a PHI to be trivial if it only references itself or > one other SSA name.
Ah. The early pass numbers are certainly intresting - can you elaborate on the last bit? We have for example loop-closed PHI nodes like _1 = PHI <_2> and there are non-trivial degenerate PHIs like _1 = PHI <_2, _2> those are generally removed by value-numbering (FRE, DOM and PRE) and SSA propagation (CCP and copyprop), they are not "dead" so CD-DCE doesn't remove them. But we do have passes removing these kind of PHIs. The issue with the early pass is likely that we have NEXT_PASS (pass_fre, true /* may_iterate */); ^^ would elimimate these kind of PHIs NEXT_PASS (pass_early_vrp); ^^ rewrites into loop-closed SSA, adding many such PHIs NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_dse); NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */); and until here there's no pass eliding the LC SSA PHIs. You could add a pass_copy_prop after early_vrp, the later sccp pass shouldn't run into this issue I think so it must be other passes adding such kind of PHIs. Maybe you can count single-argument PHIs, degenerate multi-arg PHIs and "other" PHIs separately as you remove them? > Here is a comparison of the newest measurements (sccp after cd_dce) with the > previous ones (sccp after phiopt and dce): > > 500.perlbench_r > > Started with (1-PREV) 30287 > Started with (1-NEW) 30318 > > Ended with (1-PREV) 26188 > Ended with (1-NEW) 26219 > > Removed PHI % (1-PREV) 13.53385941162875161000 > Removed PHI % (1-NEW) 13.52002110957187149600 > > Started with (2-PREV) 38005 > Started with (2-NEW) 39043 > > Ended with (2-PREV) 37897 > Ended with (2-NEW) 38941 > > Removed PHI % (2-PREV) .28417313511380081600 > Removed PHI % (2-NEW) .26125041620777092000 > > 502.gcc_r > > Started with (1-PREV) 148187 > Started with (1-NEW) 148361 > > Ended with (1-PREV) 140292 > Ended with (1-NEW) 140464 > > Removed PHI % (1-PREV) 5.32772780338356266100 > Removed PHI % (1-NEW) 5.32282742769326170700 > > Started with (2-PREV) 211479 > Started with (2-NEW) 216209 > > Ended with (2-PREV) 210635 > Ended with (2-NEW) 215367 > > Removed PHI % (2-PREV) .39909399987705635100 > Removed PHI % (2-NEW) .38943799749316633500 > > > Filip K > > > P.S. I made a small mistake and didn't compute the benchmark speedup > percentages right in the previous email. Here are the corrected results. The > correct percentages are a little bit smaller but very similar. There is still > a > ~2% speedup with 505.mcf_r and 541.leela_r. > > 500.perlbench_r > Without SCCP: 244.151807s > With SCCP: 242.448438s > -0.6976679881791663% > > 502.gcc_r > Without SCCP: 211.029606s > With SCCP: 211.614523s > +0.27717295742853737% > > 505.mcf_r > Without SCCP: 298.782621s > With SCCP: 291.671468s > -2.380042378703145% > > 523.xalancbmk_r > Without SCCP: 189.940639s > With SCCP: 189.876261s > -0.03389374719330334% > > 531.deepsjeng_r > Without SCCP: 250.63648s > With SCCP: 250.988624s > +0.14049989849840747% > > 541.leela_r > Without SCCP: 346.066278s > With SCCP: 339.692987s > -1.8416388435281157% > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)