> That's interesting. Your placement at
>
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
>       NEXT_PASS (pass_phiopt, true /* early_p */);
> +     NEXT_PASS (pass_sccp);
>
> and
>
>       NEXT_PASS (pass_tsan);
>       NEXT_PASS (pass_dse, true /* use DR analysis */);
>       NEXT_PASS (pass_dce);
> +     NEXT_PASS (pass_sccp);
>
> isn't immediately after the "best" existing pass we have to
> remove dead PHIs which is pass_cd_dce. phiopt might leave
> dead PHIs around and the second instance runs long after the
> last CD-DCE.
>
> So I wonder if your pass just detects unnecessary PHIs we'd have
> removed by other means and what survives until RTL expansion is
> what we should count?
>
> Can you adjust your original early placement to right after
> the cd-dce pass and for the late placement turn the dce pass
> before it into cd-dce and re-do your measurements?
So I did this

      NEXT_PASS (pass_dse);
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
      NEXT_PASS (pass_sccp);
      NEXT_PASS (pass_phiopt, true /* early_p */);
      NEXT_PASS (pass_tail_recursion);

and this

      NEXT_PASS (pass_dse, true /* use DR analysis */);
      NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
      NEXT_PASS (pass_sccp);
      /* Pass group that runs when 1) enabled, 2) there are loops

and got these results ((1) is the early SCCP instance, (2) the late one):

500.perlbench_r
  Started with (1)    30318
  Ended with (1)      26219
  Removed PHI % (1)   13.52002110957187149600
  Started with (2)    39043
  Ended with (2)      38941
  Removed PHI % (2)   0.26125041620777092000

502.gcc_r
  Started with (1)    148361
  Ended with (1)      140464
  Removed PHI % (1)   5.32282742769326170700
  Started with (2)    216209
  Ended with (2)      215367
  Removed PHI % (2)   0.38943799749316633500

505.mcf_r
  Started with (1)    342
  Ended with (1)      304
  Removed PHI % (1)   11.11111111111111111200
  Started with (2)    437
  Ended with (2)      433
  Removed PHI % (2)   0.91533180778032036700

523.xalancbmk_r
  Started with (1)    62995
  Ended with (1)      58289
  Removed PHI % (1)   7.47043416144138423700
  Started with (2)    134026
  Ended with (2)      133193
  Removed PHI % (2)   0.62152119737961291100

531.deepsjeng_r
  Started with (1)    1402
  Ended with (1)      1264
  Removed PHI % (1)   9.84308131241084165500
  Started with (2)    1928
  Ended with (2)      1920
  Removed PHI % (2)   0.41493775933609958600

541.leela_r
  Started with (1)    3398
  Ended with (1)      3060
  Removed PHI % (1)   9.94702766333137139500
  Started with (2)    4473
  Ended with (2)      4453
  Removed PHI % (2)   0.44712720769058797300

557.xz_r
  Started with (1)    47
  Ended with (1)      44
  Removed PHI % (1)   6.38297872340425532000
  Started with (2)    43
  Ended with (2)      43
  Removed PHI % (2)   0

These measurements don't differ much from the previous ones. It seems to me that phiopt does output some redundant PHIs, but the vast majority of the eliminated PHIs are generated in earlier passes and cd_dce isn't able to get rid of them. It may also be worth noting that most of the eliminated PHIs are actually trivial PHIs.
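To make the notion of a trivial PHI concrete, here is a minimal C sketch of the check, assuming the criterion stated in the next paragraph: every argument is either the PHI's own result or one and the same other SSA name. The toy_phi struct and phi_is_trivial function are hypothetical and deliberately ignore GCC's real gphi/tree API; SSA names are modeled as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a PHI node (not GCC's gphi API).
   SSA names are represented as plain integers.  */
struct toy_phi
{
  int result;        /* SSA name defined by the PHI.  */
  const int *args;   /* One argument per incoming edge.  */
  size_t num_args;   /* Number of incoming edges.  */
};

/* Return true if PHI is trivial: every argument is either the PHI's
   own result or one single other SSA name.  A PHI that references
   only itself is also considered trivial.  */
static bool
phi_is_trivial (const struct toy_phi *phi)
{
  assert (phi != NULL);
  bool have_other = false;
  int other = 0;

  for (size_t i = 0; i < phi->num_args; i++)
    {
      int arg = phi->args[i];
      if (arg == phi->result)
        continue;               /* Self-reference, ignore it.  */
      if (have_other && arg != other)
        return false;           /* Second distinct name: not trivial.  */
      have_other = true;
      other = arg;
    }
  return true;
}
```

For example, x_7 = PHI <x_5, x_7> is trivial (x_7 can simply be replaced by x_5), while x_7 = PHI <x_5, x_6> is not.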
I consider a PHI to be trivial if it only references itself or one other SSA name.

Here is a comparison of the newest measurements (sccp after cd_dce) with the previous ones (sccp after phiopt and dce):

500.perlbench_r       PREV                      NEW
  Started with (1)    30287                     30318
  Ended with (1)      26188                     26219
  Removed PHI % (1)   13.53385941162875161000   13.52002110957187149600
  Started with (2)    38005                     39043
  Ended with (2)      37897                     38941
  Removed PHI % (2)   0.28417313511380081600    0.26125041620777092000

502.gcc_r             PREV                      NEW
  Started with (1)    148187                    148361
  Ended with (1)      140292                    140464
  Removed PHI % (1)   5.32772780338356266100    5.32282742769326170700
  Started with (2)    211479                    216209
  Ended with (2)      210635                    215367
  Removed PHI % (2)   0.39909399987705635100    0.38943799749316633500

Filip K

P.S. I made a small mistake in the previous email and computed the benchmark speedup percentages incorrectly. Here are the corrected results. The correct percentages are slightly smaller but very similar; there is still a ~2% speedup on 505.mcf_r and 541.leela_r. A negative change means the build with SCCP ran faster.

                   Without SCCP   With SCCP     Change
500.perlbench_r    244.151807s    242.448438s   -0.6976679881791663%
502.gcc_r          211.029606s    211.614523s   +0.27717295742853737%
505.mcf_r          298.782621s    291.671468s   -2.380042378703145%
523.xalancbmk_r    189.940639s    189.876261s   -0.03389374719330334%
531.deepsjeng_r    250.63648s     250.988624s   +0.14049989849840747%
541.leela_r        346.066278s    339.692987s   -1.8416388435281157%
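The percentages in these results are consistent with two simple formulas: Removed PHI % is (started - ended) / started * 100, and the runtime change is (with - without) / without * 100. The small C sketch below computes both; the function names are hypothetical. Checking against the data: 557.xz_r (1) gives (47 - 44) / 47 * 100, about 6.383, and 505.mcf_r gives (291.671468 - 298.782621) / 298.782621 * 100, about -2.380.

```c
#include <assert.h>

/* Percentage of PHIs removed by the pass:
   (started - ended) / started * 100.  */
static double
removed_phi_percent (long started, long ended)
{
  assert (started > 0);
  return (double) (started - ended) / (double) started * 100.0;
}

/* Relative runtime change of the build with SCCP against the build
   without it, in percent.  Negative means the SCCP build ran faster.  */
static double
runtime_change_percent (double without_sccp, double with_sccp)
{
  assert (without_sccp > 0.0);
  return (with_sccp - without_sccp) / without_sccp * 100.0;
}
```

Computing the relative change against the baseline (without SCCP) keeps the sign convention consistent across benchmarks: speedups come out negative, slowdowns positive.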