On Fri, 1 Sep 2023, Filip Kastl wrote:

> > That's interesting.  Your placement at
> > 
> >           NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);
> >           NEXT_PASS (pass_phiopt, true /* early_p */);
> > +         NEXT_PASS (pass_sccp);
> > 
> > and
> > 
> >        NEXT_PASS (pass_tsan);
> >        NEXT_PASS (pass_dse, true /* use DR analysis */);
> >        NEXT_PASS (pass_dce);
> > +      NEXT_PASS (pass_sccp);
> > 
> > isn't immediately after the "best" existing pass we have to
> > remove dead PHIs which is pass_cd_dce.  phiopt might leave
> > dead PHIs around and the second instance runs long after the
> > last CD-DCE.
> > 
> > So I wonder if your pass just detects unnecessary PHIs we'd have
> > removed by other means and what survives until RTL expansion is
> > what we should count?
> > 
> > Can you adjust your original early placement to right after
> > the cd-dce pass and for the late placement turn the dce pass
> > before it into cd-dce and re-do your measurements?
> 
> So I did this
> 
>           NEXT_PASS (pass_dse);    
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
>       NEXT_PASS (pass_sccp);    
>       NEXT_PASS (pass_phiopt, true /* early_p */);    
>       NEXT_PASS (pass_tail_recursion);         
> 
> and this
> 
>       NEXT_PASS (pass_dse, true /* use DR analysis */);
>       NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);    
>       NEXT_PASS (pass_sccp);    
>       /* Pass group that runs when 1) enabled, 2) there are loops
> 
> and got these results:
> 
> 500.perlbench_r
> Started with (1) 30318
> Ended with (1) 26219
> Removed PHI % (1) 13.52002110957187149600
> Started with (2) 39043
> Ended with (2) 38941
> Removed PHI % (2) .26125041620777092000
> 
> 502.gcc_r
> Started with (1) 148361
> Ended with (1) 140464
> Removed PHI % (1) 5.32282742769326170700
> Started with (2) 216209
> Ended with (2) 215367
> Removed PHI % (2) .38943799749316633500
> 
> 505.mcf_r
> Started with (1) 342
> Ended with (1) 304
> Removed PHI % (1) 11.11111111111111111200
> Started with (2) 437    
> Ended with (2) 433    
> Removed PHI % (2) .91533180778032036700    
>      
> 523.xalancbmk_r    
> Started with (1) 62995    
> Ended with (1) 58289     
> Removed PHI % (1) 7.47043416144138423700    
> Started with (2) 134026    
> Ended with (2) 133193    
> Removed PHI % (2) .62152119737961291100    
>                       
> 531.deepsjeng_r    
> Started with (1) 1402    
> Ended with (1) 1264    
> Removed PHI % (1) 9.84308131241084165500    
> Started with (2) 1928    
> Ended with (2) 1920    
> Removed PHI % (2) .41493775933609958600    
>     
> 541.leela_r    
> Started with (1) 3398    
> Ended with (1) 3060    
> Removed PHI % (1) 9.94702766333137139500    
> Started with (2) 4473    
> Ended with (2) 4453    
> Removed PHI % (2) .44712720769058797300    
> 
> 557.xz_r
> Started with (1) 47
> Ended with (1) 44
> Removed PHI % (1) 6.38297872340425532000
> Started with (2) 43
> Ended with (2) 43
> Removed PHI % (2) 0
> 
> These measurements don't differ very much from the previous. It seems to me
> that phiopt does output some redundant PHIs but the vast majority of the
> eliminated PHIs are generated in earlier passes and cd_dce isn't able to get
> rid of them.
> 
> A noteworthy information might be that most of the eliminated PHIs are 
> actually
> trivial PHIs. I consider a PHI to be trivial if it only references itself or
> one other SSA name.

Ah.  The early pass numbers are certainly intresting - can you elaborate
on the last bit?  We have for example loop-closed PHI nodes like

_1 = PHI <_2>

and there are non-trivial degenerate PHIs like

_1 = PHI <_2, _2>

those are generally removed by value-numbering (FRE, DOM and PRE) and SSA 
propagation (CCP and copyprop), they are not "dead" so CD-DCE doesn't
remove them.

But we do have passes removing these kind of PHIs.

The issue with the early pass is likely that we have

          NEXT_PASS (pass_fre, true /* may_iterate */);
^^
would elimimate these kind of PHIs

          NEXT_PASS (pass_early_vrp);
^^
rewrites into loop-closed SSA, adding many such PHIs

          NEXT_PASS (pass_merge_phi);
          NEXT_PASS (pass_dse);
          NEXT_PASS (pass_cd_dce, false /* update_address_taken_p */);

and until here there's no pass eliding the LC SSA PHIs.

You could add a pass_copy_prop after early_vrp, the later sccp
pass shouldn't run into this issue I think so it must be other
passes adding such kind of PHIs.

Maybe you can count single-argument PHIs, degenerate multi-arg PHIs
and "other" PHIs separately as you remove them?


> Here is a comparison of the newest measurements (sccp after cd_dce) with the
> previous ones (sccp after phiopt and dce):
> 
> 500.perlbench_r
>  
> Started with (1-PREV) 30287
> Started with (1-NEW) 30318
>  
> Ended with (1-PREV) 26188
> Ended with (1-NEW) 26219
>  
> Removed PHI % (1-PREV) 13.53385941162875161000
> Removed PHI % (1-NEW) 13.52002110957187149600
>  
> Started with (2-PREV) 38005
> Started with (2-NEW) 39043
>  
> Ended with (2-PREV) 37897
> Ended with (2-NEW) 38941
>  
> Removed PHI % (2-PREV) .28417313511380081600
> Removed PHI % (2-NEW) .26125041620777092000
>  
> 502.gcc_r
>  
> Started with (1-PREV) 148187
> Started with (1-NEW) 148361
>  
> Ended with (1-PREV) 140292
> Ended with (1-NEW) 140464
>  
> Removed PHI % (1-PREV) 5.32772780338356266100
> Removed PHI % (1-NEW) 5.32282742769326170700
>                       
> Started with (2-PREV) 211479
> Started with (2-NEW) 216209
>  
> Ended with (2-PREV) 210635
> Ended with (2-NEW) 215367
>  
> Removed PHI % (2-PREV) .39909399987705635100
> Removed PHI % (2-NEW) .38943799749316633500
> 
> 
> Filip K
> 
> 
> P.S. I made a small mistake and didn't compute the benchmark speedup
> percentages right in the previous email. Here are the corrected results. The
> correct percentages are a little bit smaller but very similar. There is still 
> a
> ~2% speedup with 505.mcf_r and 541.leela_r.
> 
> 500.perlbench_r
> Without SCCP: 244.151807s
> With SCCP: 242.448438s
> -0.6976679881791663%
> 
> 502.gcc_r
> Without SCCP: 211.029606s
> With SCCP: 211.614523s
> +0.27717295742853737%
> 
> 505.mcf_r
> Without SCCP: 298.782621s
> With SCCP: 291.671468s
> -2.380042378703145%
> 
> 523.xalancbmk_r
> Without SCCP: 189.940639s
> With SCCP: 189.876261s
> -0.03389374719330334%
> 
> 531.deepsjeng_r
> Without SCCP: 250.63648s
> With SCCP: 250.988624s
> +0.14049989849840747%
> 
> 541.leela_r
> Without SCCP: 346.066278s
> With SCCP: 339.692987s
> -1.8416388435281157%
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to