> OK, so it is about 2%.  Did you try if you need lookahead even in the early 
> pass (before reload)?  My guess would be so, but if not, it could cut the 
> cost to half.  For -Ofast/-O3 it looks reasonable to me, but we will need to 
> announce it on the ML.  For other settings I think we need to work on more 
> improvements or cut the expenses.

Yes, it is required before reload.  

I have another idea worth considering. For now, could we enable lookahead 
with value 4 (pre-reload) by default? This should remove most of the 
build-time cost. 
I measured the build time of the benchmarks listed below with lookahead 
value 4. The roughly 2% increase in build time seen with value 8 is now 
almost gone.

                   dfa4       no_lookahead
 
 perlbench       - 191s          193s
 bzip2           - 19s           19s
 gcc             - 429s          429s
 mcf             - 3s            3s
 gobmk           - 116s          115s
 hmmer           - 60s           60s
 sjeng           - 18s           17s
 libquantum      - 6s            6s
 h264ref         - 107s          107s
 omnetpp         - 128s          128s
 astar           - 7s            7s
 bwaves          - 5s            5s
 gamess          - 1964s         1957s
 milc            - 18s           18s
 GemsFDTD        - 273s          272s
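
As a rough cross-check, the aggregate overhead can be recomputed from the 
tables. The script below is only an illustrative sketch: the names 
BUILD_TIMES and overhead_percent are made up for this mail and are not part 
of the patch. The dfa8 column is copied from the earlier mail quoted further 
down, and all timings are single runs in whole seconds.

```python
# Build times in seconds, copied from the two tables in this thread.
# Columns: (benchmark, no_lookahead, dfa4, dfa8)
BUILD_TIMES = [
    ("perlbench",   193,  191,  196),
    ("bzip2",        19,   19,   19),
    ("gcc",         429,  429,  439),
    ("mcf",           3,    3,    3),
    ("gobmk",       115,  116,  119),
    ("hmmer",        60,   60,   62),
    ("sjeng",        17,   18,   18),
    ("libquantum",    6,    6,    6),
    ("h264ref",     107,  107,  110),
    ("omnetpp",     128,  128,  132),
    ("astar",         7,    7,    7),
    ("bwaves",        5,    5,    4),
    ("gamess",     1957, 1964, 1996),
    ("milc",         18,   18,   18),
    ("GemsFDTD",    272,  273,  276),
]


def overhead_percent(baseline, measured):
    """Relative increase of the summed build time over the baseline, in percent."""
    return 100.0 * (sum(measured) - sum(baseline)) / sum(baseline)


base = [row[1] for row in BUILD_TIMES]
dfa4 = [row[2] for row in BUILD_TIMES]
dfa8 = [row[3] for row in BUILD_TIMES]

print(f"dfa4 overhead: {overhead_percent(base, dfa4):.2f}%")
print(f"dfa8 overhead: {overhead_percent(base, dfa8):.2f}%")
```

Summed over all fifteen benchmarks this gives about 0.24% for value 4 and 
about 2.07% for value 8, relative to the 3336s no_lookahead total. Differences 
of 1s on the short benchmarks are within measurement noise.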

Lookahead value 4 also helps because the modified decoder model in bdver3.md 
is only two cycles deep (though the hardware decoder is actually 4 cycles 
deep). This means that we can look another two levels deep for a better 
schedule.
GemsFDTD still retains its performance boost of around 6-7% with value 4.

Let me know your thoughts.

Regards
Ganesh

-----Original Message-----
From: Jan Hubicka [mailto:hubi...@ucw.cz] 
Sent: Thursday, October 24, 2013 6:48 PM
To: Gopalasubramanian, Ganesh
Cc: Jan Hubicka; gcc-patches@gcc.gnu.org; Uros Bizjak (ubiz...@gmail.com); H.J. 
Lu (hjl.to...@gmail.com)
Subject: Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern x86 
chips

> Hi,
> 
> > Is this with -fschedule-insns? Or only with default settings?  Did you test 
> > the compile time implications of increasing the lookahead? (value of 8 is 
> > very large, we may consider enabling it only for -Ofast, limiting for 
> > postreload only or something similar).
> 
> The improvement is seen with the options "-fschedule-insns  -fschedule-insns2 
> -fsched-pressure"
> 
> Below are the build times of some of the SPEC benchmarks
> 
>                   dfa8       no_lookahead
> 
> perlbench       - 196s          193s
> bzip2           - 19s           19s
> gcc             - 439s          429s
> mcf             - 3s            3s
> gobmk           - 119s          115s
> hmmer           - 62s           60s
> sjeng           - 18s           17s
> libquantum      - 6s            6s
> h264ref         - 110s          107s
> omnetpp         - 132s          128s
> astar           - 7s            7s
> bwaves          - 4s            5s
> gamess          - 1996s         1957s
> milc            - 18s           18s
> GemsFDTD        - 276s          272s
> 
> I think we can enable it by default rather than for -Ofast.
> Please let me know your inputs.

OK, so it is about 2%.  Did you try if you need lookahead even in the early 
pass (before reload)?  My guess would be so, but if not, it could cut the cost 
to half.  For -Ofast/-O3 it looks reasonable to me, but we will need to announce 
it on the ML.  For other settings I think we need to work on more improvements 
or cut the expenses.

Honza
> 
> Regards
> Ganesh
> 
> -----Original Message-----
> From: Jan Hubicka [mailto:hubi...@ucw.cz]
> Sent: Thursday, October 24, 2013 2:54 PM
> To: Gopalasubramanian, Ganesh
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubiz...@gmail.com); 
> hubi...@ucw.cz; H.J. Lu (hjl.to...@gmail.com)
> Subject: Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for 
> modern x86 chips
> 
> > Attached is the patch which does the following scheduler related changes.
> > * re-models bdver3 decoder.
> > * It enables lookahead with value 8 for all BD architectures. The patch 
> > doesn't consider if reloading is completed or not (an area that needs to be 
> > worked on).
> > * The issue rate for BD architectures is set to 4.
> > 
> > I see the following performance improvements on bdver3 machine.
> > * GemsFDTD improves by 6-7% with lookahead value changed to 8.
> > * Hmmer improves by 9% when the issue rate is set to 4.
> 
> Is this with -fschedule-insns? Or only with default settings?  Did you test 
> the compile time implications of increasing the lookahead? (value of 8 is 
> very large, we may consider enabling it only for -Ofast, limiting for 
> postreload only or something similar).
> 
> > 
> > I have considered the following hardware details for the model.
> > * There are four decoders inside a hardware decoder block.
> > * These four independent decoders can execute in parallel.  (They can take 
> > 8B from four different instructions and decode).
> > * These four decoders are pipelined 4 cycles deep and are non-stalling.
> > * Each decoder takes 8B of instruction data every cycle and tries decoding 
> > it. 
> > * Issue rate is 4.
> What is the overall limitation on the number of bytes the instructions can 
> occupy?  I think they need to fit into two 16-byte windows, right?
> In that case we may want to tweak the existing corei7 scheduling code to take 
> care of this.  Making the scheduler not overly optimistic about the 
> parallelism is good since it will create less register pressure during the 
> first pass.
> > 
> > Is it OK for upstream?
> 
> Otherwise the patch seems OK, but I would like to know the compile time 
> effect first.
> 
> Honza
> > 
> > Changelog
> > ========
> > 2013-10-24  Ganesh Gopalasubramanian 
> > <ganesh.gopalasubraman...@amd.com>
> > 
> >     * config/i386/bdver3.md : Added two additional decoder units 
> >     to support issue rate of 4 and remodeled vector unit.
> > 
> >     * config/i386/i386.c (ix86_issue_rate): Issue rate for BD
> >     architectures is set to 4.
> > 
> >     * config/i386/i386.c (ia32_multipass_dfa_lookahead): DFA
> >     lookahead is set to 8 for BD architectures.
> > 
> > Regards
> > Ganesh
> > 
> 
> 
> 

