> Hi,
>
> > Is this with -fschedule-insns? Or only with default settings? Did you test
> > the compile time implications of increasing the lookahead? (value of 8 is
> > very large, we may consider enabling it only for -Ofast, limiting for
> > postreload only or something similar).
>
> The improvement is seen with the options "-fschedule-insns -fschedule-insns2
> -fsched-pressure"
>
> Below are the build times of some of the SPEC benchmarks
>
>                  dfa8     no_lookahead
>
> perlbench   -    196s     193s
> bzip2       -    19s      19s
> gcc         -    439s     429s
> mcf         -    3s       3s
> gobmk       -    119s     115s
> hmmer       -    62s      60s
> sjeng       -    18s      17s
> libquantum  -    6s       6s
> h264ref     -    110s     107s
> omnetpp     -    132s     128s
> astar       -    7s       7s
> bwaves      -    4s       5s
> gamess      -    1996s    1957s
> milc        -    18s      18s
> GemsFDTD    -    276s     272s
>
> I think we can enable it by default rather than for -Ofast.
> Please let me know your inputs.

OK, so it is about 2%. Did you check whether you need lookahead even in the
early pass (before reload)? My guess would be so, but if not, it could cut the
cost in half.

For -Ofast/-O3 it looks reasonable to me, but we will need to announce it on
the ML. For other settings I think we need to work on more improvements or cut
the expenses.

Honza
>
> Regards
> Ganesh
>
> -----Original Message-----
> From: Jan Hubicka [mailto:hubi...@ucw.cz]
> Sent: Thursday, October 24, 2013 2:54 PM
> To: Gopalasubramanian, Ganesh
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak (ubiz...@gmail.com); hubi...@ucw.cz;
> H.J. Lu (hjl.to...@gmail.com)
> Subject: Re: Fix scheduler ix86_issue_rate and ix86_adjust_cost for modern
> x86 chips
>
> > Attached is the patch which does the following scheduler related changes.
> > * re-models bdver3 decoder.
> > * It enables lookahead with value 8 for all BD architectures. The patch
> > doesn't consider if reloading is completed or not (an area that needs to be
> > worked on).
> > * The issue rate for BD architectures is set to 4.
> >
> > I see the following performance improvements on bdver3 machine.
> > * GemsFDTD improves by 6-7% with lookahead value changed to 8.
> > * Hmmer improves by 9% when the issue rate is set to 4.
>
> Is this with -fschedule-insns? Or only with default settings? Did you test
> the compile time implications of increasing the lookahead? (value of 8 is
> very large, we may consider enabling it only for -Ofast, limiting for
> postreload only or something similar).
>
> >
> > I have considered the following hardware details for the model.
> > * There are four decoders inside a hardware decoder block.
> > * These four independent decoders can execute in parallel. (They can take
> > 8B from four different instructions and decode).
> > * These four decoders are pipelined 4 cycles deep and are non-stalling.
> > * Each decoder takes 8B of instruction data every cycle and tries decoding
> > it.
> > * Issue rate is 4.
>
> What is the overall limitation on the number of bytes the instructions can
> occupy? I think they need to fit into two 16-byte windows, right?
> In that case we may want to tweak the existing corei7 scheduling code to take
> care of this. Making the scheduler not overly optimistic about the parallelism
> is good since it will create less register pressure during the first pass.
>
> > Is it OK for upstream?
>
> Otherwise the patch seems OK, but I would like to know the compile time
> effect first.
>
> Honza
> >
> > Changelog
> > ========
> > 2013-10-24  Ganesh Gopalasubramanian
> > <ganesh.gopalasubraman...@amd.com>
> >
> > * config/i386/bdver3.md : Added two additional decoder units
> > to support issue rate of 4 and remodeled vector unit.
> >
> > * config/i386/i386.c (ix86_issue_rate): Issue rate for BD
> > architectures is set to 4.
> >
> > * config/i386/i386.c (ia32_multipass_dfa_lookahead): DFA
> > lookahead is set to 8 for BD architectures.
> >
> > Regards
> > Ganesh
> >
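
The "about 2%" compile-time estimate can be checked against the build times quoted in this thread. Below is a minimal sketch in Python, assuming that a plain sum over the listed benchmarks is a fair summary of the cost. The names `build_times` and `overhead_percent` are illustrative and do not come from the patch or the thread.

```python
# Build times in seconds, copied from the quoted table: (dfa8, no_lookahead).
build_times = {
    "perlbench": (196, 193),
    "bzip2": (19, 19),
    "gcc": (439, 429),
    "mcf": (3, 3),
    "gobmk": (119, 115),
    "hmmer": (62, 60),
    "sjeng": (18, 17),
    "libquantum": (6, 6),
    "h264ref": (110, 107),
    "omnetpp": (132, 128),
    "astar": (7, 7),
    "bwaves": (4, 5),
    "gamess": (1996, 1957),
    "milc": (18, 18),
    "GemsFDTD": (276, 272),
}


def overhead_percent(times):
    """Relative increase in total build time of dfa8 over no_lookahead."""
    total_dfa8 = sum(dfa8 for dfa8, _ in times.values())
    total_base = sum(base for _, base in times.values())
    return 100.0 * (total_dfa8 - total_base) / total_base


total_dfa8 = sum(dfa8 for dfa8, _ in build_times.values())
total_base = sum(base for _, base in build_times.values())
print(f"dfa8 total:         {total_dfa8}s")
print(f"no_lookahead total: {total_base}s")
print(f"overhead:           {overhead_percent(build_times):.2f}%")
```

Summing hides per-benchmark variation: gamess accounts for more than half of the total time, so the aggregate mostly reflects that one benchmark.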