Hi Uros! Attached is the revised patch. The target independent part has been already approved and added.
This revision of the patch adds a x86 tune definition and checks it while deciding the unroll factor. Accommodated the comments given by you except one. > *x will never be null for active insns. Since every rtx in the insn is checked for memory references, the NULL_RTX check is required. Regards Ganesh -----Original Message----- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Friday, November 22, 2013 1:46 PM To: Gopalasubramanian, Ganesh Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guent...@gmail.com> (richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu (hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com) Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4 On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh <ganesh.gopalasubraman...@amd.com> wrote: > Steamroller processors contain a loop predictor and a loop buffer, which may > make unrolling small loops less important. > When unrolling small loops for steamroller, making the unrolled loop fit in > the loop buffer should be a priority. > > This patch uses a heuristic approach (number of memory references) to decide > the unrolling factor for small loops. > This patch has some noise in SPEC 2006 results. > > Bootstrapping passes. > > I would like to know your comments before committing. Please split the patch to target-dependant and target-independant part, and get target-idependant part reviewed first. This part: + if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4) + { + return nunroll; + } is wrong. You should introduce tune variable (as H.J. suggested) and check that variable here. Target dependant tuning options should be in x86-tune.def, so everything regarding tuning can be found in one place. + if (INSN_P (insn) && INSN_CODE (insn) != -1) + for_each_rtx (&insn, (rtx_function) ix86_loop_memcount, &mem_count); if (NONDEBUG_INSN_P (insn)) for_each_rtx (&PATTERN(insn), ...); otherwise your heuristics will depend on -g compile option. + if ( (mem_count*nunroll) <= 32) Extra parenthesis. +static int +ix86_loop_memcount (rtx *x, unsigned *mem_count) { + if (*x != NULL_RTX && MEM_P (*x)) *x will never be null for active insns. Uros.
unroll-adjust.patch
Description: unroll-adjust.patch