Hi Uros!

Attached is the revised patch.
The target independent part has been already approved and added.

This revision of the patch adds a x86 tune definition and checks it while 
deciding the unroll factor.

Accommodated the comments given by you except one.

> *x will never be null for active insns.
Since every rtx in the insn is checked for memory references, the NULL_RTX 
check is required.

Regards
Ganesh

-----Original Message-----
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Friday, November 22, 2013 1:46 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther <richard.guent...@gmail.com> 
(richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu 
(hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh 
<ganesh.gopalasubraman...@amd.com> wrote:

> Steamroller processors contain a loop predictor and a loop buffer, which may 
> make unrolling small loops less important.
> When unrolling small loops for steamroller, making the unrolled loop fit in 
> the loop buffer should be a priority.
>
> This patch uses a heuristic approach (number of memory references) to decide 
> the unrolling factor for small loops.
> This patch has some noise in SPEC 2006 results.
>
> Bootstrapping passes.
>
> I would like to know your comments before committing.

Please split the patch to target-dependant and target-independant part, and get 
target-idependant part reviewed first.

This part:

+  if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)  
+ {
+    return nunroll;
+  }

is wrong. You should introduce tune variable (as H.J. suggested) and check that 
variable here. Target dependant tuning options should be in x86-tune.def, so 
everything regarding tuning can be found in one place.

+        if (INSN_P (insn) && INSN_CODE (insn) != -1)
+            for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
&mem_count);

if (NONDEBUG_INSN_P (insn))
  for_each_rtx (&PATTERN(insn), ...);

otherwise your heuristics will depend on -g compile option.

+  if ( (mem_count*nunroll) <= 32)

Extra parenthesis.

+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count) {
+  if (*x != NULL_RTX && MEM_P (*x))

*x will never be null for active insns.

Uros.

Attachment: unroll-adjust.patch
Description: unroll-adjust.patch

Reply via email to