Hi Mike,

Thanks for your comments.
Please find my replies inline.

- Thanks and regards,
  Sameera D.

On Monday 11 May 2015 10:09 PM, Mike Stump wrote:
On May 11, 2015, at 4:05 AM, sameera <sameera.deshpa...@imgtec.com> wrote:
+(define_insn "*join2_loadhi"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (any_extend:SI (match_operand:HI 1 "non_volatile_mem_operand" "m")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+       (any_extend:SI (match_operand:HI 3 "non_volatile_mem_operand" "m")))]
+  "ENABLE_LD_ST_PAIRS && reload_completed"
+  {
+    /* Reg-renaming pass reuses base register if it is dead after bonded loads.
+       Hardware does not bond those loads, even when they are consecutive.
+       However, the order of the loads needs to be checked for correctness.  */
+    if (!reg_overlap_mentioned_p (operands[0], operands[1]))
+      {
+       output_asm_insn ("lh<u>\t%0,%1", operands);
+       output_asm_insn ("lh<u>\t%2,%3", operands);
+      }
+    else
+      {
+       output_asm_insn ("lh<u>\t%2,%3", operands);
+       output_asm_insn ("lh<u>\t%0,%1", operands);
+      }
+
+    return "";
+  }
+  [(set_attr "move_type" "load")
+   (set_attr "insn_count" "2")])
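
To illustrate why the pattern above swaps the two loads when the first
destination register is also mentioned in the address, here is a small
stand-alone sketch. It is only a toy model, not GCC or MIPS code: the names
toy_cpu, toy_lhu and toy_load_pair are hypothetical, and it assumes both
loads use the same base register.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 8
#define MEM_HALFWORDS 16

/* Toy machine state: a few registers and a small halfword array that
   stands in for memory (hypothetical, for illustration only).  */
struct toy_cpu
{
  uint32_t reg[NUM_REGS];
  uint16_t mem[MEM_HALFWORDS];
};

/* Model of "lhu dest,offset(base)": zero-extending halfword load.
   The byte address must be halfword aligned and inside the toy memory.  */
static void
toy_lhu (struct toy_cpu *cpu, int dest, int base, int offset)
{
  uint32_t addr = cpu->reg[base] + (uint32_t) offset;
  assert (addr % 2 == 0 && addr / 2 < MEM_HALFWORDS);
  cpu->reg[dest] = cpu->mem[addr / 2];
}

/* Perform two loads from the same base register.  If the first
   destination is the base register itself, do the second load first so
   that the base is still intact when the second address is computed.  */
static void
toy_load_pair (struct toy_cpu *cpu, int dest0, int off0,
               int dest1, int off1, int base)
{
  if (dest0 != base)
    {
      toy_lhu (cpu, dest0, base, off0);
      toy_lhu (cpu, dest1, base, off1);
    }
  else
    {
      toy_lhu (cpu, dest1, base, off1);
      toy_lhu (cpu, dest0, base, off0);
    }
}
```

In this model, loading into the base register first would make the second
load compute its address from the freshly loaded value, which is the
situation the reg_overlap_mentioned_p check guards against.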

However, unlike other architectures, we do not generate a single instruction 
for a bonded pair,

Actually, you do.  The above is 1 instruction pattern.  Doesn’t matter much 
what it prints as or what the CPU thinks of it.
Yes, it is a single pattern. However, the assembly generated for that pattern 
contains multiple instructions.

because of which it is difficult to check if bonding is happening or not. 
Hence, an assembly file is generated with debug dumps, and the bonded 
loads/stores are identified by their pattern names.

Nothing wrong with that approach.  Also, in the assembly, one can look for 
sequences of instructions if they want.
Load/store bonding requires more than contiguous load/store instructions: the 
two accesses must also use the same base register, and their offsets must 
differ by a specific amount. Hence, scanning for instruction sequences might 
not always be sufficient, so I am matching the pattern name instead.
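
As a rough sketch of the condition described above, the check could look like
the following. This is not the code from the patch: struct mem_access and
bondable_p are hypothetical names, and the rule that the offsets must differ
by exactly one access width is an assumption made for illustration, since the
required difference is not spelled out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one memory access: base register number,
   constant byte offset, and access width in bytes.  */
struct mem_access
{
  int base_reg;
  long offset;
  int size;
};

/* Return true if A and B form a bondable pair under the assumed rule:
   same base register, same width, and offsets exactly one width apart
   (in either direction).  */
static bool
bondable_p (const struct mem_access *a, const struct mem_access *b)
{
  assert (a->size > 0 && b->size > 0);

  if (a->base_reg != b->base_reg || a->size != b->size)
    return false;

  long diff = b->offset - a->offset;
  return diff == a->size || diff == -a->size;
}
```

Two halfword loads at 0($4) and 2($4) pass this check, while 0($4) and 2($5),
or 0($4) and 8($4), do not, even if they are adjacent in the assembly.
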
See gcc/testsuite/gcc.target/aarch64/fuse_adrp_add_1.c:

   /* { dg-final { scan-assembler "adrp\tx.*, fixed_regs\n\tadd\tx.*, x.*fixed_regs" } } */

in the test suite for example.

I am trying FUSION for MIPS as suggested by Mike, and testing its perf impact 
along with other MIPS-specific options.

I think you will discover it is virtually what you have now, and works better.  
The fusion just can peephole over greater distances, that’s the only real 
difference.
Yes, in many cases I see a clear improvement. However, it also brings together 
loads/stores that were intentionally split by the -msched-weight option 
introduced for MIPS. I need to measure performance and do perf tuning (if 
needed) with that option before sending the patch for review.
