https://bugs.kde.org/show_bug.cgi?id=362329

--- Comment #5 from Julian Seward <jsew...@acm.org> ---
(In reply to Carl Love from comment #4)
> Created attachment 98666 [details]
> Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions


Ok to land, but I would prefer you fix the comments below first,
especially the one about IRExpr* vs IRTemp duplication, since that
can potentially cause excessively long and slow code to be generated.



diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c
index 44304df..e93831c 100644

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos)

+static IRExpr * create_DCM ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                             IRExpr *dnorm, IRExpr *pos)

This gives a lot of duplication given that they produce identical trees
except for the types.  You can parameterise it like this:

  static IRExpr* create_DCM ( IROp opOR, IROp opAND, IROp opSHL,
                              IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                              IRExpr *dnorm, IRExpr *pos )
  ..
  return binop(opOR,
               binop(opSHL, NaN, mkU8(6)),
  etc

  static IRExpr* create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
                                 IRExpr *dnorm, IRExpr *pos ) {
     return create_DCM(Iop_Or32, Iop_And32, Iop_Shl32,
                       NaN, inf, zero, dnorm, pos);
  }

  and similar for create_DCM_64



Another more general point is that you are passing in IRExpr*s, which
you then bake into bigger trees:

+static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero,
+                                IRExpr *dnorm, IRExpr *pos)

You're using inf, zero, pos and neg twice, which means that you will
force the back end to generate code for them twice, and so they will
actually get computed twice.  If you're lucky, the IR optimiser
(ir_opt.c) will come along and common up that duplication (its CSE
pass), but CSEing is expensive and so only happens for a minority of
IRSBs, and these are relatively unlikely to qualify.

A way to avoid this is to bind those values to IRTemps and use the
temps instead:

   assign(tNaN, NaN);
   assign(tInf, inf);
   etc
   create_DCM_32( mkexpr(tNaN), mkexpr(tInf), ... )

Binding a value tree to an IRTemp forces the back end to generate code
that computes the value into a register, so you can then use the IRTemp
as many times as you like without any danger of duplicating computation
since it will just repeatedly read the (back-end) register that holds
the value.

This is particularly relevant for (eg) Quad_precision_gt since you are
going to duplicate the src_A and src_B computations many times there.




+static IRExpr * is_Zero_V128 ( IRTemp src )
+   return mkAND1( binop( Iop_CmpEQ64, hi64, mkU64( 0 ) ),
+                  binop( Iop_CmpEQ64, low64, mkU64( 0 ) ) );

binop(Iop_CmpEQ64, binop(Iop_Or64, hi64, low64), mkU64(0))
avoids mkAND1, since mkAND1/mkOR1/mkNOT1 generate poor code -- the
back end is not very clever with them.  It could do better.
