https://bugs.kde.org/show_bug.cgi?id=362329
--- Comment #5 from Julian Seward <jsew...@acm.org> --- (In reply to Carl Love from comment #4) > Created attachment 98666 [details] > Patch 3 of 5 to add VEX support for the POWER ISA 3.0 instructions Ok to land, but I would prefer you fix the comments below first, especially about the IRExpr* vs IRTemp duplication, since that potentially can cause excessively long and slow code to get generated. diff --git a/VEX/priv/guest_ppc_toIR.c b/VEX/priv/guest_ppc_toIR.c index 44304df..e93831c 100644 +static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero, + IRExpr *dnorm, IRExpr *pos) +static IRExpr * create_DCM ( IRExpr *NaN, IRExpr *inf, IRExpr *zero, + IRExpr *dnorm, IRExpr *pos) This gives a lot of duplication given that they produce identical trees except for the types. You can parameterise it like this: static IRExpr* create_DCM ( IROp opOR, IROp opAND, IROp opSHL, IRExpr *NaN, IRExpr *inf, IRExpr *zero, IRExpr *dnorm, IRExpr *pos ) .. return binop(opOR, binop(opSHL, NaN, mkU8(6)), etc static IRExpr* create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero, IRExpr *dnorm, IRExpr *pos ) { return create_DCM(Iop_Or32, Iop_And32, Iop_Shl32, NaN, inf, zero, dnorm, pos); } and similar for create_DCM_64 Another more general point is that you are passing in IRExpr*s, which you then bake into bigger trees: +static IRExpr * create_DCM_32 ( IRExpr *NaN, IRExpr *inf, IRExpr *zero, + IRExpr *dnorm, IRExpr *pos) You're using inf, zero, pos and neg twice, which means that you will force the back end to generate code for them twice, and so they will actually get computed twice. If you're lucky, the IR optimiser (ir_opt.c) will come along and common up that duplication (CSE pass), but CSEing is expensive and so only happens for a minority of IRSBs and these are relatively unlikely to qualify. A way to avoid this is to bind those values to IRTemps and use the temps instead: assign(tNaN, NaN); assign(tInf, inf); etc create_DCM_32( tNaN, tInf, ...) Binding a value tree to an IRTemp forces the back end to generate code that computes the value into a register, so you can then use the IRTemp as many times as you like without any danger of duplicating computation since it will just repeatedly read the (back-end) register that holds the value. This is particularly relevant for (eg) Quad_precision_gt since you are going to duplicate the src_A and src_B computations many times there. +static IRExpr * is_Zero_V128 ( IRTemp src ) + return mkAND1( binop( Iop_CmpEQ64, hi64, mkU64( 0 ) ), + binop( Iop_CmpEQ64, low64, mkU64( 0 ) ) ); binop(Iop_CmpEQ64, binop(Iop_Or64, hi64, low64), mkU64(0)) avoids mkAND1 since mkAND1/mkOR1/mkNOT1 and generate poor code -- the back end is not very clever with them. It could do better. -- You are receiving this mail because: You are watching all bug changes.