[valgrind] [Bug 385409] s390x: z13 vector integer instructions not implemented

Andreas Arnez Tue, 25 Sep 2018 10:25:59 -0700

https://bugs.kde.org/show_bug.cgi?id=385409


--- Comment #72 from Andreas Arnez <ar...@linux.ibm.com> ---
(In reply to Julian Seward from comment #71)
> Thank you for finishing this up.  OK to land provided the stuff below
> is fixed (none of it is a big deal) and the test in the next para
> passes.
Thanks for reviewing!

> It is also important to check that all new or modified test cases run
> on Memcheck without asserting or otherwise failing.  The regression
> test driver will only check them with the "none" tool since the tests
> are inside none/tests/s390.  You can do this manually by running the
> relevant tests directly on memcheck (just run "by hand").
Done.

> [...]
> +typedef union {
> ..
> +} s390x_vec_op_details_t;
> 
> add STATIC_ASSERT(sizeof(s390x_vec_op_details_t) == sizeof(ULong))
Done.

> +static void
> +s390_cc_set_expr(IRExpr* cc)
> +{
> +   vassert(typeOfIRExpr(irsb->tyenv, cc) == Ity_I64);
> +
> +   s390_cc_thunk_fill(mkU64(S390_CC_OP_SET), cc, mkU64(0), mkU64(0));
> +}
> 
> Similar to conversation last week this is again a case where, if |cc|
> is used more than once (maybe in the callee of this function) then
> code for it will get generated to evaluate it multiple times, and it's
> quite likely that those duplicates will not get removed during IR
> optimisation.  It would be safer to pass around an IRTemp and use
> |mkexpr(cc)| instead.
Done.  Most of the time there was an IRTemp already, so it wasn't a real
problem.  But I agree we're on the safe side this way.  Also renamed the
function to s390_cc_set and the previously named s390_cc_set to
s390_cc_set_val.

> +static IRExpr*
> +s390_V128_compareLT128x1(IRExpr* arg1, IRExpr* arg2, Bool allow_equal)
> 
> Just a sanity check: is the logic with |allow_equal| here correct?  I
> am not saying it isn't, but I think a second look would not be a bad
> idea.
Yes, the logic looks correct to me: `allow_equal' is checked when the
high halves are equal, because in that case the operands might be fully
equal.  Otherwise they certainly differ and `allow_equal' doesn't
matter.

> +/* Permorms "arg1 + arg2 + carry_out_bit(arg1 + arg2)".
> 
> "Performs"
Done.

> [...] have you checked that s390_disasm can print all of these new
> insns?  [...]
I haven't checked whether all insns can be printed correctly.  The ones
I looked at were OK.  I will look into that more thoroughly tomorrow.

> +   const IROp ops[] = { Iop_InterleaveHI8x16, Iop_InterleaveHI16x8,
> +                        Iop_InterleaveHI32x4, Iop_InterleaveHI64x2 };
> +   vassert(m4 < sizeof(ops));
> +   put_vr_qw(v1, binop(ops[m4], get_vr_qw(v2), get_vr_qw(v3)));
> 
> That assertion is surely not right [...]
> 
> Please fix, also any other similar that you can find in this diff.
> [...]
Hm, yeah, this is broken -- luckily it doesn't impact the execution of
correct code.  Fixed all occurrences I found.

> static const HChar *
>  s390_irgen_VSEL(UChar v1, UChar v2, UChar v3, UChar v4)
>  {
> -   IRExpr* vA = get_vr_qw(v3);
> -   IRExpr* vB = get_vr_qw(v2);
> -   IRExpr* vC = get_vr_qw(v4);
> -
> -   /* result = (vA & ~vC) | (vB & vC) */
> -   put_vr_qw(v1,
> -             binop(Iop_OrV128,
> -                   binop(Iop_AndV128, vA, unop(Iop_NotV128, vC)),
> -                   binop(Iop_AndV128, vB, vC)
> -                  )
> -            );
> +   IRExpr* vIfTrue = get_vr_qw(v2);
> +   IRExpr* vIfFalse = get_vr_qw(v3);
> +   IRExpr* vCond = get_vr_qw(v4);
> +
> +   put_vr_qw(v1, s390_V128_bitwiseITE(vCond, vIfTrue, vIfFalse));
>     return "vsel";
>  }
> 
> This kind of thing potentially also suffers from the IRExpr*
> duplication problem, assuming that s390_V128_bitwiseITE uses vCond
> twice.  You could fix it directly in s390_V128_bitwiseITE by
> allocating a new temp, binding (assign()-ing) vCond to that and then
> using the temp twice (if it doesn't already do that).
Done.

> --- a/VEX/priv/host_s390_defs.h
> +++ b/VEX/priv/host_s390_defs.h
> 
> +   S390_VEC_COUNT_LEADING_ZEROEZ,
> +   S390_VEC_COUNT_TRAILING_ZEROEZ,
> 
> Really, _ZEROEZ and not _ZEROES ?  Can we use the traditional
> spelling, unless ZEROEZ is really what was intended?
Hehe, these were cool names though ;-).  Anyhow -- fixed.

> --- a/VEX/priv/host_s390_isel.c
> +++ b/VEX/priv/host_s390_isel.c
> 
> +#define IRCONST_IS_EQUAL_U8(arg, val) \
> +   ( ((arg)->tag == Iex_Const) && ((arg)->Iex.Const.con->Ico.U8 == (val)) )
> 
> This misses out (or assumes) the check that |arg| has type Ity_I8.
> This might work, but it's somewhat dangerous in the case of
> programming mistakes.  Please change it to be
> 
>   ((arg)->tag == Iex_Const)
>   && ((arg)->Iex.Const.tag == Ico_U8)  <--- the missing check
>   && ((arg)->Iex.Const.con->Ico.U8 == (val))
Done.

> -      s390_unop_t vec_op = 0;
> +      UInt vec_op = 0;
> 
> Is it necessary to "weaken" the type of vec_op here?
Reverted, and also removed the dubious initialization.  The compiler
doesn't complain, so the weakening probably wasn't necessary to please
the compiler, and I haven't found any other reason.

> +      case Iop_CmpNEZ128x1: {
> [...]
> +      }
> 
> This is (I think!) correct but it's also very inefficient.  We're
> given a 128 bit value |arg| and we want to make |dst| be 0..(126)..0
> if |arg| is |all| zeroes and 1..(126)..1 if arg is all ones.
> 
> Better would be like this
> 
>    IRExpr* low64     = IRExpr_Unop(Iop_V128to64, arg);
>    IRExpr* high64    = IRExpr_Unop(Iop_V128HIto64, arg);
>    IRExpr* both      = IRExpr_Binop(Iop_Or64, low64, high64);
>    IRExpr* anyNonZ   = IRExpr_Unop(Iop_CmpNEZ64, both);
>    IRExpr* anyNonZ64 = IRExpr_Unop(Iop_1Sto64, anyNonZ);
>    reg1 = s390_isel_int_expr(env, anyNonZ64);
Done.

> 
> Even then, it's pretty much a hack to create s390 code by first
> generating *even more* IR and then handing that off to some other part
> of the instruction selector.  It also means that the tree for |arg|
> will get instruction-selected twice.
> 
> Much better would just be to generate the relevant s390 insns
> directly.  But fixing that is beyond the scope of this round of
> reviewing/bug fixing.
Yeah, I didn't spend time on that yet.

> +      case Iop_PwAddL8Ux16: {
> [...]
> +      }
> 
> ditto (re "pretty much a hack".)  At least, please format to fit
> inside 80 cols.
Done.

> Please write the arguments to IRConst_V128 here as 0x0000.  This is
> consistent with the style of other use of it and emphasises the fact
> that the argument is a 16-bit value (one bit for each byte of a 16
> byte vector).
Done.

> +      case Iop_PwAddL16Ux8:
> [...]
> 
> Please change to use the house style, not so verbose:
> 
> +      case Iop_PwAddL16Ux8:
> +         if (arg->tag == Iex_Unop && arg->Iex.Unop.op == Iop_PwAddL8Ux16) {
> +            size = 1;
> +            arg = arg->Iex.Unop.arg;
> +         } else {
> +            size = 2;
> +         }
> +         vec_op = S390_VEC_PWSUM_W;
> +         goto Iop_Pairwise_wrk;
Done.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 385409] s390x: z13 vector integer instructions not implemented

Reply via email to