https://bugs.kde.org/show_bug.cgi?id=383010
--- Comment #12 from Julian Seward <jsew...@acm.org> ---
(In reply to Tanya from comment #5)
> Created attachment 109000 [details]
> Updated AVX-512 implementation prototype

This looks pretty good.  I have a number of comments, many of them just
style/layout, a few a bit more than that, but nothing serious.

As a side note -- before this lands, I would want to do some performance
runs to check that this doesn't impact performance (or correctness) of
existing IA support.

-----

static
+Bool have_avx512(void) {
   return (vai.hwcaps & VEX_HWCAPS_AMD64_AVX512 ? True : False);

Potential operator precedence problems for & vs ?: make me nervous.
Please use

   (vai.hwcaps & VEX_HWCAPS_AMD64_AVX512) ? True : False

so we don't accidentally end up with

   vai.hwcaps & (VEX_HWCAPS_AMD64_AVX512 ? True : False)

-----

valgrind-wip/valgrind/memcheck/mc_machine.c
@@ -873,7 +888,6 @@
    if (o == GOF(FPSCR)    && sz == 4) return -1;
    if (o == GOF(TPIDRURO) && sz == 4) return -1;
-   if (o == GOF(TPIDRURW) && sz == 4) return -1;
    if (o == GOF(ITSTATE)  && sz == 4) return -1;

Looks like you need to update your -wip copy.  The TPIDRURW line was
added recently for ARM.

-----

void mc_LOADV_128_or_256_slow ( /*OUT*/ULong* res,
                                Addr a, SizeT nBits, Bool bigendian )
{
-   ULong pessim[4];   /* only used when p-l-ok=yes */
+   ULong pessim[16];  /* only used when p-l-ok=yes */

IIUC, you're now using this also to handle the slow case for 512 bit
loads.  Correct?  If so,

(1) please rename it to mc_LOADV_128_or_256_or_512_slow

(2) please verify that the 4 -> 16 transition here is correct -- I
    suspect it is not.  A ULong is 64 bits, so 'ULong pessim[4]'
    contains one byte for each byte in a 256-bit load.  So to make it
    work for 512 bit loads, don't you only need to change it to
    'ULong pessim[8]' ?

-----

+void amd64g_dirtyhelper_CPUID_avx512 ( VexGuestAMD64State* st ) {

Please add a comment saying which processor this is taken from (or most
closely represents), like with all the other CPUID helper functions.

-----

+#define OFFB_YMM32 offsetof(VexGuestAMD64State,guest_ZMM32) //NULL

What does //NULL here mean?  Maybe remove it?

-----

+#define PFX_EVEXb  (1<<30)   /* EVEX b bit, if EVEX present, else 0 */
+#define PFX_EVEXTT (0xF<<32) /* EVEX tuple type (4-bits) if EVEX present, else 0 */

Please change these (and also all the pre-existing PFX_ values like
PFX_ASO etc) like this

   1ULL << 30
   0xFULL << 32

so that we don't get problems with (eg) the compiler shifting 0xF as
32-bits left by 32 (giving zero) and *then* widening to 64 bits.

-----

+static Int tuple_disp_map[14][3] = {

I assume that you want to access this using a TupleType for the first
index.  Please add this:

   STATIC_ASSERT(FullVectorMask == 14)

just before the definition of tuple_disp_map.

-----

+static int getMult(Prefix pfx) {
+   int type = getEvexType(pfx);

Please use house types (Int instead of int).  Int is signed-32 on all
platforms that valgrind supports.

-----

I'm not sure what function these are in, but anyway ..

+
+   // High 256 bits of ZMM0-ZMM15 registers
+   for (reg = 0; reg < 16; reg++) {
+      stmt( IRStmt_StoreG(
+               Iend_LE,
+               binop(Iop_Add64, mkexpr(addr), mkU64(1152 + reg * 32)),

and later

+               binop(Iop_Add64, mkexpr(addr), mkU64(1664 + lane*16 + reg*64)),

Please don't use magic numbers 1152 and 1664.  Where do these come from?
Are they guest-state offsets?  If yes, please make them be derived from
OFFB_* values.  If no, please at least document where the numbers come
from and what else assumes those same numbers.

-----

   IRExpr* ea = binop(Iop_Add64, mkexpr(addr), mkU64(576 + reg * 16));

same

+   IRExpr* ea = binop(Iop_Add64, mkexpr(addr), mkU64(1152 + reg * 32));

etc
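If these are not guest-state offsets, my guess (and it is only a guess)
is that they are the standard-format XSAVE-area component offsets, as
enumerated by CPUID leaf 0xD: 576 for the AVX YMM_Hi128 state, 1152 for
the AVX-512 ZMM_Hi256 state, 1664 for the AVX-512 Hi16_ZMM state.  If
so, a sketch of what I'd prefer, with hypothetical constant names:

   /* Hypothetical names; assumes 576/1152/1664 really are the
      standard-format XSAVE component offsets (Intel SDM, CPUID
      leaf 0xD, components 2, 6 and 7). */
   #define XSAVE_OFF_YMM_HI128   576  /* high 128 bits of YMM0..YMM15 */
   #define XSAVE_OFF_ZMM_HI256  1152  /* high 256 bits of ZMM0..ZMM15 */
   #define XSAVE_OFF_HI16_ZMM   1664  /* all of ZMM16..ZMM31 */

   /* and then, eg */
   binop(Iop_Add64, mkexpr(addr), mkU64(XSAVE_OFF_ZMM_HI256 + reg * 32))

That way the reader can at least grep for the name and find every place
that assumes the same layout.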
-----

+static IRExpr* mask_expr(Prefix pfx, IRExpr* unmasked, IRExpr* original) {

Nit: for big functions like this, please place the { on its own line.

In the same function:

+      default:
+         break;

If it is an internal logic bug that we get to this point, it would be
better to 'vassert(0)'.  But only if it's a bug in the code -- not if
it is an undecoded instruction.

-----

+   assign( e1, amt >= size
+      ? mkV512(0)
+      : binop(op, mkexpr(e0), mkU8(amt))
+   );

Please fix the indentation here and below, to make it clearer that
there are 2 args.  (In emacs C-mode, pressing TAB often fixes it for
me.)

+   assign( e1, amt >= size
+      ? mkV512(0)
+      : binop(op, mkexpr(e0), mkU8(amt))
+   );

-----

+   if (!getEvexMask(pfx)) {
+   putYMMRegLoAndZU( rG, unop(op, mkexpr(arg)) );
+   }

Indent the then-clause!

-----

+static
+Long dis_AVX512_cmp_V_E_to_k ( /*OUT*/Bool* uses_vvvv,
+   IRTemp addr = IRTemp_INVALID;
+   Int alen = 0;
+   HChar dis_buf[50];
+   UInt mask = getEvexMask(pfx);
+   UChar modrm = getUChar(delta);
+   UInt rG = gregOfRexRM(pfx, modrm);
+   UInt rV = getVexNvvvv(pfx);
+   *uses_vvvv = True;
+   IRTemp tD = newTemp(Ity_V512);
+   IRExpr* unmasked;
+   UInt imm8;

Please line up the declarations like in many other cases, to make it
easier to read.

-----

+static Long opmask_operation_decode (const VexAbiInfo* vbi, Prefix pfx, Long* delta) {

{ on its own line

-----

+static IRTemp math_VPERMD_512 ( IRTemp ctrlV, IRTemp dataV ) {
+   /* In the control vector, zero out all but the bottom four bits of each 32-bit lane. */

Please stay within an 80 column limit.

-----

+   if (!isYMM) {
+   putYMMRegLane128( rG, 1, mkV128(0) );
+   }

Indent!

-----

@@ -30062,170 +31359,2467 @@
                        nameIRegG(size,pfx,rm));
          delta += alen;
       }
-
-      /* First mask off bits not set in mask, they are ignored
-         and it should be fine if they contain undefined values. */
-      IRExpr* masked = binop(mkSizedOp(ty,Iop_And8),
-                             mkexpr(src), mkexpr(mask));
-      IRExpr** args = mkIRExprVec_2( widenUto64(masked),
-                                     widenUto64(mkexpr(mask)) );
-      putIRegG( size, pfx, rm,
-                narrowTo(ty, mkIRExprCCall(Ity_I64, 0/*regparms*/,
-                                           "amd64g_calculate_pext",
-                                           &amd64g_calculate_pext, args)) );
-      *uses_vvvv = True;
-      /* Flags aren't modified. */
-      goto decode_success;
+
+      /* First mask off bits not set in mask, they are ignored
+         and it should be fine if they contain undefined values. */
+      IRExpr* masked = binop(mkSizedOp(ty,Iop_And8),
+                             mkexpr(src), mkexpr(mask));
+      IRExpr** args = mkIRExprVec_2( widenUto64(masked),
+                                     widenUto64(mkexpr(mask)) );
+      putIRegG( size, pfx, rm,
+                narrowTo(ty, mkIRExprCCall(Ity_I64, 0/*regparms*/,
+                                           "amd64g_calculate_pext",
+                                           &amd64g_calculate_pext, args)) );
+      *uses_vvvv = True;
+      /* Flags aren't modified. */
+      goto decode_success;

What actually changed here?  I can't see any difference.  If it's just
whitespace, don't include it.

-----

+#define MULT512 4

I see some uses of MULT512, but it's unclear what it means.  Please add
a comment.

(Later) I see you're using it to mean 'the number of 128 bit registers
into which an Ity_I512 expression is computed'.  Is there a better name
for this?  I couldn't have guessed the meaning just from the existing
name.

-----

static HReg iselIntExpr_R_wrk ( ISelEnv* env, const IRExpr* e )

+      Int octet = e->Iex.Unop.op - Iop_V512to64_0;
+      vec = temp[octet/2];

Use 'UInt octet' so that we're sure the division just turns into a
shift.
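For what it's worth, the reason: C's signed division rounds toward
zero, so for a possibly-negative dividend the compiler has to emit a
sign-based fix-up before the shift, whereas unsigned division by 2 is
a single shift.  A minimal illustration (hypothetical function names):

   /* Signed: compiles to a sign fix-up plus an arithmetic shift,
      because -1/2 must be 0, not -1. */
   Int  half_s ( Int x )   { return x / 2; }

   /* Unsigned: compiles to a single logical shift right. */
   UInt half_u ( UInt x )  { return x / 2; }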
-----

@@ -2669,6 +2793,49 @@
       return dst;
    }

+   if (e->tag == Iex_Unop && e->Iex.Unop.op == Iop_ExtractExpF32) {
+      HReg dst = newVRegV(env);
+      HReg arg = iselFltExpr(env, e->Iex.Unop.arg);
+      sub_from_rsp(env, 32);
+      addInstr(env, AMD64Instr_Lea64(AMD64AMode_IR(0, hregAMD64_RSP()), hregAMD64_RDI()));
+      addInstr(env, AMD64Instr_Lea64(AMD64AMode_IR(4, hregAMD64_RSP()), hregAMD64_RSI()));

It was a bit unclear to me what you're using the RDI value for.  I had
to look at this a couple of times to see that it is the first (only?)
arg to h_generic_calc_GetExp32.  Please add a short comment saying
that; also for the clauses that follow (calls to h_generic_calc_GetMant32
and h_generic_calc_RoundScaleF32).

-----

static void iselVecExpr_wrk_512 ( /*OUT*/ HReg *dst, ISelEnv* env, IRExpr* e ) {

{ on its own line

-----

--- valgrind-clean/valgrind/VEX/priv/host_generic_fixup.c 1969-12-31 18:00:00.000000000 -0600
+++ valgrind-wip/valgrind/VEX/priv/host_generic_fixup.c 2017-11-17 12:26:07.224622000 -0600

I see that you need a new source file here (fine), but

(1) "fixup" is a pretty meaningless name -- it could mean anything.
    Please choose something that describes better what it does, eg
    host_generic_avx512_helpers.c

(2) It needs a copyright notice.  Copy from any other file (eg
    host_generic_simd256.c) and s/OpenWorks etc/Intel (or whoever)/

(3) Start/end comment blocks in the file would be nice.

-----

--- valgrind-clean/valgrind/VEX/priv/host_generic_simd512.c 1969-12-31 18:00:00.000000000 -0600
+++ valgrind-wip/valgrind/VEX/priv/host_generic_simd512.c 2017-11-17 12:21:00.513156000 -0600

In the copyright, s/OpenWorks GbR/Intel (or whoever)/

Please use Int instead of int in this file.

-----

--- valgrind-clean/valgrind/VEX/priv/host_generic_simd512.h 1969-12-31 18:00:00.000000000 -0600
+++ valgrind-wip/valgrind/VEX/priv/host_generic_simd512.h 2017-11-17 12:21:00.519156000 -0600

Change copyright notice name.

-----

--- valgrind-clean/valgrind/VEX/pub/libvex_basictypes.h 2017-11-21 07:10:54.339770000 -0600
+++ valgrind-wip/valgrind/VEX/pub/libvex_basictypes.h 2017-11-14 09:46:54.822380000 -0600
@@ -71,6 +71,13 @@
 /* Always 256 bits. */
 typedef UInt U256[8];

+/* Always 512 bits. */
+typedef UInt U512[16];
+
+/* Floating point. */
+typedef float Float;    /* IEEE754 single-precision (32-bit) value */
+typedef double Double;  /* IEEE754 double-precision (64-bit) value */
+

Please add (at this point)

   STATIC_ASSERT(sizeof(Float) == 4)
   STATIC_ASSERT(sizeof(Double) == 8)

(Call me paranoid.  I don't care :-)

-----

       UInt  V256;   /* 32-bit value; see Ico_V256 comment above */
+      ULong V512;   /* 64-bit value; see Ico_V512 comment above */

Please align the 'V512' -- just to make it pretty.

-----

+      /* MASKING */
+      Iop_Mask32to512, Iop_Mask64to512,
+      Iop_Mask32to256, Iop_Mask64to256,

Add a 1-line comment explaining roughly what these do, eg like the
INTERLEAVING case just below.

-----

+      /* Detect duplicate values.  Compare each element of the source
+       * vector for equality with all other elements closer to the least
+       * significant one, combine the comparison results to a bit vector */
+      Iop_CfD32x16,

Can you give this a better name?  The comment makes sense, but I
couldn't guess that from 'CfD'.  What does CfD stand for?
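If it stands for 'conflict detection' -- my assumption, since the
comment matches what AVX-512CD's VPCONFLICTD does -- then something
like Iop_ConflictDetect32x16 would be much easier to guess.  For
reference, a scalar sketch of the semantics as I read them from the
comment:

   /* Sketch, assuming Iop_CfD32x16 follows the quoted comment (and
      VPCONFLICTD): lane i of the result gets bit j set, for j < i,
      iff src[i] == src[j]. */
   static void conflict_detect_32x16 ( /*OUT*/UInt res[16],
                                       const UInt src[16] )
   {
      UInt i, j;
      for (i = 0; i < 16; i++) {
         UInt bits = 0;
         for (j = 0; j < i; j++)
            if (src[i] == src[j]) bits |= (1U << j);
         res[i] = bits;
      }
   }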
-----

+      // V512 -> I64, must be sequential

You could (partly) enforce this at compile time with

   STATIC_ASSERT(Iop_V512to64_7 == Iop_V512to64_0 + 7)

-----

+      Iop_Align32x16, Iop_Align64x8,
+      Iop_Expand32x16, Iop_Expand64x8,
+      Iop_Compress32x16, Iop_Compress64x8,
+      Iop_Ternlog32x16, Iop_Ternlog64x8,
+

Please add a short comment explaining approximately what these
operations do.
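For Ternlog in particular -- assuming it models AVX-512's VPTERNLOGD/Q,
where the imm8 is a 3-input truth table -- a per-lane scalar sketch
would make a good comment:

   /* Assumes Iop_Ternlog32x16 has VPTERNLOGD semantics: for each
      32-bit lane, result bit k is bit ((a_k << 2) | (b_k << 1) | c_k)
      of imm8, where a_k, b_k, c_k are bit k of the three sources. */
   static UInt ternlog32 ( UInt a, UInt b, UInt c, UChar imm8 )
   {
      UInt k, res = 0;
      for (k = 0; k < 32; k++) {
         UInt idx = (((a >> k) & 1) << 2) | (((b >> k) & 1) << 1)
                    | ((c >> k) & 1);
         res |= ((UInt)((imm8 >> idx) & 1)) << k;
      }
      return res;
   }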