https://bugs.kde.org/show_bug.cgi?id=429354
--- Comment #2 from Julian Seward <jsew...@acm.org> ---
==================
https://bugs.kde.org/attachment.cgi?id=133462
Test cases for VSX Mask Manipulation operations

OK to land

==================
https://bugs.kde.org/attachment.cgi?id=133461
Functional support for ISA 3.1, VSX Mask manipulation operations

I have some concerns about the verboseness/inefficiency of the generated IR.

+static IRExpr * copy_MSB_bit_fields ( IRExpr *src, UInt size )
+{
+   /* Input src is a V128 value.  Input size is the number of bits in each
+    * vector field.  The function copies the MSB of each vector field into
+    * the low order bits of the 64-bit result.
+    */

You can do it like this, but it looks very expensive.  Note that there is an
IROp which (if I understand this correctly) does what you want:

      /* MISC CONVERSION -- get high bits of each byte lane, a la
         x86/amd64 pmovmskb */
      Iop_GetMSBs8x16, /* V128 -> I16 */

You could use that instead (and I'd encourage you to do so), although it does
mean you'd have to handle this in the POWER backend, obviously.  Although now
I think about it, I am not sure what the deal is with endianness --
Iop_GetMSBs8x16 has been used so far only on little-endian targets.

If Power has an instruction that can add all the lanes of a SIMD value
together, creating a single number, then there's a faster way to do this that
doesn't involve Iop_GetMSBs8x16.  Something like this:

   uint16_t getMSBs_8x16(vec128)
   {
      let hiHalf = vec128[127:64]   // LE numbering
      let loHalf = vec128[ 63:0]

      // In each byte lane, copy the MSB to all bit positions
      hiHalf = shift_right_signed_8x8(hiHalf, 7);
      loHalf = shift_right_signed_8x8(loHalf, 7);

      // Now each byte lane is either 0x00 or 0xFF.
      // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
      // either 0x00 or 0x40, etc.
      hiHalf &= 0x8040201008040201;
      loHalf &= 0x8040201008040201;

      hi8msbs = add_across_lanes_8x8(hiHalf)
      lo8msbs = add_across_lanes_8x8(loHalf)

      return (hi8msbs << 8) | lo8msbs;
   }

There are variants, but you get the general idea from the above.

+      if (IFIELD(theInstr, 1, 5) == 0xA)   //mtvsrbmi
+         inst_select = 0x9999;

Add a comment to explain that this is a special-case hack for mtvsrbmi.

+   default:
+      vex_printf("dis_VSR_byte_mask (opc2)\n");
+      return False;
+   }

Don't print on failure paths; only return False.

+   for(i = 0; i< max; i++) {
+      ones_lo[i+1] = newTemp( Ity_I64 );
+      test_lo[i] = newTemp( Ity_I64 );
+      ones_hi[i+1] = newTemp( Ity_I64 );
+      test_hi[i] = newTemp( Ity_I64 );
..

This seems really inefficient; is there no way to do this somewhat in parallel
using SIMD IROps?
==================
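
(For reference, below is a minimal scalar C model of the add-across-lanes
sketch above.  It only illustrates the bit manipulation; the function names
getMSBs_8x16_scalar and msbs_of_8_lanes are made up for this example and are
not part of the patch or of VEX, and a real implementation would use SIMD
operations rather than scalar loops.)

   #include <stdint.h>
   #include <stdio.h>

   /* Collect the MSBs of the 8 byte lanes of one 64-bit half.
      Lane i's MSB ends up in bit i of the returned byte. */
   static uint8_t msbs_of_8_lanes ( uint64_t half )
   {
      uint64_t ext = 0;
      uint8_t  sum = 0;
      int      i;
      /* "In each byte lane, copy the MSB to all bit positions":
         lane i becomes 0xFF if its MSB is set, else 0x00. */
      for (i = 0; i < 8; i++)
         if ((half >> (8 * i)) & 0x80)
            ext |= (uint64_t)0xFF << (8 * i);
      /* Keep only bit i in lane i (lane 0 -> 0x01, ..., lane 7 -> 0x80). */
      ext &= 0x8040201008040201ULL;
      /* "add_across_lanes_8x8": the lanes now hold disjoint single bits,
         so the sum cannot carry between lanes and just merges them. */
      for (i = 0; i < 8; i++)
         sum += (uint8_t)(ext >> (8 * i));
      return sum;
   }

   /* The V128 value is passed as two 64-bit halves, LE numbering. */
   static uint16_t getMSBs_8x16_scalar ( uint64_t hiHalf, uint64_t loHalf )
   {
      return ((uint16_t)msbs_of_8_lanes(hiHalf) << 8)
             | msbs_of_8_lanes(loHalf);
   }

   int main ( void )
   {
      /* Lanes 0 and 15 have their MSBs set -> expect 0x8001. */
      printf("0x%04x\n",
             getMSBs_8x16_scalar(0x8000000000000000ULL,
                                 0x0000000000000080ULL));
      return 0;
   }

In the frontend, the byte-granularity case would presumably collapse to a
single Iop_GetMSBs8x16 (V128 -> I16) widened to I64, rather than the
per-field loop quoted above.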