https://bugs.kde.org/show_bug.cgi?id=429354

--- Comment #2 from Julian Seward <jsew...@acm.org> ---
==================

https://bugs.kde.org/attachment.cgi?id=133462
Test cases for VSX Mask Manipulation operations

OK to land

==================

https://bugs.kde.org/attachment.cgi?id=133461
Functional support for ISA 3.1, VSX Mask manipulation operations

I have some concerns about the verbosity/inefficiency of the generated IR.

+static IRExpr * copy_MSB_bit_fields (  IRExpr *src, UInt size )
+{
+   /* Input src is a V128 value.  Input size is the number of bits in each
+    * vector field.  The function copies the MSB of each vector field into
+    * the low order bits of the 64-bit result.
+    */

You can do it like this, but it looks very expensive.  Note that there is
an IROp which (if I understand this correctly) does what you want:

      /* MISC CONVERSION -- get high bits of each byte lane, a la
         x86/amd64 pmovmskb */
      Iop_GetMSBs8x16, /* V128 -> I16 */

You could use that instead (and I'd encourage you to do so), although
it does mean you'd have to handle this in the POWER backend, obviously.

Although, now that I think about it, I am not sure what the deal is with
endianness -- Iop_GetMSBs8x16 has been used so far only on little-endian
targets.

If Power has an instruction that can add all the lanes of a SIMD value
together, producing a single number, then there's a faster way to do this
that doesn't involve Iop_GetMSBs8x16.  Something like this:

   uint16_t getMSBs_8x16(vec128)
   {
      let hiHalf = vec128[127:64];  // LE numbering
      let loHalf = vec128[ 63:0];
      // In each byte lane, copy the MSB to all bit positions
      hiHalf = shift_right_signed_8x8(hiHalf, 7);
      loHalf = shift_right_signed_8x8(loHalf, 7);
      // Now each byte lane is either 0x00 or 0xFF.
      // Make (eg) lane 7 contain either 0x00 or 0x80, lane 6 contain
      // either 0x00 or 0x40, etc
      hiHalf &= 0x8040201008040201;
      loHalf &= 0x8040201008040201;
      let hi8msbs = add_across_lanes_8x8(hiHalf);
      let lo8msbs = add_across_lanes_8x8(loHalf);
      return (hi8msbs << 8) | lo8msbs;
   }

There are variants, but you get the general idea from the above.

+   if (IFIELD(theInstr, 1, 5) == 0xA)    //mtvsrbmi
+      inst_select = 0x9999;

Add a comment to explain that this is a special-case hack for mtvsrbmi.

+   default:
+      vex_printf("dis_VSR_byte_mask (opc2)\n");
+      return False;
+   }

Don't print on failure paths; only return False.


+   for(i = 0; i< max; i++) {
+      ones_lo[i+1] = newTemp( Ity_I64 );
+      test_lo[i] = newTemp( Ity_I64 );
+      ones_hi[i+1] = newTemp( Ity_I64 );
+      test_hi[i] = newTemp( Ity_I64 );
..

This seems really inefficient; is there no way to do this somewhat
in parallel using SIMD IROps?

==================
