Re: [Mesa-dev] [PATCH] R600: Relax some vector constraints on Dot4.

Christian König Fri, 15 Mar 2013 08:10:36 -0700

Ok that makes more sense, thx for the explanation.

I was just wondering why all this stuff is needed, and as I said, I'm not so deeply into the R600 part of the backend.

And I strongly agree that we somehow need to better teach the backend about our vector slots and all those limitations (sometimes I'm really happy to work on SI instead).


Christian.

On 15.03.2013 15:52, Vincent Lejeune wrote:

Hi Christian,

LLVM does indeed coalesce registers for R600 targets; I was, however, thinking of
copies between vectors.
For instance, let's say you have 4 vectors coming from instructions that only
emit vectors (like TEX_SAMPLE, iirc):

If the shader wants to "mix" them before doing dp4, you end up with something like:

T0_XYZW = TEX_SAMPLE
T1_XYZW = TEX_SAMPLE
T2_XYZW = TEX_SAMPLE
T3_XYZW = TEX_SAMPLE
T0_W = COPY T4_W
T1_Z =  COPY T3_Z
DOT4 T0_XYZW, T1_XYZW

From the hw point of view, the 2 copies are not necessary because a DOT4
instruction does not require that its operands belong to the same 128-bit
register.
It's perfectly legal to have a bundle like this one:

Dot4_eg_real T0_X T1_X
Dot4_eg_real T0_Y T1_Y
Dot4_eg_real T0_Z T3_Z
Dot4_eg_real T4_W T1_W
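
To make the point concrete, here is a minimal sketch of what such a bundle computes. It is a toy model, not backend code: `ScalarReg`, `RegFile`, `Dot4Slot` and `evalDot4Bundle` are hypothetical names introduced only for illustration. Each slot multiplies one pair of scalar sources, and the sources of different slots may come from different 128-bit registers.

```cpp
#include <array>
#include <map>
#include <utility>

// Hypothetical model: a scalar register is a (GPR index, channel) pair.
enum Chan { X = 0, Y = 1, Z = 2, W = 3 };
using ScalarReg = std::pair<int, Chan>;
using RegFile = std::map<ScalarReg, float>;

// One slot of the bundle: the two scalar sources it multiplies.
struct Dot4Slot {
  ScalarReg Src0;
  ScalarReg Src1;
};

// The bundle's result is the sum of the four per-slot products.
// Nothing here requires Src0 (or Src1) of all slots to share a GPR index.
float evalDot4Bundle(const RegFile &Regs,
                     const std::array<Dot4Slot, 4> &Bundle) {
  float Sum = 0.0f;
  for (const Dot4Slot &S : Bundle)
    Sum += Regs.at(S.Src0) * Regs.at(S.Src1);
  return Sum;
}
```

Feeding it the bundle above (T0_X/T1_X, T0_Y/T1_Y, T0_Z/T3_Z, T4_W/T1_W) gives the same value as copying T4_W and T3_Z into place first and then doing a vector dot product.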

(In fact it is even possible to remove the R600_TReg32_* constraints on the
inputs, but then you have to ensure the bundle does not read more than
3 gprs from a channel, which needs much more work.)
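
The check mentioned in the parenthesis could look roughly like the sketch below. It only models the rule as stated here (at most 3 distinct gprs read per channel in one bundle); `SrcOperand` and `bundleRespectsReadLimit` are hypothetical names, and any further hardware details are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical description of one source operand of a bundle.
struct SrcOperand {
  int Gpr;  // index of the 128-bit register
  int Chan; // 0..3 for X, Y, Z, W
};

// Returns true if, for every channel, the bundle reads at most
// MaxGprsPerChan distinct GPRs (3 is the limit quoted in this mail).
bool bundleRespectsReadLimit(const std::vector<SrcOperand> &Srcs,
                             unsigned MaxGprsPerChan = 3) {
  std::array<std::set<int>, 4> GprsPerChan;
  for (const SrcOperand &S : Srcs) {
    assert(S.Chan >= 0 && S.Chan < 4 && "channel out of range");
    GprsPerChan[S.Chan].insert(S.Gpr);
  }
  for (const std::set<int> &Gprs : GprsPerChan)
    if (Gprs.size() > MaxGprsPerChan)
      return false;
  return true;
}
```

With the R600_TReg32_* constraints in place, each slot reads exactly two operands from its own channel, so the limit holds trivially; the check only becomes necessary once sources may come from any channel.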

The previous case may not seem so frequent, but it still occurs in Lightmark and
Unigine Heaven.

We represent dot4 inputs as vectors, but using 8 scalar inputs is closer to the hw
capabilities; that's why I wrote this patch. Besides, scalar values usually
have shorter live intervals, lowering register pressure. Shaders that have dp4
instructions often end up consuming fewer registers with this patch.

Vincent




----- Original Message -----

From: Christian König <deathsim...@vodafone.de>
To: Vincent Lejeune <v...@ovi.com>
Cc: llvm-comm...@cs.uiuc.edu; mesa-dev@lists.freedesktop.org
Sent: Friday, 15 March 2013 11:18
Subject: Re: [Mesa-dev] [PATCH] R600: Relax some vector constraints on Dot4.

Hi Vincent,

while I really appreciate your work, I think your development is going
in the wrong direction here. Those copies you're trying to avoid (not only
with this patch, but also with the previous REG_SEQUENCE patches) shouldn't
happen in the first place. I'm not so deeply into the R600 part of our LLVM
backend that I can say that I'm 100% sure, but to me that just looks like
workarounds for an incorrectly defined register space.

Here is a simple example from SI that should show how things are intended to
work. It's a simple 2D texture fetch; the coordinates of this fetch are
usually provided in a two-element vector built of VGPRs (I use a 2D fetch just
for simplicity; a 3D fetch with explicit LOD would work the same way and would
use a four-element vector).

After ISel the assembler code starts with something like this (simplified):
...
%vreg13<def,tied1> = V_INTERP_P2_F32 ...
...
%vreg17<def,tied1> = V_INTERP_P2_F32 ...
...
%vreg22<def> = IMPLICIT_DEF; VReg_64:%vreg22
%vreg21<def,tied1> = INSERT_SUBREG %vreg22<tied0>,
%vreg13<kill>, sub0; VReg_64:%vreg21,%vreg22 VReg_32:%vreg13
%vreg23<def,tied1> = INSERT_SUBREG %vreg21<tied0>,
%vreg17<kill>, sub1; VReg_64:%vreg23,%vreg21 VReg_32:%vreg17
%vreg24<def> = IMAGE_SAMPLE 15, 0, 0, 0, 0, 0, 0, 0, %vreg23<kill>,
....

As you can see, the sub components of the vectors are inserted/extracted just
like it happens on R600, but the register allocator is capable of handling that
much better than on R600, and so avoids the (sometimes quite expensive) COPY
operations in the first place. The resulting code looks like this:

...
%vreg23:sub0<def,tied1> = V_INTERP_P2_F32 ...
...
%vreg23:sub1<def,tied1> = V_INTERP_P2_F32 ...
....
%vreg24<def> = IMAGE_SAMPLE 15, 0, 0, 0, 0, 0, 0, 0, %vreg23, ...

So INSERT_SUBREG isn't replaced with a COPY like on R600, but instead the
V_INTERP_P2_F32 instructions can write directly to the appropriate sub register
component.
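
As a rough illustration of the rewrite shown above, here is a toy model in C++. It is not how LLVM's coalescer is implemented: `Inst` and `foldInsertSubreg` are hypothetical names, registers are plain strings, and the sketch assumes that every inserted scalar has a single definition and that no live ranges interfere, which a real coalescer has to verify.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified machine instruction.
struct Inst {
  std::string Opcode;
  std::string Def;               // defined virtual register
  std::vector<std::string> Uses; // INSERT_SUBREG: {wide base, inserted scalar}
  std::string SubIdx;            // only meaningful for INSERT_SUBREG
};

// Drops INSERT_SUBREG chains and lets the producers of the inserted
// scalars define "wide:subN" directly. Assumes (without checking) one
// def per scalar and no interference between the merged registers.
std::vector<Inst> foldInsertSubreg(const std::vector<Inst> &In) {
  std::map<std::string, std::string> Wide; // wide reg -> reg it merges into
  std::map<std::string, std::pair<std::string, std::string>> Scalar;
  for (const Inst &I : In) {
    if (I.Opcode != "INSERT_SUBREG")
      continue;
    assert(I.Uses.size() == 2 && "expected base and inserted value");
    Wide[I.Uses[0]] = I.Def;
    Scalar[I.Uses[1]] = {I.Def, I.SubIdx};
  }
  // Follow the chain of merged wide registers to the final one.
  auto Root = [&](std::string R) {
    while (Wide.count(R))
      R = Wide[R];
    return R;
  };
  auto Rename = [&](const std::string &R) {
    auto It = Scalar.find(R);
    if (It != Scalar.end())
      return Root(It->second.first) + ":" + It->second.second;
    return Root(R);
  };
  std::vector<Inst> Out;
  for (const Inst &I : In) {
    if (I.Opcode == "INSERT_SUBREG")
      continue; // folded into the producers
    if (I.Opcode == "IMPLICIT_DEF" && Wide.count(I.Def))
      continue; // placeholder for the wide register is no longer needed
    Inst N = I;
    N.Def = Rename(I.Def);
    for (std::string &U : N.Uses)
      U = Rename(U);
    Out.push_back(N);
  }
  return Out;
}
```

Run on the six instructions of the first listing, this yields three: two V_INTERP_P2_F32 defining vreg23:sub0 and vreg23:sub1, and the IMAGE_SAMPLE using vreg23.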

I'm not 100% sure why this doesn't work the same way on R600, but I
think it might be a good idea to figure that out.

Cheers,
Christian.

On 14.03.2013 21:51, Vincent Lejeune wrote:

  Dot4 now uses 8 scalar operands instead of 2 vector ones, which allows the
  register coalescer to remove some unneeded COPYs.
  This patch also defines some structures/functions that can be used to handle
  every vector instruction (CUBE, Cayman special instructions...) in a similar
  fashion.
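
A note on the operand layout these structures rely on, as a minimal sketch. The patch maps an (operand, slot) pair to a flattened index through a macro-generated table; since the VecOps enum lists the same 17 operands for X, Y, Z and W in order, the same mapping can also be written arithmetically. `SlotOp`, `OPS_PER_SLOT`, `slotedOp` and `vecOperandIdx` are hypothetical names used only here and are not part of the patch.

```cpp
// Hypothetical mirror of the per-slot operand order of the VecOps enum.
enum SlotOp {
  UPDATE_EXEC_MASK, UPDATE_PREDICATE, WRITE, OMOD, DST_REL, CLAMP,
  SRC0, SRC0_NEG, SRC0_REL, SRC0_ABS, SRC0_SEL,
  SRC1, SRC1_NEG, SRC1_REL, SRC1_ABS, SRC1_SEL,
  PRED_SEL,
  OPS_PER_SLOT // 17 operands per slot
};

// Position in the flattened per-slot list: slot 0 = X, 1 = Y, 2 = Z, 3 = W.
constexpr unsigned slotedOp(SlotOp Op, unsigned Slot) {
  return Slot * OPS_PER_SLOT + Op;
}

// Operand index on the pseudo instruction: operand 0 is the destination,
// hence the +1 (same offset as getVecOperandIdx in the patch).
constexpr unsigned vecOperandIdx(SlotOp Op, unsigned Slot) {
  return 1 + slotedOp(Op, Slot);
}

static_assert(vecOperandIdx(SRC0, 0) == 7, "SRC0_X follows dst and 6 flags");
static_assert(slotedOp(PRED_SEL, 3) + 1 == 4 * OPS_PER_SLOT,
              "the two literals come right after the W slot");
```

The explicit table in the patch has the advantage of not depending on the enum order, so this is only meant to show where each slot's operands sit.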
  ---
    lib/Target/R600/AMDGPUISelLowering.h        |  1 +
    lib/Target/R600/R600Defines.h               | 74 ++++++++++++++++++++++++
    lib/Target/R600/R600ExpandSpecialInstrs.cpp | 25 ++++++++
    lib/Target/R600/R600ISelLowering.cpp        | 21 +++++++
    lib/Target/R600/R600InstrInfo.cpp           | 88 +++++++++++++++++++++++++++++
    lib/Target/R600/R600InstrInfo.h             |  5 ++
    lib/Target/R600/R600Instructions.td         | 51 ++++++++++++++++-
    lib/Target/R600/R600MachineScheduler.cpp    |  2 +
    8 files changed, 266 insertions(+), 1 deletion(-)

  diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
  index f31b646..f9f5a60 100644
  --- a/lib/Target/R600/AMDGPUISelLowering.h
  +++ b/lib/Target/R600/AMDGPUISelLowering.h
  @@ -125,6 +125,7 @@ enum {
      SMIN,
      UMIN,
      URECIP,
  +  DOT4,
      EXPORT,
      CONST_ADDRESS,
      REGISTER_LOAD,
  diff --git a/lib/Target/R600/R600Defines.h b/lib/Target/R600/R600Defines.h
  index 16cfcf5..72d83b0 100644
  --- a/lib/Target/R600/R600Defines.h
  +++ b/lib/Target/R600/R600Defines.h
  @@ -92,6 +92,80 @@ namespace R600Operands {
        {0,-1,-1,-1,-1, 1, 2, 3, 4, 5,-1, 6, 7, 8, 9,-1,10,11,12,13,14,15,16,17}
      };
    +  enum VecOps {
  +    UPDATE_EXEC_MASK_X,
  +    UPDATE_PREDICATE_X,
  +    WRITE_X,
  +    OMOD_X,
  +    DST_REL_X,
  +    CLAMP_X,
  +    SRC0_X,
  +    SRC0_NEG_X,
  +    SRC0_REL_X,
  +    SRC0_ABS_X,
  +    SRC0_SEL_X,
  +    SRC1_X,
  +    SRC1_NEG_X,
  +    SRC1_REL_X,
  +    SRC1_ABS_X,
  +    SRC1_SEL_X,
  +    PRED_SEL_X,
  +    UPDATE_EXEC_MASK_Y,
  +    UPDATE_PREDICATE_Y,
  +    WRITE_Y,
  +    OMOD_Y,
  +    DST_REL_Y,
  +    CLAMP_Y,
  +    SRC0_Y,
  +    SRC0_NEG_Y,
  +    SRC0_REL_Y,
  +    SRC0_ABS_Y,
  +    SRC0_SEL_Y,
  +    SRC1_Y,
  +    SRC1_NEG_Y,
  +    SRC1_REL_Y,
  +    SRC1_ABS_Y,
  +    SRC1_SEL_Y,
  +    PRED_SEL_Y,
  +    UPDATE_EXEC_MASK_Z,
  +    UPDATE_PREDICATE_Z,
  +    WRITE_Z,
  +    OMOD_Z,
  +    DST_REL_Z,
  +    CLAMP_Z,
  +    SRC0_Z,
  +    SRC0_NEG_Z,
  +    SRC0_REL_Z,
  +    SRC0_ABS_Z,
  +    SRC0_SEL_Z,
  +    SRC1_Z,
  +    SRC1_NEG_Z,
  +    SRC1_REL_Z,
  +    SRC1_ABS_Z,
  +    SRC1_SEL_Z,
  +    PRED_SEL_Z,
  +    UPDATE_EXEC_MASK_W,
  +    UPDATE_PREDICATE_W,
  +    WRITE_W,
  +    OMOD_W,
  +    DST_REL_W,
  +    CLAMP_W,
  +    SRC0_W,
  +    SRC0_NEG_W,
  +    SRC0_REL_W,
  +    SRC0_ABS_W,
  +    SRC0_SEL_W,
  +    SRC1_W,
  +    SRC1_NEG_W,
  +    SRC1_REL_W,
  +    SRC1_ABS_W,
  +    SRC1_SEL_W,
  +    PRED_SEL_W,
  +    IMM_0,
  +    IMM_1,
  +    VEC_COUNT
  + };
  +
    }
      #endif // R600DEFINES_H_
  diff --git a/lib/Target/R600/R600ExpandSpecialInstrs.cpp b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
  index f8c900f..993bdad 100644
  --- a/lib/Target/R600/R600ExpandSpecialInstrs.cpp
  +++ b/lib/Target/R600/R600ExpandSpecialInstrs.cpp
  @@ -182,6 +182,31 @@ bool R600ExpandSpecialInstrsPass::runOnMachineFunction(MachineFunction &MF) {
            MI.eraseFromParent();
            continue;
            }
  +      case AMDGPU::DOT_4: {
  +
  +        const R600RegisterInfo &TRI = TII->getRegisterInfo();
  +
  +        unsigned DstReg = MI.getOperand(0).getReg();
  +        unsigned DstBase = TRI.getEncodingValue(DstReg) & HW_REG_MASK;
  +
  +        for (unsigned Chan = 0; Chan < 4; ++Chan) {
  +          bool Mask = (Chan != TRI.getHWRegChan(DstReg));
  +          unsigned SubDstReg =
  +              AMDGPU::R600_TReg32RegClass.getRegister((DstBase * 4) + Chan);
  +          MachineInstr *BMI =
  +              TII->buildSlotOfVectorInstruction(MBB, &MI, Chan, SubDstReg);
  +          if (Chan > 0) {
  +            BMI->bundleWithPred();
  +          }
  +          if (Mask) {
  +            TII->addFlag(BMI, 0, MO_FLAG_MASK);
  +          }
  +          if (Chan != 3)
  +            TII->addFlag(BMI, 0, MO_FLAG_NOT_LAST);
  +        }
  +        MI.eraseFromParent();
  +        continue;
  +      }
          }
            bool IsReduction = TII->isReductionOp(MI.getOpcode());
  diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
  index a73691d..4868dc7 100644
  --- a/lib/Target/R600/R600ISelLowering.cpp
  +++ b/lib/Target/R600/R600ISelLowering.cpp
  @@ -394,6 +394,27 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
            return SDValue(interp, slot % 2);
        }
  +    case AMDGPUIntrinsic::AMDGPU_dp4: {
  +      SDValue Args[8] = {
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
  +          DAG.getConstant(0, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
  +          DAG.getConstant(0, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
  +          DAG.getConstant(1, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
  +          DAG.getConstant(1, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
  +          DAG.getConstant(2, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
  +          DAG.getConstant(2, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(1),
  +          DAG.getConstant(3, MVT::i32)),
  +      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::f32, Op.getOperand(2),
  +          DAG.getConstant(3, MVT::i32))
  +      };
  +      return DAG.getNode(AMDGPUISD::DOT4, DL, MVT::f32, Args, 8);
  +    }
          case r600_read_ngroups_x:
          return LowerImplicitParameter(DAG, VT, DL, 0);
  diff --git a/lib/Target/R600/R600InstrInfo.cpp b/lib/Target/R600/R600InstrInfo.cpp
  index 0865098..f686c5c 100644
  --- a/lib/Target/R600/R600InstrInfo.cpp
  +++ b/lib/Target/R600/R600InstrInfo.cpp
  @@ -686,6 +686,94 @@ MachineInstrBuilder R600InstrInfo::buildDefaultInstruction(MachineBasicBlock &MB
      return MIB;
    }
    +#define OPERAND_CASE(Label) \
  +  case Label: { \
  +    static const R600Operands::VecOps Ops[] = \
  +    { \
  +      Label##_X, \
  +      Label##_Y, \
  +      Label##_Z, \
  +      Label##_W \
  +    }; \
  +    return Ops[Slot]; \
  +  }
  +
  +static R600Operands::VecOps
  +getSlotedOps(R600Operands::Ops Op, unsigned Slot) {
  +  switch (Op) {
  +  OPERAND_CASE(R600Operands::UPDATE_EXEC_MASK)
  +  OPERAND_CASE(R600Operands::UPDATE_PREDICATE)
  +  OPERAND_CASE(R600Operands::WRITE)
  +  OPERAND_CASE(R600Operands::OMOD)
  +  OPERAND_CASE(R600Operands::DST_REL)
  +  OPERAND_CASE(R600Operands::CLAMP)
  +  OPERAND_CASE(R600Operands::SRC0)
  +  OPERAND_CASE(R600Operands::SRC0_NEG)
  +  OPERAND_CASE(R600Operands::SRC0_REL)
  +  OPERAND_CASE(R600Operands::SRC0_ABS)
  +  OPERAND_CASE(R600Operands::SRC0_SEL)
  +  OPERAND_CASE(R600Operands::SRC1)
  +  OPERAND_CASE(R600Operands::SRC1_NEG)
  +  OPERAND_CASE(R600Operands::SRC1_REL)
  +  OPERAND_CASE(R600Operands::SRC1_ABS)
  +  OPERAND_CASE(R600Operands::SRC1_SEL)
  +  OPERAND_CASE(R600Operands::PRED_SEL)
  +  default:
  +    llvm_unreachable("Wrong Operand");
  +  }
  +}
  +
  +#undef OPERAND_CASE
  +
  +static int
  +getVecOperandIdx(R600Operands::VecOps Op) {
  +  return 1 + Op;
  +}
  +
  +
  +MachineInstr *R600InstrInfo::buildSlotOfVectorInstruction(
  +    MachineBasicBlock &MBB, MachineInstr *MI, unsigned Slot, unsigned DstReg)
  +    const {
  +  assert (MI->getOpcode() == AMDGPU::DOT_4 && "Not Implemented");
  +  unsigned Opcode;
  +  const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
  +  if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD4XXX)
  +    Opcode = AMDGPU::DOT4_r600_real;
  +  else
  +    Opcode = AMDGPU::DOT4_eg_real;
  +  MachineBasicBlock::iterator I = MI;
  +  MachineOperand &Src0 = MI->getOperand(
  +      getVecOperandIdx(getSlotedOps(R600Operands::SRC0, Slot)));
  +  MachineOperand &Src1 = MI->getOperand(
  +      getVecOperandIdx(getSlotedOps(R600Operands::SRC1, Slot)));
  +  MachineInstr *MIB = buildDefaultInstruction(
  +      MBB, I, Opcode, DstReg, Src0.getReg(), Src1.getReg());
  +  static const R600Operands::Ops Operands[14] = {
  +    R600Operands::UPDATE_EXEC_MASK,
  +    R600Operands::UPDATE_PREDICATE,
  +    R600Operands::WRITE,
  +    R600Operands::OMOD,
  +    R600Operands::DST_REL,
  +    R600Operands::CLAMP,
  +    R600Operands::SRC0_NEG,
  +    R600Operands::SRC0_REL,
  +    R600Operands::SRC0_ABS,
  +    R600Operands::SRC0_SEL,
  +    R600Operands::SRC1_NEG,
  +    R600Operands::SRC1_REL,
  +    R600Operands::SRC1_ABS,
  +    R600Operands::SRC1_SEL,
  +  };
  +
  +  for (unsigned i = 0; i < 14; i++) {
  +    MachineOperand &MO = MI->getOperand(
  +        getVecOperandIdx(getSlotedOps(Operands[i], Slot)));
  +    assert (MO.isImm());
  +    setImmOperand(MIB, Operands[i], MO.getImm());
  +  }
  +  return MIB;
  +}
  +
    MachineInstr *R600InstrInfo::buildMovImm(MachineBasicBlock &BB,
                                             MachineBasicBlock::iterator I,
                                             unsigned DstReg,
  diff --git a/lib/Target/R600/R600InstrInfo.h b/lib/Target/R600/R600InstrInfo.h
  index bf9569e..e38ed00 100644
  --- a/lib/Target/R600/R600InstrInfo.h
  +++ b/lib/Target/R600/R600InstrInfo.h
  @@ -160,6 +160,11 @@ namespace llvm {
                                                  unsigned Src0Reg,
                                                  unsigned Src1Reg = 0) const;
    +  MachineInstr *buildSlotOfVectorInstruction(MachineBasicBlock &MBB,
  +                                             MachineInstr *MI,
  +                                             unsigned Slot,
  +                                             unsigned DstReg) const;
  +
      MachineInstr *buildMovImm(MachineBasicBlock &BB,
                                      MachineBasicBlock::iterator I,
                                      unsigned DstReg,
  diff --git a/lib/Target/R600/R600Instructions.td b/lib/Target/R600/R600Instructions.td
  index c5fa334..95a99a6 100644
  --- a/lib/Target/R600/R600Instructions.td
  +++ b/lib/Target/R600/R600Instructions.td
  @@ -516,6 +516,13 @@ def CONST_ADDRESS: SDNode<"AMDGPUISD::CONST_ADDRESS",
      [SDNPVariadic]
    >;
    +def DOT4 : SDNode<"AMDGPUISD::DOT4",
  +  SDTypeProfile<1, 8, [SDTCisFP<0>, SDTCisVT<1, f32>, SDTCisVT<2, f32>,
  +      SDTCisVT<3, f32>, SDTCisVT<4, f32>, SDTCisVT<5, f32>,
  +      SDTCisVT<6, f32>, SDTCisVT<7, f32>, SDTCisVT<8, f32>]>,
  +  []
  +>;
  +

//===----------------------------------------------------------------------===//

    // Interpolation Instructions

//===----------------------------------------------------------------------===//

  @@ -982,12 +989,54 @@ class CNDGE_Common <bits<5> inst> : R600_3OP <
           COND_GE))]
    >;
    +
  +let isCodeGenOnly = 1, isPseudo = 1, Namespace = "AMDGPU"  in {
  +class R600_VEC2OP<list<dag> pattern> : InstR600 <0, (outs R600_Reg32:$dst), (ins
  +// Slot X
  +   UEM:$update_exec_mask_X, UP:$update_pred_X, WRITE:$write_X,
  +   OMOD:$omod_X, REL:$dst_rel_X, CLAMP:$clamp_X,
  +   R600_TReg32_X:$src0_X, NEG:$src0_neg_X, REL:$src0_rel_X, ABS:$src0_abs_X, SEL:$src0_sel_X,
  +   R600_TReg32_X:$src1_X, NEG:$src1_neg_X, REL:$src1_rel_X, ABS:$src1_abs_X, SEL:$src1_sel_X,
  +   R600_Pred:$pred_sel_X,
  +// Slot Y
  +   UEM:$update_exec_mask_Y, UP:$update_pred_Y, WRITE:$write_Y,
  +   OMOD:$omod_Y, REL:$dst_rel_Y, CLAMP:$clamp_Y,
  +   R600_TReg32_Y:$src0_Y, NEG:$src0_neg_Y, REL:$src0_rel_Y, ABS:$src0_abs_Y, SEL:$src0_sel_Y,
  +   R600_TReg32_Y:$src1_Y, NEG:$src1_neg_Y, REL:$src1_rel_Y, ABS:$src1_abs_Y, SEL:$src1_sel_Y,
  +   R600_Pred:$pred_sel_Y,
  +// Slot Z
  +   UEM:$update_exec_mask_Z, UP:$update_pred_Z, WRITE:$write_Z,
  +   OMOD:$omod_Z, REL:$dst_rel_Z, CLAMP:$clamp_Z,
  +   R600_TReg32_Z:$src0_Z, NEG:$src0_neg_Z, REL:$src0_rel_Z, ABS:$src0_abs_Z, SEL:$src0_sel_Z,
  +   R600_TReg32_Z:$src1_Z, NEG:$src1_neg_Z, REL:$src1_rel_Z, ABS:$src1_abs_Z, SEL:$src1_sel_Z,
  +   R600_Pred:$pred_sel_Z,
  +// Slot W
  +   UEM:$update_exec_mask_W, UP:$update_pred_W, WRITE:$write_W,
  +   OMOD:$omod_W, REL:$dst_rel_W, CLAMP:$clamp_W,
  +   R600_TReg32_W:$src0_W, NEG:$src0_neg_W, REL:$src0_rel_W, ABS:$src0_abs_W, SEL:$src0_sel_W,
  +   R600_TReg32_W:$src1_W, NEG:$src1_neg_W, REL:$src1_rel_W, ABS:$src1_abs_W, SEL:$src1_sel_W,
  +   R600_Pred:$pred_sel_W,
  +   LITERAL:$literal0, LITERAL:$literal1),
  +  "",
  +  pattern,
  +  AnyALU> {}
  +}
  +
  +def DOT_4 : R600_VEC2OP<[(set R600_Reg32:$dst, (DOT4
  +  R600_TReg32_X:$src0_X, R600_TReg32_X:$src1_X,
  +  R600_TReg32_Y:$src0_Y, R600_TReg32_Y:$src1_Y,
  +  R600_TReg32_Z:$src0_Z, R600_TReg32_Z:$src1_Z,
  +  R600_TReg32_W:$src0_W, R600_TReg32_W:$src1_W))]>;
  +
  +
  +
  +
    multiclass DOT4_Common <bits<11> inst> {
        def _pseudo : R600_REDUCTION <inst,
        (ins R600_Reg128:$src0, R600_Reg128:$src1),
        "DOT4 $dst $src0, $src1",
  -    [(set R600_Reg32:$dst, (int_AMDGPU_dp4 R600_Reg128:$src0, R600_Reg128:$src1))]
  +    []
      >;
        def _real : R600_2OP <inst, "DOT4", []>;
  diff --git a/lib/Target/R600/R600MachineScheduler.cpp b/lib/Target/R600/R600MachineScheduler.cpp
  index e515d3e..5a18de9 100644
  --- a/lib/Target/R600/R600MachineScheduler.cpp
  +++ b/lib/Target/R600/R600MachineScheduler.cpp
  @@ -407,6 +407,7 @@ R600SchedStrategy::AluKind R600SchedStrategy::getAluKind(SUnit *SU) const {
        case AMDGPU::INTERP_PAIR_XY:
        case AMDGPU::INTERP_PAIR_ZW:
        case AMDGPU::INTERP_VEC_LOAD:
  +    case AMDGPU::DOT_4:
          return AluT_XYZW;
        case AMDGPU::COPY:
          if (MI->getOperand(1).isUndef()) {
  @@ -471,6 +472,7 @@ int R600SchedStrategy::getInstKind(SUnit* SU) {
      case AMDGPU::INTERP_VEC_LOAD:
      case AMDGPU::DOT4_eg_pseudo:
      case AMDGPU::DOT4_r600_pseudo:
  +  case AMDGPU::DOT_4:
        return IDAlu;
      case AMDGPU::TEX_VTX_CONSTBUF:
      case AMDGPU::TEX_VTX_TEXBUF:


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
