On Mon, Jan 8, 2018 at 5:01 PM, Scott D Phillips
<scott.d.phill...@intel.com> wrote:
> Matt Turner <matts...@gmail.com> writes:
>
>> Some cases weren't handled, such as stride 4 which is needed for 64-bit
>> operations. Presumably fixes the assertion failure mentioned in commit
>> 2d0457203871 (Revert "i965/fs: Use align1 mode on ternary instructions
>> on Gen10+") but who can really say since the commit neglected to list
>> any of them!
>> ---
>>  src/intel/compiler/brw_eu_emit.c | 69 
>> ++++++++++++++++++++++++----------------
>>  1 file changed, 41 insertions(+), 28 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_eu_emit.c 
>> b/src/intel/compiler/brw_eu_emit.c
>> index 85bb6a4cdd..c25d8d6eda 100644
>> --- a/src/intel/compiler/brw_eu_emit.c
>> +++ b/src/intel/compiler/brw_eu_emit.c
>> @@ -673,6 +673,42 @@ get_3src_subreg_nr(struct brw_reg reg)
>>     return reg.subnr / 4;
>>  }
>>
>> +static enum gen10_align1_3src_vertical_stride
>> +to_3src_align1_vstride(enum brw_vertical_stride vstride)
>> +{
>> +   switch (vstride) {
>> +   case BRW_VERTICAL_STRIDE_0:
>> +      return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_0;
>> +   case BRW_VERTICAL_STRIDE_2:
>> +      return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_2;
>> +   case BRW_VERTICAL_STRIDE_4:
>> +      return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_4;
>> +   case BRW_VERTICAL_STRIDE_8:
>> +   case BRW_VERTICAL_STRIDE_16:
>> +      return BRW_ALIGN1_3SRC_VERTICAL_STRIDE_8;
>
> What is the reasoning for vstride 16 to map to 8 here? Could that cause
> problems?

Good question. In an exec_size 16 instruction, a 16,16,1 region
actually reads the same channels in the same order as an 8,8,1 region
(another confusing thing about regioning is that there are effectively
duplicates...). That's the most common region, and I seem to recall
that in some cases the IR contains instructions with a 16,16,1 region.
Other than in that duplicate case, I don't think we ever use vstride
16. As a result, they left it out of the hardware for align1 ternary
instructions.

If I can get some hardware to test with, it's probably a good idea for
me to to double check which cases cause us to need to handle vstride
16 here. Maybe an assertion that it is the "duplicate" 16,16,1 region
should be added.

>> +   default:
>> +      unreachable("invalid vstride");
>> +   }
>> +}
>> +
>> +
>> +static enum gen10_align1_3src_src_horizontal_stride
>> +to_3src_align1_hstride(enum brw_horizontal_stride hstride)
>> +{
>> +   switch (hstride) {
>> +   case BRW_HORIZONTAL_STRIDE_0:
>> +      return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_0;
>> +   case BRW_HORIZONTAL_STRIDE_1:
>> +      return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_1;
>> +   case BRW_HORIZONTAL_STRIDE_2:
>> +      return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_2;
>> +   case BRW_HORIZONTAL_STRIDE_4:
>> +      return BRW_ALIGN1_3SRC_SRC_HORIZONTAL_STRIDE_4;
>> +   default:
>> +      unreachable("invalid hstride");
>> +   }
>> +}
>> +
>>  static brw_inst *
>>  brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
>>           struct brw_reg src0, struct brw_reg src1, struct brw_reg src2)
>> @@ -721,41 +757,18 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, 
>> struct brw_reg dest,
>>        brw_inst_set_3src_a1_src1_type(devinfo, inst, src1.type);
>>        brw_inst_set_3src_a1_src2_type(devinfo, inst, src2.type);
>>
>> -      assert((src0.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -              src0.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> -             (src0.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -              src0.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -      assert((src1.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -              src1.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> -             (src1.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -              src1.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -      assert((src2.vstride == BRW_VERTICAL_STRIDE_0 &&
>> -              src2.hstride == BRW_HORIZONTAL_STRIDE_0) ||
>> -             (src2.vstride == BRW_VERTICAL_STRIDE_8 &&
>> -              src2.hstride == BRW_HORIZONTAL_STRIDE_1));
>> -
>
> Were 0,x,0 and 8,x,1 just a list of expected cases before or was it
> toward some restriction? I'm not seeing anything in the documentation
> that implies a restriction, so I'm guessing the former.

No restriction; these were just the most common (I thought only at the
time) cases. 8,8,1 is the regular "read every channel in order"
region, and 0,1,0 is the "read a single channel repeatedly" region.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to