Iago Toral <ito...@igalia.com> writes:

> On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> Iago Toral <ito...@igalia.com> writes:
>> 
>> > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> > > Iago Toral <ito...@igalia.com> writes:
>> > > 
>> > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > > > Iago Toral Quiroga <ito...@igalia.com> writes:
>> > > > > 
>> > > > > > Commit c84ec70b3a72 implemented execution type promotion to
>> > > > > > 32-
>> > > > > > bit
>> > > > > > for
>> > > > > > conversions involving half-float registers, which empirical
>> > > > > > testing
>> > > > > > suggested
>> > > > > > was required, but it did not incorporate this change into
>> > > > > > the
>> > > > > > assembly validator
>> > > > > > logic. This commits adds that, preventing validation errors
>> > > > > > like
>> > > > > > this:
>> > > > > > 
>> > > > > 
>> > > > > I don't think we should be validating empirical assumptions
>> > > > > in
>> > > > > the EU
>> > > > > validator.
>> > > > 
>> > > > I am not sure I get your point, isn't c84ec70b3a72 also based
>> > > > on
>> > > > empirical testing after all?
>> > > > 
>> > > 
>> > > To some extent, but it doesn't attempt to enforce ISA
>> > > restrictions
>> > > based
>> > > on information obtained empirically.
>> > > 
>> > > > 
>> > > > > > mov(16)  g9<4>B   g3<16,8,2>HF     { align1 1H };
>> > > > > > ERROR: Destination stride must be equal to the ratio of the
>> > > > > > sizes
>> > > > > > of the
>> > > > > >        execution data type to the destination type
>> > > > > > 
>> > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to
>> > > > > > 32-bit
>> > > > > > when any half-float conversion is needed."
>> > > > > 
>> > > > > I don't think this "fixes" anything that ever worked.
>> > > > 
>> > > > It is true that the code in that trace above is not something
>> > > > we
>> > > > can
>> > > > produce right now, because it is a conversion from HF to B and
>> > > > that
>> > > > should only happen within the context of
>> > > > VK_KHR_shader_float16_int8,
>> > > > however, this is a consequence of the fact that since
>> > > > c84ec70b3a72
>> > > > there is an inconsistency between what we do at the IR level
>> > > > regarding
>> > > > execution size of HF conversions and what the EU validator is
>> > > > doing,
>> > > > and from that perspective this is really fixing an
>> > > > inconsistency
>> > > > that
>> > > > didn't exist before, and I thought we would want to address
>> > > > that
>> > > > sooner
>> > > > rather than later and track it down to the original change that
>> > > > introduced that inconsistency so we know where this is coming
>> > > > from.
>> > > > 
>> > > 
>> > > The "inconsistency" between the IR's get_exec_type() and the EU
>> > > validator's execution_type() has existed ever since
>> > > a05b6f25bf4bfad7
>> > > removed the HF assert from get_exec_type() without actually
>> > > implementing
>> > > the code required to handle HF operands (which is what my commit
>> > > c84ec70b3a72 did).
>> > 
>> > I agree with the fact that since a05b6f25bf4bfad7 the validator
>> > could
>> > reject valid code and that had nothing to do with your patch,
>> 
>> The validator rejected the same valid HF code since it was written,
>> that
>> had nothing to do with neither a05b6f25bf4bfad7 nor with my patch,
>> and
>> it is the real problem this patch was working around.
>> 
>> > but the inconsistency I am talking about here, that this patch
>> > fixes,
>> > is the one about get_exec_type() in the IR and execution_type() in
>> > the
>> > validator doing different things for HF instructions, which only
>> > exists since your patch and which you discuss below.
>> > 
>> 
>> The "inconsistency" exists ever since get_exec_type() was introduced
>> without correct handling of HF types (even though execution_type()
>> already attempted to handle it).  And I disagree that it's a real
>> inconsistency except due to the fact that the validator is
>> incorrectly
>> attempting to validate the alignment of the destination region
>> according
>> to a rule that doesn't apply to HF types.
>> 
>> > > > Anyway, that was my rationale for the Fixes tag, but if you
>> > > > think
>> > > > this
>> > > > is not useful I am happy to drop this patch and just include it
>> > > > as
>> > > > part
>> > > > of my series without the tag.
>> > > > 
>> > > 
>> > > I'd like to see the actual regioning restrictions for HF types
>> > > implemented in the EU validator as part of your series.
>> > 
>> > Ok, let's see if we can agree on what restrictions should we
>> > implement
>> > then. I can implement this restriction as documented:
>> > 
>> > "Conversion between Integer and HF (Half Float) must be DWord-
>> > aligned
>> > and strided by a DWord on the destination"
>> > 
>> > Instead of trying to apply the general one that is currently
>> > breaking.
>> > That will fix the assertion issue. I guess my issue with it was
>> > that
>> > going by your analysis this restriction is not telling the full
>> > picture, this is what you had to say about this restriction:
>> > 
>> > "I have a feeling that the reason for this may be that the 16-bit
>> > pipeline lacks the ability to handle conversions from or to half-
>> > float, 
>> > so the execution type is implicitly promoted to the matching
>> > (integer
>> > or floating-point) 32-bit type where any HF conversion would be
>> > needed.  And on those the usual alignment restriction of the
>> > destination to a larger execution type applies."
>> > 
>> > But I guess your point is that we can ignore these details at the
>> > validator level and just focus on validating strictly what the PRM
>> > says. Fair enough.
>> > 
>> > Unfortunatley, you also mentioned that this restriction *seems* to
>> > be
>> > overriden by this one on all platforms but BDW (even though the
>> > both
>> > are listed, at least I see both in my SKL docs):
>> > 
>> > "There is a relaxed alignment rule for word destinations. When the
>> > destination type is word (UW, W, HF), destination data types can be
>> > aligned to either the lowest word or the second lowest word of the
>> > execution channel. This means the destination data words can be
>> > either
>> > all in the even word locations or all in the odd word locations."
>> > 
>> 
>> I'd implement this one since it seems like the most specific, except
>> on
>> BDW where only the previous (almost equivalent) restriction seems to
>> apply. 
>
> I have just realized that this restriction could be at odds with this
> other aspect of the spec:
>
> "There is no direct conversion from B/UB to DF or DF to B/UB. Use two
> instructions and a word or DWord intermediate type."
>
> So it is okay to convert from B to W and then from W to DF/Q (it has
> the same text from conversions between HF and 64-bit types). We
> actually emit these conversions and they work fine going by the
> existing CTS tests, but these conversions are not aligned to 32-bit
> like the restriction states, instead they are aligned to 64-bit (the
> regioning lowering pass does this):
>
> mov(8) g14<4>W   g5<4,4,1>Q   { align1 2Q };
>
> So I think this is a bit inconsistent again and I guess we need to
> choose what we want to do. I can think of 4 options:
>
> 1. We could assume that the restriction is correctly formulated and
> split the instructions to respect the restriction as-is even if we have
> not found any evidence that this doesn't work
>
> 2. We could assume that the restriction is correctly formulated and the
> regioning lowering pass need to be fixed to generate a conversion with
> a stride of 2 even if the execution size if 64-bit (haven't tried this
> since it sounds sketchy to me).
>
> 3. We could understand that the restriction is strictly formulated for
> conversions from a 32-bit execution types, and decide that it does not
> apply to conversions from 64-bit sources.
>
> 4. We can assume that the restriction is formulated incorrectly and
> instead of saying that destination words should all be in even or odd
> slots, assuming a conversion from a 32-bit type, it should just say
> that they should be aligned to the execution type to also accomodate
> conversions from 64-bit types.
>

4. is basically what the back-end is assuming right now, but it is
indeed not literally equivalent to the regioning restrictions as they're
formulated in the B-Spec.  I'd suggest not validating the DWORD
alignment rule for HF instructions whenever the execution type is
greater than 32 bits, since the B-Spec language about those is rather
inconsistent.

> And indepedently of how we decide to interpret the restriction in the
> end, given that we have discussed here that we should only validate
> things as they are described in the PRM, I think we also need to decide
> what we implement in the validator (if anything) if we decide to go
> with either 3) or 4) (were we would not be exactly following the PRM to
> the letter).
>
> I kind of think that 4) might be what is going on here...
> what do you think?
>
>>  There are shitloads of other restrictions we're missing for HF
>> types BTW, check out the section "Special Requirements for Handling
>> Mixed Mode Float Operations" of the hardware spec.
>> 
>> > Which you discussed didn't make sense to you and I agreed, if only
>> > because my own experiments suggested that the implications that one
>> > would derive from a strict interpretation of this (that packed 16-
>> > bit
>> > data is not supported) are not quite true going by the fact that we
>> > are
>> > emitting packed instructions on all platforms without any
>> > indication
>> > that this doesn't work. So what should we do about this one? If we
>> > decide that we want to implement this then we have to re-think the
>> > implementation we have.
>> > 
>> 
>> The implication on packed HF instructions is likely accidental, I
>> don't
>> think we should enforce the rule except for conversion operations.
>> 
>> > Should we just validate the one about the integer/float
>> > conversions?
>> > Should we do that only on BDW or on all platforms?
>> > 
>> > >   I don't see the
>> > > need to introduce any empirical assumptions into the EU
>> > > validator's
>> > > execution_type() in order to achieve that, even if that means
>> > > that
>> > > get_exec_type() and execution_type() don't do the exact same
>> > > calculation
>> > > -- What you call an inconsistency is the consequence of
>> > > execution_type()
>> > > being the hardware spec's opinion on what the execution type is,
>> > > which
>> > > we assume is what we need to use while enforcing a regioning
>> > > restriction
>> > > that refers to the execution type of the instruction.
>> > 
>> > Mmm... I guess my question is: is the hardware really assuming that
>> > execution type in all cases that involve HF instructions? What I
>> > got
>> > from your analysis of the whole situation is that the hardware is
>> > actually promoting the execution size to 32-bit in some cases, and
>> > if
>> > that is the case, then does it make sense to implement code in the
>> > validator that ignores that fact?
>> 
>> If the hardware spec chose to ignore that fact and instead gave us a
>> different set of regioning restrictions for us to use on HF
>> restrictions
>> that don't rely on the definition of execution type, we should
>> probably
>> have the validator match the hardware spec literally.
>> 
>> > I understand that you think this doesn't have any significance once
>> > we
>> > make sure that we only apply the restrictions specifically
>> > documented
>> > for HF from the PRM, and you are probably right, however I have to
>> > admit that it feels a bit odd to me that we do that when we know
>> > that
>> > under the hood there is much more going on and we are producing FS
>> > IR
>> > code following different rules after all. But maybe it is just me
>> > being too paranoid about this...
>> > 
>> 
>> Wouldn't it be even odder to have the hardware restriction validator
>> diverge from the documentation it's supposedly validating?
>> 
>> > 
>> > > > >   The validator is
>> > > > > still missing an implementation of the quirky HF
>> > > > > restrictions,
>> > > > > and it
>> > > > > wasn't the purpose of c84ec70b3a72 to do such a thing.
>> > > > 
>> > > > While this is true in general, the EU validator does consider
>> > > > the
>> > > > execution type of the instruction to validate general rules
>> > > > such as
>> > > > the
>> > > > one I mentioned in the commit message in this patch. And that
>> > > > part
>> > > > of
>> > > > the validator is inconsistent with c84ec70b3a72.
>> > > 
>> > > That part of the validator was also inconsistent with the code
>> > > generated
>> > > by your original VK_KHR_shader_float16_int8 series even before I
>> > > committed c84ec70b3a72.  The reason is that it is trying to
>> > > validate
>> > > a
>> > > restriction that rejects working code, because the "general" rule
>> > > it's
>> > > trying to enforce isn't supposed to apply to instructions with HF
>> > > operands, which is the real bug.
>> > 
>> > I agree with that, and I'll fix that in my series by preventing
>> > that
>> > rule to apply to these instructions (pending to agree on the rules
>> > that
>> > we should validate instead), but as I have already explained above,
>> > that was not the inconsistency I was referring to.
>> > 
>> 
>> It's the only actual inconsistency this was fixing.
>> 
>> > > > In fact, the EU validator is accounting for execution size
>> > > > promotion
>> > > > of HF instructions to 32-bit in SKL+ and CHV only, for
>> > > > conversions
>> > > > from HF->F and mixed float mode instructions... which is part
>> > > > of
>> > > > what
>> > > > c84ec70b3a72 addresses at the IR level, which it actually does
>> > > > for
>> > > > all
>> > > > hardware platforms and in more cases.
>> > > > 
>> > > 
>> > > I'm fine with fixing execution_type() to do the right thing in
>> > > more
>> > > cases and platforms, but I don't think that should involve making
>> > > empirical assumptions in the validator.
>> > 
>> > Just to be clear, in the particular case we are discussing here, I
>> > understand this means that we should leave the validator's
>> > execution_type() as-is.
>> > 
>> 
>> That's not what I was saying.  Last time I looked at execution_type()
>> it
>> did seem somewhat sketchy regarding HF types.  I wouldn't be
>> surprised
>> if it needs some fixes to match the hardware spec (*not* our
>> empirical
>> knowledge of the hardware) in addition to the above.
>> 
>> > > > >   You *should*
>> > > > > definitely implement those restrictions (as they're stated in
>> > > > > the
>> > > > > hardware spec, without empirical assumptions) in the
>> > > > > validator as
>> > > > > part
>> > > > > of your VK_KHR_shader_float16_int8 series,
>> > > > 
>> > > > Again, I am not sure what you mean by "without empirical
>> > > > assumptions".
>> > > 
>> > > I was just paraphrasing your comment.  If you feel the need to
>> > > write
>> > > a
>> > > comment justifying the existence of some code in the validator
>> > > based
>> > > on
>> > > empirical testing, it probably doesn't belong in the validator.
>> > > 
>> > > > According to the commit message in c84ec70b3a72 "the docs are
>> > > > fairly
>> > > > imcomplete and inconsistent" and you explained here that you
>> > > > had to
>> > > > do
>> > > > some experimentation of your own using the simulator (where you
>> > > > found
>> > > > its results to also be inconsistent with the hardware docs) to
>> > > > try
>> > > > and
>> > > > guess what is happening. That's why I was using the term
>> > > > "empirical"
>> > > > here, since the hardware docs alone aren't clear or correct
>> > > > enough
>> > > > to
>> > > > understand what is really happening, when and in what
>> > > > platforms.
>> > > > 
>> > > 
>> > > If some hardware restriction is described in a contradictory or
>> > > ambiguous way by the hardware spec we should probably avoid
>> > > enforcing
>> > > it
>> > > altogether in the EU validator.
>> > 
>> > Okay, in that case I guess you would rule the restriction about the
>> > relaxed alignment rule for Word destinations as such, but not the
>> > one
>> > about half-float/integer conversions. Is that correct?
>> > 
>> 
>> We probably need to enforce both since they apply to different sets
>> of
>> hardware.
>> 
>> > > > Anyway, if you don't like the term "empirical" to refer to all
>> > > > this,
>> > > > that's fine by me, but what I need to know is if we agree that
>> > > > we
>> > > > need
>> > > > to implement the same type promotion rules in the validator
>> > > > that we
>> > > > are
>> > > > implementing in the IR, indepedently of whether refer to them
>> > > > as
>> > > > "based
>> > > > on empirical testing" or not.
>> > > > 
>> > > 
>> > > I don't think we agree on that.  IMHO you want to enforce the
>> > > regioning
>> > > restrictions that the hardware spec describes for HF types rather
>> > > than
>> > > the current (incorrect and incomplete) generalization of the
>> > > standard
>> > > restrictions.  With those in place the apparent inconsistency
>> > > between
>> > > get_exec_type() and the hardware spec's execution type would be
>> > > harmless.
>> > 
>> > I agree with that. I still feel uneasy about having two different 
>> > implementations of the execution type in FS IR and the validator,
>> > even
>> > if that doesn't lead to any issues when we do what you say here...
>> > 
>> 
>> I feel uneasy about the validator not matching the hardware spec.
>> Luckily its only purpose is validating the rest of the compiler, so
>> if
>> there is an actual inconsistency as you seem to think, it will come
>> up.
>> 
>> > At the very least I think that deserves a comment in the validator
>> > explaining why we think that is okay.
>> > 
>> 
>> A quote of the hardware spec should be enough of a
>> comment.  Right?  And
>> there is already a comment in get_exec_type() explaining why its HF
>> type
>> handling logic is consistent with the hardware spec restrictions for
>> HF
>> types.
>> 
>> > > > >  if anything because currently
>> > > > > it will reject working code that uses HF types.
>> > > > 
>> > > > Just for the sake of clarity, since that sentence above could
>> > > > be
>> > > > interpreted as if all HF code would be rejected: we have been
>> > > > using
>> > > > HF
>> > > > types since we landed VK_KHR_16bit_storage, which includes
>> > > > conversions
>> > > > between HF and F and the EU validator is not complaining about
>> > > > any
>> > > > of
>> > > > that. It is currently a problem only with conversions to
>> > > > smaller
>> > > > types
>> > > > (so only conversions to Byte) because that's where we check for
>> > > > that
>> > > > restriction on the stride, which is based on the execution type
>> > > > of
>> > > > the
>> > > > instruction.
>> > > > 
>> > > > > 
>> > > > > > ---
>> > > > > >  src/intel/compiler/brw_eu_validate.c | 27 ++++++++++++++
>> > > > > > ----
>> > > > > > ----
>> > > > > > -----
>> > > > > >  1 file changed, 14 insertions(+), 13 deletions(-)
>> > > > > > 
>> > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c
>> > > > > > b/src/intel/compiler/brw_eu_validate.c
>> > > > > > index a25010b225c..3bb37677672 100644
>> > > > > > --- a/src/intel/compiler/brw_eu_validate.c
>> > > > > > +++ b/src/intel/compiler/brw_eu_validate.c
>> > > > > > @@ -325,17 +325,20 @@ execution_type(const struct
>> > > > > > gen_device_info
>> > > > > > *devinfo, const brw_inst *inst)
>> > > > > >     unsigned num_sources = num_sources_from_inst(devinfo,
>> > > > > > inst);
>> > > > > >     enum brw_reg_type src0_exec_type, src1_exec_type;
>> > > > > >  
>> > > > > > -   /* Execution data type is independent of destination
>> > > > > > data
>> > > > > > type,
>> > > > > > except in
>> > > > > > -    * mixed F/HF instructions on CHV and SKL+.
>> > > > > > +   /* Empirical testing suggests that type conversions
>> > > > > > involving
>> > > > > > half-float
>> > > > > > +    * promote execution type to 32-bit. See
>> > > > > > get_exec_type() in
>> > > > > > brw_ir_fs.h.
>> > > > > >      */
>> > > > > >     enum brw_reg_type dst_exec_type =
>> > > > > > brw_inst_dst_type(devinfo,
>> > > > > > inst);
>> > > > > >  
>> > > > > >     src0_exec_type =
>> > > > > > execution_type_for_type(brw_inst_src0_type(devinfo, inst));
>> > > > > >     if (num_sources == 1) {
>> > > > > > -      if ((devinfo->gen >= 9 || devinfo->is_cherryview) &&
>> > > > > > -          src0_exec_type == BRW_REGISTER_TYPE_HF) {
>> > > > > > -         return dst_exec_type;
>> > > > > > +      if (type_sz(src0_exec_type) == 2 && dst_exec_type !=
>> > > > > > src0_exec_type) {
>> > > > > > +         if (src0_exec_type == BRW_REGISTER_TYPE_HF)
>> > > > > > +            return BRW_REGISTER_TYPE_F;
>> > > > > > +         else if (dst_exec_type == BRW_REGISTER_TYPE_HF)
>> > > > > > +            return BRW_REGISTER_TYPE_D;
>> > > > > >        }
>> > > > > > +
>> > > > > >        return src0_exec_type;
>> > > > > >     }
>> > > > > >  
>> > > > > > @@ -367,14 +370,12 @@ execution_type(const struct
>> > > > > > gen_device_info
>> > > > > > *devinfo, const brw_inst *inst)
>> > > > > >         src1_exec_type == BRW_REGISTER_TYPE_DF)
>> > > > > >        return BRW_REGISTER_TYPE_DF;
>> > > > > >  
>> > > > > > -   if (devinfo->gen >= 9 || devinfo->is_cherryview) {
>> > > > > > -      if (dst_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > -          src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > -          src1_exec_type == BRW_REGISTER_TYPE_F) {
>> > > > > > -         return BRW_REGISTER_TYPE_F;
>> > > > > > -      } else {
>> > > > > > -         return BRW_REGISTER_TYPE_HF;
>> > > > > > -      }
>> > > > > > +   if (dst_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > +       src0_exec_type == BRW_REGISTER_TYPE_F ||
>> > > > > > +       src1_exec_type == BRW_REGISTER_TYPE_F) {
>> > > > > > +      return BRW_REGISTER_TYPE_F;
>> > > > > > +   } else {
>> > > > > > +      return BRW_REGISTER_TYPE_HF;
>> > > > > >     }
>> > > > > >  
>> > > > > >     assert(src0_exec_type == BRW_REGISTER_TYPE_F);
>> > > > > > -- 
>> > > > > > 2.17.1

Attachment: signature.asc
Description: PGP signature

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to