Iago Toral <ito...@igalia.com> writes: > On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote: >> Iago Toral <ito...@igalia.com> writes: >> >> > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote: >> > > Iago Toral <ito...@igalia.com> writes: >> > > >> > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote: >> > > > > Iago Toral Quiroga <ito...@igalia.com> writes: >> > > > > >> > > > > > Commit c84ec70b3a72 implemented execution type promotion to >> > > > > > 32- >> > > > > > bit >> > > > > > for >> > > > > > conversions involving half-float registers, which empirical >> > > > > > testing >> > > > > > suggested >> > > > > > was required, but it did not incorporate this change into >> > > > > > the >> > > > > > assembly validator >> > > > > > logic. This commits adds that, preventing validation errors >> > > > > > like >> > > > > > this: >> > > > > > >> > > > > >> > > > > I don't think we should be validating empirical assumptions >> > > > > in >> > > > > the EU >> > > > > validator. >> > > > >> > > > I am not sure I get your point, isn't c84ec70b3a72 also based >> > > > on >> > > > empirical testing after all? >> > > > >> > > >> > > To some extent, but it doesn't attempt to enforce ISA >> > > restrictions >> > > based >> > > on information obtained empirically. >> > > >> > > > >> > > > > > mov(16) g9<4>B g3<16,8,2>HF { align1 1H }; >> > > > > > ERROR: Destination stride must be equal to the ratio of the >> > > > > > sizes >> > > > > > of the >> > > > > > execution data type to the destination type >> > > > > > >> > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type to >> > > > > > 32-bit >> > > > > > when any half-float conversion is needed." >> > > > > >> > > > > I don't think this "fixes" anything that ever worked. >> > > > >> > > > It is true that the code in that trace above is not something >> > > > we >> > > > can >> > > > produce right now, because it is a conversion from HF to B and >> > > > that >> > > > should only happen within the context of >> > > > VK_KHR_shader_float16_int8, >> > > > however, this is a consequence of the fact that since >> > > > c84ec70b3a72 >> > > > there is an inconsistency between what we do at the IR level >> > > > regarding >> > > > execution size of HF conversions and what the EU validator is >> > > > doing, >> > > > and from that perspective this is really fixing an >> > > > inconsistency >> > > > that >> > > > didn't exist before, and I thought we would want to address >> > > > that >> > > > sooner >> > > > rather than later and track it down to the original change that >> > > > introduced that inconsistency so we know where this is coming >> > > > from. >> > > > >> > > >> > > The "inconsistency" between the IR's get_exec_type() and the EU >> > > validator's execution_type() has existed ever since >> > > a05b6f25bf4bfad7 >> > > removed the HF assert from get_exec_type() without actually >> > > implementing >> > > the code required to handle HF operands (which is what my commit >> > > c84ec70b3a72 did). >> > >> > I agree with the fact that since a05b6f25bf4bfad7 the validator >> > could >> > reject valid code and that had nothing to do with your patch, >> >> The validator rejected the same valid HF code since it was written, >> that >> had nothing to do with neither a05b6f25bf4bfad7 nor with my patch, >> and >> it is the real problem this patch was working around. >> >> > but the inconsistency I am talking about here, that this patch >> > fixes, >> > is the one about get_exec_type() in the IR and execution_type() in >> > the >> > validator doing different things for HF instructions, which only >> > exists since your patch and which you discuss below. >> > >> >> The "inconsistency" exists ever since get_exec_type() was introduced >> without correct handling of HF types (even though execution_type() >> already attempted to handle it). And I disagree that it's a real >> inconsistency except due to the fact that the validator is >> incorrectly >> attempting to validate the alignment of the destination region >> according >> to a rule that doesn't apply to HF types. >> >> > > > Anyway, that was my rationale for the Fixes tag, but if you >> > > > think >> > > > this >> > > > is not useful I am happy to drop this patch and just include it >> > > > as >> > > > part >> > > > of my series without the tag. >> > > > >> > > >> > > I'd like to see the actual regioning restrictions for HF types >> > > implemented in the EU validator as part of your series. >> > >> > Ok, let's see if we can agree on what restrictions should we >> > implement >> > then. I can implement this restriction as documented: >> > >> > "Conversion between Integer and HF (Half Float) must be DWord- >> > aligned >> > and strided by a DWord on the destination" >> > >> > Instead of trying to apply the general one that is currently >> > breaking. >> > That will fix the assertion issue. I guess my issue with it was >> > that >> > going by your analysis this restriction is not telling the full >> > picture, this is what you had to say about this restriction: >> > >> > "I have a feeling that the reason for this may be that the 16-bit >> > pipeline lacks the ability to handle conversions from or to half- >> > float, >> > so the execution type is implicitly promoted to the matching >> > (integer >> > or floating-point) 32-bit type where any HF conversion would be >> > needed. And on those the usual alignment restriction of the >> > destination to a larger execution type applies." >> > >> > But I guess your point is that we can ignore these details at the >> > validator level and just focus on validating strictly what the PRM >> > says. Fair enough. >> > >> > Unfortunatley, you also mentioned that this restriction *seems* to >> > be >> > overriden by this one on all platforms but BDW (even though the >> > both >> > are listed, at least I see both in my SKL docs): >> > >> > "There is a relaxed alignment rule for word destinations. When the >> > destination type is word (UW, W, HF), destination data types can be >> > aligned to either the lowest word or the second lowest word of the >> > execution channel. This means the destination data words can be >> > either >> > all in the even word locations or all in the odd word locations." >> > >> >> I'd implement this one since it seems like the most specific, except >> on >> BDW where only the previous (almost equivalent) restriction seems to >> apply. > > I have just realized that this restriction could be at odds with this > other aspect of the spec: > > "There is no direct conversion from B/UB to DF or DF to B/UB. Use two > instructions and a word or DWord intermediate type." > > So it is okay to convert from B to W and then from W to DF/Q (it has > the same text from conversions between HF and 64-bit types). We > actually emit these conversions and they work fine going by the > existing CTS tests, but these conversions are not aligned to 32-bit > like the restriction states, instead they are aligned to 64-bit (the > regioning lowering pass does this): > > mov(8) g14<4>W g5<4,4,1>Q { align1 2Q }; > > So I think this is a bit inconsistent again and I guess we need to > choose what we want to do. I can think of 4 options: > > 1. We could assume that the restriction is correctly formulated and > split the instructions to respect the restriction as-is even if we have > not found any evidence that this doesn't work > > 2. We could assume that the restriction is correctly formulated and the > regioning lowering pass need to be fixed to generate a conversion with > a stride of 2 even if the execution size if 64-bit (haven't tried this > since it sounds sketchy to me). > > 3. We could understand that the restriction is strictly formulated for > conversions from a 32-bit execution types, and decide that it does not > apply to conversions from 64-bit sources. > > 4. We can assume that the restriction is formulated incorrectly and > instead of saying that destination words should all be in even or odd > slots, assuming a conversion from a 32-bit type, it should just say > that they should be aligned to the execution type to also accomodate > conversions from 64-bit types. >
4. is basically what the back-end is assuming right now, but it is indeed not literally equivalent to the regioning restrictions as they're formulated in the B-Spec. I'd suggest not validating the DWORD alignment rule for HF instructions whenever the execution type is greater than 32 bits, since the B-Spec language about those is rather inconsistent. > And indepedently of how we decide to interpret the restriction in the > end, given that we have discussed here that we should only validate > things as they are described in the PRM, I think we also need to decide > what we implement in the validator (if anything) if we decide to go > with either 3) or 4) (were we would not be exactly following the PRM to > the letter). > > I kind of think that 4) might be what is going on here... > what do you think? > >> There are shitloads of other restrictions we're missing for HF >> types BTW, check out the section "Special Requirements for Handling >> Mixed Mode Float Operations" of the hardware spec. >> >> > Which you discussed didn't make sense to you and I agreed, if only >> > because my own experiments suggested that the implications that one >> > would derive from a strict interpretation of this (that packed 16- >> > bit >> > data is not supported) are not quite true going by the fact that we >> > are >> > emitting packed instructions on all platforms without any >> > indication >> > that this doesn't work. So what should we do about this one? If we >> > decide that we want to implement this then we have to re-think the >> > implementation we have. >> > >> >> The implication on packed HF instructions is likely accidental, I >> don't >> think we should enforce the rule except for conversion operations. >> >> > Should we just validate the one about the integer/float >> > conversions? >> > Should we do that only on BDW or on all platforms? >> > >> > > I don't see the >> > > need to introduce any empirical assumptions into the EU >> > > validator's >> > > execution_type() in order to achieve that, even if that means >> > > that >> > > get_exec_type() and execution_type() don't do the exact same >> > > calculation >> > > -- What you call an inconsistency is the consequence of >> > > execution_type() >> > > being the hardware spec's opinion on what the execution type is, >> > > which >> > > we assume is what we need to use while enforcing a regioning >> > > restriction >> > > that refers to the execution type of the instruction. >> > >> > Mmm... I guess my question is: is the hardware really assuming that >> > execution type in all cases that involve HF instructions? What I >> > got >> > from your analysis of the whole situation is that the hardware is >> > actually promoting the execution size to 32-bit in some cases, and >> > if >> > that is the case, then does it make sense to implement code in the >> > validator that ignores that fact? >> >> If the hardware spec chose to ignore that fact and instead gave us a >> different set of regioning restrictions for us to use on HF >> restrictions >> that don't rely on the definition of execution type, we should >> probably >> have the validator match the hardware spec literally. >> >> > I understand that you think this doesn't have any significance once >> > we >> > make sure that we only apply the restrictions specifically >> > documented >> > for HF from the PRM, and you are probably right, however I have to >> > admit that it feels a bit odd to me that we do that when we know >> > that >> > under the hood there is much more going on and we are producing FS >> > IR >> > code following different rules after all. But maybe it is just me >> > being too paranoid about this... >> > >> >> Wouldn't it be even odder to have the hardware restriction validator >> diverge from the documentation it's supposedly validating? >> >> > >> > > > > The validator is >> > > > > still missing an implementation of the quirky HF >> > > > > restrictions, >> > > > > and it >> > > > > wasn't the purpose of c84ec70b3a72 to do such a thing. >> > > > >> > > > While this is true in general, the EU validator does consider >> > > > the >> > > > execution type of the instruction to validate general rules >> > > > such as >> > > > the >> > > > one I mentioned in the commit message in this patch. And that >> > > > part >> > > > of >> > > > the validator is inconsistent with c84ec70b3a72. >> > > >> > > That part of the validator was also inconsistent with the code >> > > generated >> > > by your original VK_KHR_shader_float16_int8 series even before I >> > > committed c84ec70b3a72. The reason is that it is trying to >> > > validate >> > > a >> > > restriction that rejects working code, because the "general" rule >> > > it's >> > > trying to enforce isn't supposed to apply to instructions with HF >> > > operands, which is the real bug. >> > >> > I agree with that, and I'll fix that in my series by preventing >> > that >> > rule to apply to these instructions (pending to agree on the rules >> > that >> > we should validate instead), but as I have already explained above, >> > that was not the inconsistency I was referring to. >> > >> >> It's the only actual inconsistency this was fixing. >> >> > > > In fact, the EU validator is accounting for execution size >> > > > promotion >> > > > of HF instructions to 32-bit in SKL+ and CHV only, for >> > > > conversions >> > > > from HF->F and mixed float mode instructions... which is part >> > > > of >> > > > what >> > > > c84ec70b3a72 addresses at the IR level, which it actually does >> > > > for >> > > > all >> > > > hardware platforms and in more cases. >> > > > >> > > >> > > I'm fine with fixing execution_type() to do the right thing in >> > > more >> > > cases and platforms, but I don't think that should involve making >> > > empirical assumptions in the validator. >> > >> > Just to be clear, in the particular case we are discussing here, I >> > understand this means that we should leave the validator's >> > execution_type() as-is. >> > >> >> That's not what I was saying. Last time I looked at execution_type() >> it >> did seem somewhat sketchy regarding HF types. I wouldn't be >> surprised >> if it needs some fixes to match the hardware spec (*not* our >> empirical >> knowledge of the hardware) in addition to the above. >> >> > > > > You *should* >> > > > > definitely implement those restrictions (as they're stated in >> > > > > the >> > > > > hardware spec, without empirical assumptions) in the >> > > > > validator as >> > > > > part >> > > > > of your VK_KHR_shader_float16_int8 series, >> > > > >> > > > Again, I am not sure what you mean by "without empirical >> > > > assumptions". >> > > >> > > I was just paraphrasing your comment. If you feel the need to >> > > write >> > > a >> > > comment justifying the existence of some code in the validator >> > > based >> > > on >> > > empirical testing, it probably doesn't belong in the validator. >> > > >> > > > According to the commit message in c84ec70b3a72 "the docs are >> > > > fairly >> > > > imcomplete and inconsistent" and you explained here that you >> > > > had to >> > > > do >> > > > some experimentation of your own using the simulator (where you >> > > > found >> > > > its results to also be inconsistent with the hardware docs) to >> > > > try >> > > > and >> > > > guess what is happening. That's why I was using the term >> > > > "empirical" >> > > > here, since the hardware docs alone aren't clear or correct >> > > > enough >> > > > to >> > > > understand what is really happening, when and in what >> > > > platforms. >> > > > >> > > >> > > If some hardware restriction is described in a contradictory or >> > > ambiguous way by the hardware spec we should probably avoid >> > > enforcing >> > > it >> > > altogether in the EU validator. >> > >> > Okay, in that case I guess you would rule the restriction about the >> > relaxed alignment rule for Word destinations as such, but not the >> > one >> > about half-float/integer conversions. Is that correct? >> > >> >> We probably need to enforce both since they apply to different sets >> of >> hardware. >> >> > > > Anyway, if you don't like the term "empirical" to refer to all >> > > > this, >> > > > that's fine by me, but what I need to know is if we agree that >> > > > we >> > > > need >> > > > to implement the same type promotion rules in the validator >> > > > that we >> > > > are >> > > > implementing in the IR, indepedently of whether refer to them >> > > > as >> > > > "based >> > > > on empirical testing" or not. >> > > > >> > > >> > > I don't think we agree on that. IMHO you want to enforce the >> > > regioning >> > > restrictions that the hardware spec describes for HF types rather >> > > than >> > > the current (incorrect and incomplete) generalization of the >> > > standard >> > > restrictions. With those in place the apparent inconsistency >> > > between >> > > get_exec_type() and the hardware spec's execution type would be >> > > harmless. >> > >> > I agree with that. I still feel uneasy about having two different >> > implementations of the execution type in FS IR and the validator, >> > even >> > if that doesn't lead to any issues when we do what you say here... >> > >> >> I feel uneasy about the validator not matching the hardware spec. >> Luckily its only purpose is validating the rest of the compiler, so >> if >> there is an actual inconsistency as you seem to think, it will come >> up. >> >> > At the very least I think that deserves a comment in the validator >> > explaining why we think that is okay. >> > >> >> A quote of the hardware spec should be enough of a >> comment. Right? And >> there is already a comment in get_exec_type() explaining why its HF >> type >> handling logic is consistent with the hardware spec restrictions for >> HF >> types. >> >> > > > > if anything because currently >> > > > > it will reject working code that uses HF types. >> > > > >> > > > Just for the sake of clarity, since that sentence above could >> > > > be >> > > > interpreted as if all HF code would be rejected: we have been >> > > > using >> > > > HF >> > > > types since we landed VK_KHR_16bit_storage, which includes >> > > > conversions >> > > > between HF and F and the EU validator is not complaining about >> > > > any >> > > > of >> > > > that. It is currently a problem only with conversions to >> > > > smaller >> > > > types >> > > > (so only conversions to Byte) because that's where we check for >> > > > that >> > > > restriction on the stride, which is based on the execution type >> > > > of >> > > > the >> > > > instruction. >> > > > >> > > > > >> > > > > > --- >> > > > > > src/intel/compiler/brw_eu_validate.c | 27 ++++++++++++++ >> > > > > > ---- >> > > > > > ---- >> > > > > > ----- >> > > > > > 1 file changed, 14 insertions(+), 13 deletions(-) >> > > > > > >> > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c >> > > > > > b/src/intel/compiler/brw_eu_validate.c >> > > > > > index a25010b225c..3bb37677672 100644 >> > > > > > --- a/src/intel/compiler/brw_eu_validate.c >> > > > > > +++ b/src/intel/compiler/brw_eu_validate.c >> > > > > > @@ -325,17 +325,20 @@ execution_type(const struct >> > > > > > gen_device_info >> > > > > > *devinfo, const brw_inst *inst) >> > > > > > unsigned num_sources = num_sources_from_inst(devinfo, >> > > > > > inst); >> > > > > > enum brw_reg_type src0_exec_type, src1_exec_type; >> > > > > > >> > > > > > - /* Execution data type is independent of destination >> > > > > > data >> > > > > > type, >> > > > > > except in >> > > > > > - * mixed F/HF instructions on CHV and SKL+. >> > > > > > + /* Empirical testing suggests that type conversions >> > > > > > involving >> > > > > > half-float >> > > > > > + * promote execution type to 32-bit. See >> > > > > > get_exec_type() in >> > > > > > brw_ir_fs.h. >> > > > > > */ >> > > > > > enum brw_reg_type dst_exec_type = >> > > > > > brw_inst_dst_type(devinfo, >> > > > > > inst); >> > > > > > >> > > > > > src0_exec_type = >> > > > > > execution_type_for_type(brw_inst_src0_type(devinfo, inst)); >> > > > > > if (num_sources == 1) { >> > > > > > - if ((devinfo->gen >= 9 || devinfo->is_cherryview) && >> > > > > > - src0_exec_type == BRW_REGISTER_TYPE_HF) { >> > > > > > - return dst_exec_type; >> > > > > > + if (type_sz(src0_exec_type) == 2 && dst_exec_type != >> > > > > > src0_exec_type) { >> > > > > > + if (src0_exec_type == BRW_REGISTER_TYPE_HF) >> > > > > > + return BRW_REGISTER_TYPE_F; >> > > > > > + else if (dst_exec_type == BRW_REGISTER_TYPE_HF) >> > > > > > + return BRW_REGISTER_TYPE_D; >> > > > > > } >> > > > > > + >> > > > > > return src0_exec_type; >> > > > > > } >> > > > > > >> > > > > > @@ -367,14 +370,12 @@ execution_type(const struct >> > > > > > gen_device_info >> > > > > > *devinfo, const brw_inst *inst) >> > > > > > src1_exec_type == BRW_REGISTER_TYPE_DF) >> > > > > > return BRW_REGISTER_TYPE_DF; >> > > > > > >> > > > > > - if (devinfo->gen >= 9 || devinfo->is_cherryview) { >> > > > > > - if (dst_exec_type == BRW_REGISTER_TYPE_F || >> > > > > > - src0_exec_type == BRW_REGISTER_TYPE_F || >> > > > > > - src1_exec_type == BRW_REGISTER_TYPE_F) { >> > > > > > - return BRW_REGISTER_TYPE_F; >> > > > > > - } else { >> > > > > > - return BRW_REGISTER_TYPE_HF; >> > > > > > - } >> > > > > > + if (dst_exec_type == BRW_REGISTER_TYPE_F || >> > > > > > + src0_exec_type == BRW_REGISTER_TYPE_F || >> > > > > > + src1_exec_type == BRW_REGISTER_TYPE_F) { >> > > > > > + return BRW_REGISTER_TYPE_F; >> > > > > > + } else { >> > > > > > + return BRW_REGISTER_TYPE_HF; >> > > > > > } >> > > > > > >> > > > > > assert(src0_exec_type == BRW_REGISTER_TYPE_F); >> > > > > > -- >> > > > > > 2.17.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev