On Fri, Apr 20, 2018 at 5:16 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> On Fri, Apr 20, 2018 at 5:16 AM, Nicolai Hähnle <nhaeh...@gmail.com> wrote:
>>
>> On 20.04.2018 10:21, Iago Toral wrote:
>>>
>>> Hi,
>>>
>>> while developing support for Vulkan shaderInt16 on Anvil I came across
>>> a feature of NIR that was a bit inconvenient: bools are always 32-bit
>>> by design, but the Intel hardware produces 16-bit bool results for
>>> 16-bit comparisons, so that creates a problem that manifests like this:
>>>
>>>     vec1 32 ssa_21 = fge ssa_20, ssa_16
>>>     vec1 16 ssa_22 = b2f ssa_21
>
>
> I was thinking about this a bit this morning and it gets even more sticky.
> What happens if you have
>
>     bool e = (a < b) && (c < d);
>
> where a and b are 16-bit and c and d are 32-bit? In this case, one
> comparison has a 32-bit value and one has a 16-bit value and you have to
> pick one for the &&.
>
>>>
>>> Our CMP instruction will produce a 16-bit boolean result for the first
>>> NIR instruction (where NIR expects it to be 32-bit), so by the time we
>>> emit the second instruction in the driver the bit-size for the operand
>>> of b2f provided by NIR no longer matches the reality and we emit
>>> incorrect code.
>>>
>>> This seems to have been a conscious design choice in NIR, and while
>>> discussing this with Jason he was unsure how much we wanted to change
>>> this or how to do it, given how thoroughly 32-bit bools are baked into
>>> NIR and the complexities that modifying this would also bring to our
>>> bit-size validation code.
>>>
>>> I have been considering alternatives that didn't involve changing NIR
>>> to support multiple bit-sizes for booleans:
>>>
>>> 1) Drivers that need to emit smaller booleans could try to fix the
>>> generated NIR by correcting the expected bit-sizes for CMP
>>> instructions.
>>> This would be rather trivial to implement in drivers (and
>>> maybe we could even make a generic pass for other drivers to use if
>>> they need it) but this will make the validator complain because it
>>> won't recognize comparisons with 16-bit bool outputs as valid NIR
>>> opcodes. I also found instances where nir_search would complain about
>>> mismatching bit-sizes. I haven't looked any further into it yet though,
>>> so maybe we can reasonably work around these issues.
>>>
>>> 2) Drivers could handle this specially when they emit code from NIR.
>>> Specifically, when they see a 32-bit boolean source in an instruction,
>>> they would have to search for the instruction that produced that source
>>> value and check whether it is a 16-bit or a 32-bit boolean to emit
>>> proper code for the instruction.
>>>
>>> 3) Drivers can just convert the 16-bit bool result they generate for
>>> 16-bit cmp to the 32-bit bool that NIR expects, and then possibly run
>>> an optimization pass to eliminate these extra conversions and fix up
>>> the code accordingly.
>>
>>
>> radeonsi(NIR) and radv already use option 3, since GCN hardware really
>> wants to treat bools as 1-bit values, so that's what I'd suggest. The
>> optimizations that clean up the conversions happen in LLVM for us.
>
>
> Is this a GCN thing or an LLVM thing? It would be neat if your hardware had
> 1-bit registers. :-) We sort of do but they're special flag registers and
> we have very few of them.

LLVM. For GCN HW we use a 64-bit register that is shared between lanes
(i.e. having 1 bit for each lane).

>
> --Jason
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev