Thanks! I don't mean to be a pest. However, not everyone is as good about keeping track of their backlog as you are so I thought it might be worth a reminder.
--Jason On Sat, Nov 3, 2018 at 7:59 PM Ian Romanick <i...@freedesktop.org> wrote: > I haven't forgotten... I'm planning to dig into this next week. > > On 11/02/2018 06:42 AM, Jason Ekstrand wrote: > > Bump > > > > On Mon, Oct 22, 2018 at 5:14 PM Jason Ekstrand <ja...@jlekstrand.net > > <mailto:ja...@jlekstrand.net>> wrote: > > > > This is something that Connor and I have talked about quite a bit > > over the > > last couple of months. The core idea is to replace NIR's current > 32-bit > > 0/-1 D3D10-style booleans with a 1-bit data type. All in all, I > > think it > > worked out pretty well though I really don't like the proliferation > of > > 32-bit comparison opcodes we now have kicking around for i965. > > > > Why? No hardware really has a 1-bit type, right? Well, sort of... > AMD > > actually uses 64-bit scalars for booleans with one bit per > invocation. > > However, most hardware such as Intel uses some other larger value for > > booleans. The real benefit of 1-bit booleans and requiring a > > lowering pass > > is that you can do somewhat custom lowering (like AMD wants) and your > > lowering pass can always tell in an instant if a value is a boolean > > based > > on the bit size. As can be seen in the last patch, this makes it > really > > easy to implement a bool -> float lowering pass for hardware that > > doesn't > > have real integers where NIR's current booleans are actually rather > > painful. > > > > On Intel, the situation is a bit more complicated. It's tempting to > say > > that we have 32-bit D3D10 booleans. However, they're not really > D3D10 > > booleans on gen4-5 because the top 31 bits are undefined garbage > > and, while > > iand, ior, ixor, and inot operations work, you have to iand with 1 > > at the > > last minute to clear off all that garbage. Also, on all > generations, a > > comparison of two N-bit values results in an N-bit boolean, not a > 32-bit > > bool. This has caused the Igalia folks no end of trouble as they've > > been > > working on native 8 and 16-bit support. If, instead, we have a > > 1-bit bool > > with a lowering pass and we can lower to whatever we want, then we > could > > lower to a set of comparison opcodes that return the same bit-size > > as they > > compare and it would match GEN hardware much better. > > > > But what about performance? Aren't there all sorts of neat tricks > > we can > > do with D3D10 booleans like b & 1.0f for b2f? As it turns out, not > > really; > > that's about the only one. There is some small advantage when > > optimizing > > shaders that come from D3D if your native representation of booleans > > matches that of D3D. However, penultimate patch in this series adds > > a few > > small optimizations that get us to actually better than we were > before. > > With the entire series, shader-db on Kaby Lak looks like this: > > > > total instructions in shared programs: 15084098 -> 14988578 > (-0.63%) > > instructions in affected programs: 1321114 -> 1225594 (-7.23%) > > helped: 2340 > > HURT: 23 > > > > total cycles in shared programs: 369790134 -> 359798399 (-2.70%) > > cycles in affected programs: 134085452 -> 124093717 (-7.45%) > > helped: 2149 > > HURT: 720 > > > > total loops in shared programs: 4393 -> 4393 (0.00%) > > loops in affected programs: 0 -> 0 > > helped: 0 > > HURT: 0 > > > > total spills in shared programs: 10158 -> 10051 (-1.05%) > > spills in affected programs: 1429 -> 1322 (-7.49%) > > helped: 8 > > HURT: 15 > > > > total fills in shared programs: 22105 -> 21720 (-1.74%) > > fills in affected programs: 2853 -> 2468 (-13.49%) > > helped: 12 > > HURT: 15 > > > > How about ease of use? Are they a pain to deal with? Yes, adding > > support > > for 1-bit types was a bit awkward in a few places but most of it was > > dealing with all the places where we have 32-bit booleans baked into > > assumptions. Getting rid of that baking in solves the problem and > also > > makes the whole IR more future-proof. > > > > All in all, I'd say I'm pretty happy with it. However, I'd like > other > > people (particularly the AMD folks) to play with it a bit and verify > > that > > it solves their problems as well. Also, I added a lowering pass and > > tried > > to turn it on in everyone's driver but may not have put it in the > right > > spot. Please double-check my work. For those wishing to take a > > look, you > > can also find the entire series on my gitlab here: > > > > > https://gitlab.freedesktop.org/jekstrand/mesa/commits/review/nir-1-bit-bool > > > > Please review! > > > > --Jason > > > > Cc: Connor Abbott <cwabbo...@gmail.com <mailto:cwabbo...@gmail.com>> > > Cc: Timothy Arceri <tarc...@itsqueeze.com > > <mailto:tarc...@itsqueeze.com>> > > Cc: Eric Anholt <e...@anholt.net <mailto:e...@anholt.net>> > > Cc: Rob Clark <robdcl...@gmail.com <mailto:robdcl...@gmail.com>> > > Cc: Karol Herbst <karolher...@gmail.com <mailto: > karolher...@gmail.com>> > > Cc: Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl > > <mailto:b...@basnieuwenhuizen.nl>> > > Cc: Alyssa Rosenzweig <aly...@rosenzweig.io > > <mailto:aly...@rosenzweig.io>> > > > > > > Jason Ekstrand (31): > > nir/validate: Print when the validation failed > > nir/constant_folding: Add an unreachable to a switch > > nir/constant_folding: Use nir_src_as_bool for discard_if > > nir/builder: Add a nir_imm_true/false helpers > > nir/builder: Handle 16-bit floats in nir_imm_floatN_t > > nir/search: Use nir_builder > > nir/opt_if: Rework condition propagation > > nir/system_values: Use the bit size from the load_deref > > glsl/nir: Use i2b instead of ine for fixing UBO/SSBO booleans > > nir/prog: Use nir_bany in kill handling > > spirv: Use the right bit-size for spec constant ops > > spirv: Initialize subgroup destinations with the destination type > > intel/nir: Use the OPT macro for more passes > > nir/algebraic: Disable b2f lowering and two optimizations > > nir: Rename boolean-related opcodes to include 32 in the name > > FIXUP: nir/builder: Generate 32-bit bool opcodes transparently > > FIXUP: nir/algebraic: Remap boolean opcodes to the 32-bit variant > > FIXUP: Use 32-bit opcodes in the NIR producers and optimizations > > FIXUP: Use 32-bit opcodes in the NIR back-ends > > nir/constant_expressions: Rework boolean handling > > nir: Add support for 1-bit data types > > nir: Add 1-bit boolean opcodes > > nir: Add a bool to int32 lowering pass > > glsl,spirv: Generate 1-bit booleans > > FIXUP: Revert "Use 32-bit opcodes in the NIR producers and > > optimizations" > > FIXUP: Revert "nir/builder: Generate 32-bit bool opcodes > > transparently" > > FIXUP: nir/builder: Generate 1-bit booleans in nir_build_imm_bool > > nir/algebraic: Optimize 1-bit booleans > > nir/algebraic: Add back lower_b2f and a b2f optimization > > nir/algebraic: Add some optimizations for D3D-style booleans > > nir: Add a bool to float32 lowering pass > > > > src/amd/common/ac_nir_to_llvm.c | 30 +-- > > src/amd/vulkan/radv_shader.c | 6 +- > > src/broadcom/compiler/nir_to_vir.c | 44 ++--- > > src/broadcom/compiler/vir.c | 2 + > > src/compiler/Makefile.sources | 2 + > > src/compiler/glsl/glsl_to_nir.cpp | 28 +-- > > src/compiler/nir/meson.build | 2 + > > src/compiler/nir/nir.c | 15 +- > > src/compiler/nir/nir.h | 36 ++-- > > src/compiler/nir/nir_algebraic.py | 22 ++- > > src/compiler/nir/nir_builder.h | 51 ++++- > > src/compiler/nir/nir_constant_expressions.py | 53 ++--- > > src/compiler/nir/nir_instr_set.c | 23 ++- > > src/compiler/nir/nir_lower_alu_to_scalar.c | 4 + > > src/compiler/nir/nir_lower_bool_to_float.c | 181 > ++++++++++++++++++ > > src/compiler/nir/nir_lower_bool_to_int32.c | 162 ++++++++++++++++ > > src/compiler/nir/nir_lower_int64.c | 2 +- > > .../nir/nir_lower_load_const_to_scalar.c | 3 + > > src/compiler/nir/nir_lower_returns.c | 4 +- > > src/compiler/nir/nir_lower_subgroups.c | 2 +- > > src/compiler/nir/nir_lower_system_values.c | 1 + > > src/compiler/nir/nir_opcodes.py | 31 ++- > > src/compiler/nir/nir_opt_algebraic.py | 19 +- > > src/compiler/nir/nir_opt_constant_folding.c | 18 +- > > src/compiler/nir/nir_opt_if.c | 91 +++------ > > src/compiler/nir/nir_opt_intrinsics.c | 2 +- > > src/compiler/nir/nir_opt_large_constants.c | 5 + > > src/compiler/nir/nir_print.c | 3 + > > src/compiler/nir/nir_search.c | 112 +++-------- > > src/compiler/nir/nir_search.h | 9 +- > > src/compiler/nir/nir_validate.c | 16 +- > > src/compiler/nir/tests/vars_tests.cpp | 38 ++-- > > src/compiler/nir_types.cpp | 2 +- > > src/compiler/nir_types.h | 4 +- > > src/compiler/spirv/spirv_to_nir.c | 23 ++- > > src/compiler/spirv/vtn_alu.c | 4 +- > > src/compiler/spirv/vtn_cfg.c | 10 +- > > src/compiler/spirv/vtn_subgroup.c | 12 +- > > src/gallium/auxiliary/nir/tgsi_to_nir.c | 20 +- > > .../drivers/freedreno/ir3/ir3_compiler_nir.c | 31 +-- > > src/gallium/drivers/radeonsi/si_shader_nir.c | 2 + > > src/gallium/drivers/vc4/vc4_program.c | 50 ++--- > > src/intel/compiler/brw_fs_nir.cpp | 80 ++++---- > > src/intel/compiler/brw_nir.c | 16 +- > > .../brw_nir_analyze_boolean_resolves.c | 24 +-- > > .../compiler/brw_nir_lower_image_load_store.c | 2 +- > > src/intel/compiler/brw_vec4_nir.cpp | 122 ++++++------ > > src/intel/vulkan/anv_pipeline.c | 2 +- > > src/mesa/drivers/dri/i965/brw_program.c | 4 +- > > src/mesa/main/glspirv.c | 2 +- > > src/mesa/program/prog_to_nir.c | 2 +- > > src/mesa/state_tracker/st_glsl_to_nir.cpp | 2 +- > > 52 files changed, 934 insertions(+), 497 deletions(-) > > create mode 100644 src/compiler/nir/nir_lower_bool_to_float.c > > create mode 100644 src/compiler/nir/nir_lower_bool_to_int32.c > > > > -- > > 2.19.1 > > > > > > > > _______________________________________________ > > mesa-dev mailing list > > mesa-dev@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev