A little over a month ago, I sent a 44-patch series with a bunch of the prerequisite patches for implementing SPIR-V subgroup support. This is a re-spin of that series with a few more patches. Most of the new patches are either a result of rebasing on top of my uniform reworks or fixes for SIMD32. As of now, all but 8 of the subgroups tests pass with SIMD32; those 8 appear to be spilling issues, but I'm not 100% sure.

Some of the patches in here overlap a bit with stuff that Connor did in his series for radv. In particular, I've taken a different approach, which I like better, to sorting out uint64_t vs. uvec4 for ballot intrinsics.

Cc: Matt Turner <matts...@gmail.com>
Cc: Francisco Jerez <curroje...@riseup.net>
Cc: Connor Abbott <cwabbo...@gmail.com>

Alejandro Piñeiro (1):
  i965/fs: Add brw_reg_type_from_bit_size utility method

Francisco Jerez (1):
  intel/fs: Restrict live intervals to the subset possibly reachable from any definition.

Jason Ekstrand (50):
  intel/fs: Pass builders instead of blocks into emit_[un]zip
  intel/fs: Be more explicit about our placement of [un]zip
  intel/fs: Handle flag read/write aliasing in needs_src_copy
  intel/fs: Use ANY/ALL32 predicates in SIMD32
  intel/fs: Don't stomp f0.1 in SIMD16 ballot
  intel/fs: Use an explicit D type for vote any/all/eq intrinsics
  intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
  i965/fs: Extend the live ranges of VGRFs which leave loops
  i965/fs/nir: Use the nir_src_bit_size helper
  i965/fs/nir: Simplify 64-bit store_output
  i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write
  i965/fs/nir: Minor refactor of store_output
  i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src
  intel/fs: Protect opt_algebraic from OOB BROADCAST indices
  intel/fs: Uniformize the index in readInvocation
  intel/fs: Retype dest to match value in read[First]Invocation
  intel/fs: Assign constant locations if they haven't been assigned
  intel/fs: Remove min_dispatch_width from fs_visitor
  intel/cs: Drop min_dispatch_width checks from compile_cs
  intel/cs: Stop setting dispatch_grf_start_reg
  intel/cs: Ignore runtime_check_aads_emit for CS
  intel/fs: Mark 64-bit values as being contiguous
  intel/cs: Rework the way thread local ID is handled
  intel/cs: Re-run final NIR optimizations for each SIMD size
  intel/cs: Re-run final NIR optimizations for each SIMD size
  intel/cs: Push subgroup ID instead of base thread ID
  intel/compiler/fs: Set up subgroup invocation as a system value
  intel/fs: Rework zero-length URB write handling
  intel/eu: Use EXECUTE_1 for JMPI
  intel/eu: Make automatic exec sizes a configurable option
  intel/eu: Explicitly set EXECUTE_1 where needed
  intel/fs: Explicitly set EXECUTE_1 where needed
  intel/fs: Don't use automatic exec size inference
  anv/pipeline: Dump shader immedately after spirv_to_nir
  anv/pipeline: Drop nir_lower_clip_cull_distance_arrays
  anv/pipeline: Call nir_lower_system_valaues after brw_preprocess_nir
  nir/lower_wpos_ytransform: Support system value intrinsics
  i965/program: Move nir_lower_system_values higher up
  intel/compiler: Call nir_lower_system_values in brw_preprocess_nir
  nir/opt_intrinsics: Rework progress
  nir: Add a new subgroups lowering pass
  nir: Add a ssa_dest_init_for_type helper
  nir: Make ballot intrinsics variable-size
  nir/lower_system_values: Lower SUBGROUP_*_MASK based on type
  nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
  nir,intel/compiler: Use a fixed subgroup size
  spirv: Add a vtn_constant_value helper
  spirv: Rework barriers
  nir: Validate base types on array dereferences
  compiler/nir_types: Handle vectors in glsl_get_array_element

 src/compiler/Makefile.sources                    |   2 +-
 src/compiler/glsl/glsl_to_nir.cpp                |   1 +
 src/compiler/nir/nir.h                           |  25 +-
 src/compiler/nir/nir_intrinsics.h                |  13 +-
 .../nir/nir_lower_read_invocation_to_scalar.c    | 112 -------
 src/compiler/nir/nir_lower_subgroups.c           | 257 ++++++++++++++++
 src/compiler/nir/nir_lower_system_values.c       |   4 +-
 src/compiler/nir/nir_lower_wpos_ytransform.c     |   4 +
 src/compiler/nir/nir_opt_intrinsics.c            |  83 +----
 src/compiler/nir/nir_validate.c                  |  18 +-
 src/compiler/nir_types.cpp                       |   2 +
 src/compiler/spirv/spirv_to_nir.c                | 132 ++++++--
 src/compiler/spirv/vtn_private.h                 |   6 +
 src/intel/compiler/brw_compiler.c                |   4 -
 src/intel/compiler/brw_compiler.h                |   3 +-
 src/intel/compiler/brw_eu.c                      |   1 +
 src/intel/compiler/brw_eu.h                      |  10 +
 src/intel/compiler/brw_eu_emit.c                 |  43 ++-
 src/intel/compiler/brw_fs.cpp                    | 246 +++++++++------
 src/intel/compiler/brw_fs.h                      |  15 +-
 src/intel/compiler/brw_fs_generator.cpp          |  14 +-
 src/intel/compiler/brw_fs_live_variables.cpp     |  89 +++++-
 src/intel/compiler/brw_fs_live_variables.h       |  12 +
 src/intel/compiler/brw_fs_nir.cpp                | 337 +++++++++++++--------
 src/intel/compiler/brw_fs_visitor.cpp            |  78 +++--
 src/intel/compiler/brw_nir.c                     |  13 +-
 src/intel/compiler/brw_nir.h                     |   2 +-
 src/intel/compiler/brw_nir_lower_cs_intrinsics.c |  56 +---
 src/intel/vulkan/anv_cmd_buffer.c                |   6 +-
 src/intel/vulkan/anv_pipeline.c                  |  18 +-
 src/mesa/drivers/dri/i965/brw_program.c          |   1 -
 src/mesa/drivers/dri/i965/gen6_constant_state.c  |   6 +-
 32 files changed, 1051 insertions(+), 562 deletions(-)
 delete mode 100644 src/compiler/nir/nir_lower_read_invocation_to_scalar.c
 create mode 100644 src/compiler/nir/nir_lower_subgroups.c

--
2.5.0.400.gff86faf

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev