On 2/16/19 1:21 AM, Rhys Perry wrote:
This series adds support for:
- VK_KHR_shader_float16_int8
- VK_AMD_gpu_shader_half_float
- VK_AMD_gpu_shader_int16
- VK_KHR_8bit_storage
on VI+. Half floats are disabled on LLVM 7 because of a bug causing large
memory usage and long (or unbounded) compilation times with some CTS
tests.
It is written against the following patch series:
- https://patchwork.freedesktop.org/series/53454/ (v4)
- https://patchwork.freedesktop.org/series/53660/ (v1)
With LLVM 9, there are no reproducible Vulkan CTS regressions with Vega
and VI except for
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_64_to_16.*
which fails or crashes because of unrelated radv bugs with 64-bit varyings
and because the tests use VK_FORMAT_R64_SFLOAT as a vertex format even
though radv does not support it.
test bug?
The two NIR-related patches (22 and 25) should be sent separately,
otherwise people working on NIR might miss them.
With LLVM 9, there are no reproducible piglit regressions except for
glsl-array-bounds-12.shader_test because of an LLVM bug when
SLP vectorization is enabled.
With LLVM 8, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those listed for LLVM 9 and a couple of tests that fail
because of an LLVM bug exposed by the SLP vectorizer and because of the
current lack of a fallback for 16-bit interpolation on LLVM versions
before LLVM 9.
With LLVM 7, there are no reproducible Vulkan CTS regressions with Vega
and VI except for those listed for LLVM 9 and a couple of tests that fail
because of an LLVM bug exposed by the SLP vectorizer.
The SLP vectorization patch is marked as WIP because it exposes LLVM bugs
with piglit's glsl-array-bounds-12.shader_test, some Vulkan CTS tests and
a shader-db test for a game I can't remember. It also over-vectorizes
32-bit code, which can significantly worsen the quality of the generated
code.
The 16-bit interpolation patch is marked as WIP because it currently
requires intrinsics only available in LLVM 9 and does not have a fallback.
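To make the version constraints above easier to see in one place, here is
a rough sketch in C of how the gating could look. The struct, the function
and its parameters are hypothetical names made up for illustration; they
are not radv's actual code, and the sketch assumes LLVM 7 is the oldest
version of interest.

```c
#include <stdbool.h>

/* Hypothetical feature flags, for illustration only (not radv structures). */
struct hypothetical_features {
    bool int8;      /* 8-bit integers and 8-bit storage */
    bool int16;     /* 16-bit integers */
    bool float16;   /* half floats */
    bool interp16;  /* 16-bit interpolation of fragment shader inputs */
};

/*
 * Assumption: llvm_major is the major LLVM version Mesa was built against
 * and is_vi_or_later says whether the GPU is VI or newer.
 */
static struct hypothetical_features
query_features(int llvm_major, bool is_vi_or_later)
{
    struct hypothetical_features f = { false, false, false, false };

    if (!is_vi_or_later)
        return f;

    f.int8 = true;
    f.int16 = true;
    /* Half floats are disabled on LLVM 7 because of the compile-time bug. */
    f.float16 = llvm_major >= 8;
    /* The interpolation patch needs LLVM 9 intrinsics and has no fallback. */
    f.interp16 = llvm_major >= 9;
    return f;
}
```

With this sketch, query_features(8, true) reports float16 but not interp16,
which corresponds to the LLVM 8 configuration where the 16-bit
interpolation tests fail.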
A branch on GitHub containing this series can be found at:
https://github.com/pendingchaos/mesa/commits/radv_fp16_int16_int8_v2
v2: rebase
v2: implement 16-bit interpolation
v2: move LLVMAddSLPVectorizePass to after LLVMAddEarlyCSEMemSSAPass
v2: run vectorization unconditionally on GFX9 and later
v2: remove ac_get_one(), ac_get_zero(), ac_get_onef() and ac_get_zerof()
v2: remove ac_int_of_size()
v2: fix 64-bit visit_load_var()
v2: mark VK_KHR_8bit_storage as DONE in features.txt
v2: mark SLP vectorization patch as WIP
v2: fix C++ style comment
Rhys Perry (41):
radv: bitcast 16-bit outputs to integers
radv: ensure export arguments are always float
ac: add various helpers for float16/int16/int8
ac/nir: implement 8-bit push constant, ssbo and ubo loads
ac/nir: implement 8-bit ssbo stores
ac/nir: fix 16-bit ssbo stores
ac/nir: implement 8-bit nir_load_const_instr
ac/nir: implement 8-bit conversions
ac/nir: fix 64-bit nir_op_f2f16_rtz
ac/nir: make ac_build_clamp work on all bit sizes
ac/nir: make ac_build_fract work on all bit sizes
ac/nir: make ac_build_isign work on all bit sizes
ac/nir: make ac_build_fsign work on all bit sizes
ac/nir: make ac_build_fdiv support 16-bit floats
ac/nir: implement half-float nir_op_frcp
ac/nir: implement half-float nir_op_frsq
ac/nir: implement half-float nir_op_ldexp
radv: lower 16-bit flrp
ac/nir: support half floats in emit_b2f
ac/nir: make emit_b2i work on all bit sizes
ac/nir: implement 16-bit shifts
compiler/nir: add lowering option for 16-bit ffma
ac/nir: implement 16-bit ac_build_ddxy
ac/nir: implement 8 and 16 bit ac_build_readlane
nir: make bitfield_reverse and ifind_msb work with all integers
ac/nir: make ac_find_lsb work on all bit sizes
ac/nir: make ac_build_umsb work on all bit sizes
ac/nir: implement 8 and 16 bit ac_build_imsb
ac/nir: make ac_build_bit_count work on all bit sizes
ac/nir: make ac_build_bitfield_reverse work on all bit sizes
ac/nir: implement 16-bit pack/unpack opcodes
ac/nir: add 8-bit types to glsl_base_to_llvm_type
ac/nir,radv: create an array of varying output types
ac/nir: store all outputs as f32
radv: store all fragment shader inputs as f32
radv: handle all fragment output types
WIP: radv,ac: implement 16-bit interpolation
WIP: ac,radv: run LLVM's SLP vectorizer
ac/nir: generate better code for nir_op_f2f16_rtz
ac/nir: have nir_op_f2f16 round to zero
radv,docs: expose float16, int16 and int8 features and extensions
docs/features.txt | 2 +-
src/amd/common/ac_llvm_build.c | 325 +++++++++++------------
src/amd/common/ac_llvm_build.h | 18 +-
src/amd/common/ac_llvm_util.c | 8 +-
src/amd/common/ac_nir_to_llvm.c | 268 +++++++++++++++----
src/amd/common/ac_shader_abi.h | 1 +
src/amd/vulkan/radv_device.c | 17 ++
src/amd/vulkan/radv_extensions.py | 4 +
src/amd/vulkan/radv_nir_to_llvm.c | 123 +++++----
src/amd/vulkan/radv_pipeline.c | 19 +-
src/amd/vulkan/radv_shader.c | 4 +
src/amd/vulkan/radv_shader.h | 1 +
src/broadcom/compiler/nir_to_vir.c | 1 +
src/compiler/nir/nir.h | 1 +
src/compiler/nir/nir_opcodes.py | 4 +-
src/compiler/nir/nir_opt_algebraic.py | 4 +-
src/gallium/drivers/radeonsi/si_get.c | 1 +
src/gallium/drivers/radeonsi/si_shader.c | 2 +-
src/gallium/drivers/vc4/vc4_program.c | 1 +
19 files changed, 507 insertions(+), 297 deletions(-)
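For readers unfamiliar with what "round to zero" means for the
nir_op_f2f16 patches near the end of the series, the following is a
minimal CPU-side reference sketch in C. It is not the code from the
patches (those emit LLVM IR); the function name is hypothetical, and the
choices to return the largest finite half on overflow and to collapse NaNs
to a single quiet NaN are assumptions of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference helper: float -> half bits, rounding toward zero. */
static uint16_t f32_to_f16_rtz(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof(x));           /* reinterpret the float's bits */

    uint16_t sign = (uint16_t)((x >> 16) & 0x8000u);
    uint32_t exp = (x >> 23) & 0xffu;
    uint32_t mant = x & 0x7fffffu;

    if (exp == 0xffu)                     /* infinity or NaN */
        return (uint16_t)(sign | (mant ? 0x7e00u : 0x7c00u));

    int32_t e = (int32_t)exp - 127 + 15;  /* rebias the exponent for half */

    if (e >= 31)                          /* too large: clamp to max finite */
        return (uint16_t)(sign | 0x7bffu);

    if (e <= 0) {                         /* result is subnormal or zero */
        if (e < -10)
            return sign;
        mant |= 0x800000u;                /* make the implicit bit explicit */
        return (uint16_t)(sign | (mant >> (14 - e)));
    }

    /* Normal case: dropping the low 13 mantissa bits truncates toward zero. */
    return (uint16_t)(sign | ((uint32_t)e << 10) | (mant >> 13));
}
```

For example, 65520.0f would become infinity under round-to-nearest-even,
but this sketch returns 0x7bff (65504.0), the largest finite half.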