From: Roland Scheidegger <srol...@vmware.com>

By using a dst_type in the gather interface, gather has more knowledge
about how values should be fetched. E.g. if this is a 3x32bit fetch and
dst_type is a 4x32bit vector, gather will no longer do a ZExt of a 96bit
scalar value to 128bit, but just fetch the 96bit as a 3x32bit vector
(this is still going to be 2 loads of course, but the loads can be done
directly to a simd vector that way).

Also, we can now try to use the right int/float type. This should make
no difference really, since there are typically no domain transition
penalties for such simd loads. However, it actually does make a
difference, since llvm will use different shuffle lowering afterwards,
so the caller can use this to trick llvm into using sane shuffles
afterwards (and yes, llvm is really stupid there - nothing against using
the shuffle instruction from the correct domain, but not at the cost of
doing 3 times more shuffles; the case which actually matters is the
refusal to use shufps for integer values).

Also make some attempt to avoid things which look great on paper but
which llvm doesn't really handle (e.g. fetching 3-element 8 bit and 16
bit vectors, which is simply disastrous - I suspect the type legalizer
is to blame, trying to extend these vectors to 128bit types somehow - so
these are still fetched with scalars like before, which is suboptimal
due to the ZExt).

Remove the ability for truncation (no point, this is gather, not
conversion) as it is complex enough already.

While here also implement not just the float, but also the 64bit avx2
gathers (disabled though since based on the theoretical numbers the
benefit just isn't there at all until Skylake at least).
---
 src/gallium/auxiliary/gallivm/lp_bld_gather.c | 42 +++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_gather.c b/src/gallium/auxiliary/gallivm/lp_bld_gather.c
index 439bbb6..1f7ba92 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_gather.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_gather.c
@@ -33,6 +33,7 @@
 #include "lp_bld_format.h"
 #include "lp_bld_gather.h"
 #include "lp_bld_swizzle.h"
+#include "lp_bld_type.h"
 #include "lp_bld_init.h"
 #include "lp_bld_intr.h"
 
@@ -270,17 +271,52 @@ lp_build_gather(struct gallivm_state *gallivm,
 
       LLVMTypeRef dst_elem_type = LLVMIntTypeInContext(gallivm->context, dst_width);
       LLVMTypeRef dst_vec_type = LLVMVectorType(dst_elem_type, length);
+      LLVMTypeRef gather_vec_type = dst_vec_type;
       unsigned i;
-
-      res = LLVMGetUndef(dst_vec_type);
+      boolean vec_zext = FALSE;
+      unsigned gather_width = dst_width;
+
+
+      if (src_width == 16 && dst_width == 32) {
+         LLVMTypeRef g_elem_type = LLVMIntTypeInContext(gallivm->context, dst_width / 2);
+         gather_vec_type = LLVMVectorType(g_elem_type, length);
+         /*
+          * Note that llvm is never able to optimize zext/insert combos
+          * directly (i.e. zero the simd reg, then place the elements into
+          * the appropriate place directly). And 16->32bit zext simd loads
+          * aren't possible (instead loading to scalar reg first).
+          * (I think this has to do with scalar/vector transition.)
+          * No idea about other archs...
+          * We could do this manually, but instead we just use a vector
+          * zext, which is simple enough (and, in fact, llvm might optimize
+          * this away).
+          * (We're not trying that with other bit widths as that might not be
+          * easier, in particular with 8 bit values at least with only sse2.)
+          */
+         vec_zext = TRUE;
+         gather_width = 16;
+      }
+      res = LLVMGetUndef(gather_vec_type);
       for (i = 0; i < length; ++i) {
          LLVMValueRef index = lp_build_const_int32(gallivm, i);
          LLVMValueRef elem;
          elem = lp_build_gather_elem(gallivm, length,
-                                     src_width, dst_width, aligned,
+                                     src_width, gather_width, aligned,
                                      base_ptr, offsets, i, vector_justify);
          res = LLVMBuildInsertElement(gallivm->builder, res, elem, index, "");
       }
+      if (vec_zext) {
+         res = LLVMBuildZExt(gallivm->builder, res, dst_vec_type, "");
+         if (vector_justify) {
+#if PIPE_ARCH_BIG_ENDIAN
+            struct lp_type dst_type;
+            unsigned sv = dst_width - src_width;
+            dst_type = lp_type_uint_vec(dst_width, dst_width * length);
+            res = LLVMBuildShl(gallivm->builder, res,
+                               lp_build_const_int_vec(gallivm, dst_type, sv), "");
+#endif
+         }
+      }
    }
 
    return res;
-- 
2.7.4

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
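
For readers who want to see what the 16->32bit path in the second hunk
computes, here is a minimal scalar sketch in plain C with no LLVM
involved. It is only an illustration under stated assumptions:
gather_u16() and zext_u16_to_u32() are hypothetical helper names, not
gallivm functions, and the justify_msb flag merely models the
combination of vector_justify and PIPE_ARCH_BIG_ENDIAN from the patch.
The point it shows is the ordering: gather at the narrow width first,
then widen the whole vector in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: gather `length` 16-bit elements from base_ptr at
 * the given byte offsets. This models building the narrow gather vector
 * element by element (the insert-element loop in the patch).
 */
static void
gather_u16(const uint8_t *base_ptr, const uint32_t *offsets,
           uint16_t *narrow, size_t length)
{
   size_t i;

   assert(base_ptr && offsets && narrow);
   for (i = 0; i < length; ++i) {
      /* memcpy avoids unaligned-access and aliasing problems. */
      memcpy(&narrow[i], base_ptr + offsets[i], sizeof narrow[i]);
   }
}

/*
 * Hypothetical helper: widen the whole narrow vector in one pass, which
 * models the single vector zext. When justify_msb is nonzero, each
 * element is additionally shifted left by (dst_width - src_width) bits,
 * modelling the shift the patch emits for vector_justify on big endian.
 */
static void
zext_u16_to_u32(const uint16_t *narrow, uint32_t *wide, size_t length,
                int justify_msb)
{
   const unsigned sv = 32 - 16; /* dst_width - src_width */
   size_t i;

   assert(narrow && wide);
   for (i = 0; i < length; ++i) {
      wide[i] = (uint32_t)narrow[i];
      if (justify_msb)
         wide[i] <<= sv;
   }
}
```

A caller would run gather_u16() into a temporary array and then pass it
to zext_u16_to_u32(); whether a compiler turns the second loop into an
actual simd zero-extend depends on the target and optimization level.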