From: Roland Scheidegger <srol...@vmware.com>

Now that there is an SoA fetch which never falls back, we should usually get results which are better, or at least not worse (something like rgba32f will stay the same).

I suppose it might be worse in some cases where the format doesn't require conversion (e.g. rg32f) and the value goes straight to output. If llvm were able to see through all the shuffles, it might have been able to do away with the aos->soa->aos transpose entirely. That can probably no longer work except for 4-channel formats, because the undef channels are now replaced with 0/1 before the second transpose and not the first (llvm will definitely not be able to figure that out). That case might actually be quite common, but I'm not sure llvm really could optimize it in the first place, and if it's a problem we should just special-case such inputs (though note that if conversion is needed, it isn't obvious whether it's better to skip the transpose or do the conversion AoS-style).
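To make the layout concrete, here is a minimal scalar sketch of what an SoA fetch of a 2-channel float format such as RG32F produces. It is plain C rather than the LLVM IR that draw_llvm.c builds, and `fetch_rg32f_soa`, `NUM_CHANNELS` and `VEC_LENGTH` are hypothetical names used only for illustration. The point it shows is that the missing z/w channels are filled with 0/1 once the data is already in per-channel form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_CHANNELS 4   /* x, y, z, w */
#define VEC_LENGTH   4   /* vertices handled together (hypothetical width) */

/*
 * Hypothetical scalar model of an SoA fetch for RG32F.
 * soa[c][v] holds channel c of vertex v, so each row corresponds to
 * one vector register in the generated code.
 */
static void
fetch_rg32f_soa(const unsigned char *map,
                const size_t offsets[VEC_LENGTH],
                float soa[NUM_CHANNELS][VEC_LENGTH])
{
   size_t v;

   assert(map != NULL);

   for (v = 0; v < VEC_LENGTH; v++) {
      float rg[2];

      /* Gather the two stored channels of vertex v (no alignment assumed). */
      memcpy(rg, map + offsets[v], sizeof rg);
      soa[0][v] = rg[0];
      soa[1][v] = rg[1];

      /* Channels the format does not store get the defaults 0 and 1. */
      soa[2][v] = 0.0f;
      soa[3][v] = 1.0f;
   }
}
```

If such a value goes straight to output, the SoA rows have to be transposed back to per-vertex form, which is the aos->soa->aos round trip mentioned above.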
For cases which get way better, think of something like R16_UNORM with 8-wide vectors: this used to be 8 sign-extend fetches, 8 cvt, 8 muls, followed by a couple of shuffles to stitch things together (6 unpacks, if llvm is smart enough) and then an (8-wide) transpose. I'm not sure llvm could even optimize the shuffles + transpose, since the 16bit values were actually sign-extended to 128bit before being cast to a float vec, so that would be another 8 unpacks. Now it is just 8 fetches (directly inserted into the vector, albeit one 128bit insert is needed), 1 cvt, 1 mul.
---
 src/gallium/auxiliary/draw/draw_llvm.c | 54 +++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/src/gallium/auxiliary/draw/draw_llvm.c b/src/gallium/auxiliary/draw/draw_llvm.c
index 19b75a5..f895b76 100644
--- a/src/gallium/auxiliary/draw/draw_llvm.c
+++ b/src/gallium/auxiliary/draw/draw_llvm.c
@@ -755,11 +755,9 @@ fetch_vector(struct gallivm_state *gallivm,
              LLVMValueRef *inputs,
              LLVMValueRef indices)
 {
-   LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
    LLVMBuilderRef builder = gallivm->builder;
    struct lp_build_context blduivec;
    LLVMValueRef offset, valid_mask;
-   LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
    unsigned i;
 
    lp_build_context_init(&blduivec, gallivm, lp_uint_type(vs_type));
@@ -783,21 +781,49 @@ fetch_vector(struct gallivm_state *gallivm,
    }
 
    /*
-    * Note: we probably really want to use SoA fetch, not AoS one (albeit
-    * for most formats it will amount to the same as this isn't very
-    * optimized). But looks dangerous since it assumes alignment.
+    * Use SoA fetch. This should produce better code usually.
+    * Albeit it's possible there's exceptions (in particular if the fetched
+    * value is going directly to output if it's something like RG32F).
     */
-   for (i = 0; i < vs_type.length; i++) {
-      LLVMValueRef offset1, elem;
-      elem = lp_build_const_int32(gallivm, i);
-      offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+   if (1) {
+      struct lp_type res_type = vs_type;
+      /* The type handling is annoying here... */
+      if (format_desc->colorspace == UTIL_FORMAT_COLORSPACE_RGB &&
+          format_desc->channel[0].pure_integer) {
+         if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_SIGNED) {
+            res_type = lp_type_int_vec(vs_type.width, vs_type.width * vs_type.length);
+         }
+         else if (format_desc->channel[0].type == UTIL_FORMAT_TYPE_UNSIGNED) {
+            res_type = lp_type_uint_vec(vs_type.width, vs_type.width * vs_type.length);
+         }
+      }
 
-      aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
-                                             lp_float32_vec4_type(),
-                                             FALSE, map_ptr, offset1,
-                                             zero, zero, NULL);
+      lp_build_fetch_rgba_soa(gallivm, format_desc,
+                              res_type, FALSE, map_ptr, offset,
+                              blduivec.zero, blduivec.zero,
+                              NULL, inputs);
+
+      for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
+         inputs[i] = LLVMBuildBitCast(builder, inputs[i],
+                                      lp_build_vec_type(gallivm, vs_type), "");
+      }
+
+   }
+   else {
+      LLVMValueRef zero = LLVMConstNull(LLVMInt32TypeInContext(gallivm->context));
+      LLVMValueRef aos_fetch[LP_MAX_VECTOR_WIDTH / 32];
+      for (i = 0; i < vs_type.length; i++) {
+         LLVMValueRef offset1, elem;
+         elem = lp_build_const_int32(gallivm, i);
+         offset1 = LLVMBuildExtractElement(builder, offset, elem, "");
+
+         aos_fetch[i] = lp_build_fetch_rgba_aos(gallivm, format_desc,
+                                                lp_float32_vec4_type(),
+                                                FALSE, map_ptr, offset1,
+                                                zero, zero, NULL);
+      }
+      convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
    }
-   convert_to_soa(gallivm, aos_fetch, inputs, vs_type);
 
    for (i = 0; i < TGSI_NUM_CHANNELS; i++) {
       inputs[i] = LLVMBuildBitCast(builder, inputs[i], blduivec.vec_type, "");
-- 
2.7.4
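
As a rough illustration of the R16_UNORM case described in the commit message, the SoA-style sequence can be modelled in scalar C as below. This is only a sketch under stated assumptions: `fetch_r16_unorm_soa` and `VEC_LENGTH` are hypothetical names, and the second loop stands in for what would be a single vector-wide convert and a single vector-wide multiply in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_LENGTH 8   /* 8-wide vectors, as in the example above */

/*
 * Hypothetical scalar model of an SoA fetch of R16_UNORM:
 * 8 fetches into one integer vector, then 1 cvt and 1 mul.
 */
static void
fetch_r16_unorm_soa(const unsigned char *map,
                    const size_t offsets[VEC_LENGTH],
                    float out[VEC_LENGTH])
{
   const float scale = 1.0f / 65535.0f;
   uint32_t raw[VEC_LENGTH];
   size_t v;

   assert(map != NULL);

   /* Step 1: one 16-bit fetch per vertex, inserted directly into the
    * integer vector (memcpy avoids any alignment assumption). */
   for (v = 0; v < VEC_LENGTH; v++) {
      uint16_t texel;
      memcpy(&texel, map + offsets[v], sizeof texel);
      raw[v] = texel;
   }

   /* Step 2: models the single int->float convert and the single
    * multiply applied to the whole vector at once. */
   for (v = 0; v < VEC_LENGTH; v++) {
      out[v] = (float)raw[v] * scale;
   }
}
```

The AoS path instead converted and scaled each vertex separately before stitching the results together, which is where the 8 cvt and 8 muls came from.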