> +LLVMValueRef
> +ac_build_subgroup_inclusive_scan(struct ac_llvm_context *ctx,
> +				 LLVMValueRef src,
> +				 ac_reduce_op reduce,
> +				 LLVMValueRef identity)
> +{
> +	/* See http://gpuopen.com/amd-gcn-assembly-cross-lane-operations/
> +	 *
> +	 * Note that each dpp/reduce pair is supposed to be compiled down to
> +	 * one instruction by LLVM, at least for 32-bit values.
> +	 *
> +	 * TODO: use @llvm.amdgcn.ds.swizzle on SI and CI
> +	 */
> +	LLVMValueRef value = src;
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(1), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(2), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, src,
> +				    dpp_row_sr(3), 0xf, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_sr(4), 0xf, 0xe, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_sr(8), 0xf, 0xc, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_bcast15, 0xa, 0xf, false));
> +	value = reduce(ctx, value,
> +		       ac_build_dpp(ctx, identity, value,
> +				    dpp_row_bcast31, 0xc, 0xf, false));
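To sanity-check that both DPP sequences (the one in the quoted patch and the row_shr 1, 2, 4, 8 one in the disassembly below) compute a wave64 inclusive scan, here is a small host-side C model. It is only a sketch under stated assumptions: `dpp_step`, `dpp_move`, `run_schedule` and `check_schedule` are hypothetical helpers, not Mesa or LLVM API; rows are 16 lanes and banks are 4 lanes; and every lane that is masked off, or whose source lane does not exist, receives `identity`, which is the assumed behaviour of `ac_build_dpp(ctx, identity, ...)` with bound_ctrl = false. Masks are listed in `ac_build_dpp` argument order (row_mask, bank_mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAVE_SIZE 64 /* lanes per wavefront (wave64) */
#define ROW_SIZE 16  /* DPP rows are 16 lanes; banks are 4 lanes */
#define ARRAY_SIZE(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Simplified, hypothetical model of the three DPP controls used here. */
enum dpp_kind { ROW_SHR, ROW_BCAST15, ROW_BCAST31 };

struct dpp_step {
	enum dpp_kind kind;
	unsigned shift;     /* ROW_SHR only */
	unsigned row_mask;  /* bit r enables lanes 16r..16r+15 */
	unsigned bank_mask; /* bit b enables lanes 4b..4b+3 of every row */
	bool from_src;      /* move the original src instead of the running value */
};

typedef uint32_t (*reduce_fn)(uint32_t, uint32_t);

static uint32_t op_add(uint32_t a, uint32_t b) { return a + b; }
static uint32_t op_min(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Assumed model of dpp(old = identity, in, step, bound_ctrl = false): a lane
 * that is masked off, or whose source lane does not exist, gets identity. */
static void dpp_move(const uint32_t *in, uint32_t *out,
		     const struct dpp_step *s, uint32_t identity)
{
	for (int lane = 0; lane < WAVE_SIZE; lane++) {
		int row = lane / ROW_SIZE, pos = lane % ROW_SIZE, src_lane = -1;

		if (s->kind == ROW_SHR) {
			assert(s->shift > 0 && s->shift < ROW_SIZE);
			if (pos >= (int)s->shift)
				src_lane = lane - (int)s->shift;
		} else if (s->kind == ROW_BCAST15 && row > 0) {
			src_lane = row * ROW_SIZE - 1; /* lane 15 of the previous row */
		} else if (s->kind == ROW_BCAST31 && row > 1) {
			src_lane = 31;                 /* feeds rows 2 and 3 */
		}

		bool enabled = ((s->row_mask >> row) & 1) &&
			       ((s->bank_mask >> (pos / 4)) & 1);
		out[lane] = (enabled && src_lane >= 0) ? in[src_lane] : identity;
	}
}

/* value = reduce(value, dpp(identity, src or value, step)) for each step. */
static void run_schedule(const struct dpp_step *steps, int n,
			 const uint32_t *src, uint32_t *value,
			 reduce_fn reduce, uint32_t identity)
{
	uint32_t moved[WAVE_SIZE];

	for (int i = 0; i < WAVE_SIZE; i++)
		value[i] = src[i];
	for (int k = 0; k < n; k++) {
		dpp_move(steps[k].from_src ? src : value, moved, &steps[k], identity);
		for (int i = 0; i < WAVE_SIZE; i++)
			value[i] = reduce(value[i], moved[i]);
	}
}

/* The patch: shr 1..3 read src, the later steps read the running value. */
static const struct dpp_step patch_steps[] = {
	{ROW_SHR, 1, 0xf, 0xf, true},  {ROW_SHR, 2, 0xf, 0xf, true},
	{ROW_SHR, 3, 0xf, 0xf, true},  {ROW_SHR, 4, 0xf, 0xe, false},
	{ROW_SHR, 8, 0xf, 0xc, false}, {ROW_BCAST15, 0, 0xa, 0xf, false},
	{ROW_BCAST31, 0, 0xc, 0xf, false},
};

/* The dump: shr 1, 2, 4, 8 on the running value, all shift masks 0xf. */
static const struct dpp_step dump_steps[] = {
	{ROW_SHR, 1, 0xf, 0xf, false}, {ROW_SHR, 2, 0xf, 0xf, false},
	{ROW_SHR, 4, 0xf, 0xf, false}, {ROW_SHR, 8, 0xf, 0xf, false},
	{ROW_BCAST15, 0, 0xa, 0xf, false}, {ROW_BCAST31, 0, 0xc, 0xf, false},
};

/* True if the schedule matches a serial inclusive scan of some test data. */
static bool check_schedule(const struct dpp_step *steps, int n,
			   reduce_fn reduce, uint32_t identity)
{
	uint32_t src[WAVE_SIZE], value[WAVE_SIZE], acc = identity, seed = 12345;

	for (int i = 0; i < WAVE_SIZE; i++) {
		seed = seed * 1664525u + 1013904223u; /* small LCG for test data */
		src[i] = seed >> 16;
	}
	run_schedule(steps, n, src, value, reduce, identity);
	for (int i = 0; i < WAVE_SIZE; i++) {
		acc = reduce(acc, src[i]);
		if (value[i] != acc)
			return false;
	}
	return true;
}
```

Under this model, both schedules match a serial scan for add and for min, while dropping the final row_bcast31 step leaves lanes 32..63 without the totals of rows 0 and 1. For the fused form in the dump, where an out-of-range lane is simply left unwritten, filling with identity is equivalent because reduce(value, identity) == value. The model says nothing about instruction count or the s_nop padding, so it cannot tell which schedule is faster.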
BTW, I dumped some shaders from Doom on pro, and it looked like they ended up with this sequence (shift or broadcast, row_mask, bank_mask):

1, 0xf, 0xf
2, 0xf, 0xf
4, 0xf, 0xf
8, 0xf, 0xf
bcast15 0xa, 0xf
bcast31 0xc, 0xf

It also seems to apply these directly to the instructions, e.g.:

/*000000002b80*/ s_nop 0x0
/*000000002b84*/ v_min_u32 v83, v83, v83 row_shr:1 bank_mask:15 row_mask:15
/*000000002b8c*/ s_nop 0x1
/*000000002b90*/ v_min_u32 v83, v83, v83 row_shr:2 bank_mask:15 row_mask:15
/*000000002b98*/ s_nop 0x1
/*000000002b9c*/ v_min_u32 v83, v83, v83 row_shr:4 bank_mask:15 row_mask:15
/*000000002ba4*/ s_nop 0x1
/*000000002ba8*/ v_min_u32 v83, v83, v83 row_shr:8 bank_mask:15 row_mask:15
/*000000002bb0*/ s_nop 0x1
/*000000002bb4*/ v_min_u32 v83, v83, v83 row_bcast15 bank_mask:15 row_mask:10
/*000000002bbc*/ s_nop 0x1
/*000000002bc0*/ v_min_u32 v83, v83, v83 row_bcast31 bank_mask:15 row_mask:12

I think the instruction combining is probably LLVM's job, but I wonder whether we should use the same row_shr sequence and masks as well (1, 2, 4, 8 on the running value with full masks, instead of 1, 2, 3 on src followed by 4 and 8 with partial bank masks).

Dave.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev