This is labeled HACK for two reasons. First, it relies on the validator not checking destination write-masks against nir_op_infos[op].output_size. Second, it relies on the fact that the i965 vec4 hardware implicitly splats the result of any dot-product. Technically, nir_op_fdotN produces a single component that lives in the first slot, and the other components don't exist. However, most hardware splats dot-products, so this is probably reasonable.
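To make the splat assumption concrete, here is a minimal sketch in plain C. The vec4_reg type and the fdot3_splat function are hypothetical names made up for illustration; they are not NIR or i965 code. The sketch only models the behaviour described above: the scalar dot-product result lands in every channel the write-mask enables.

```c
#include <assert.h>

/* Hypothetical model of a 4-wide register; not a NIR or i965 type. */
typedef struct {
   float c[4];
} vec4_reg;

/* Hypothetical model of a splatting 3-component dot product.  The scalar
 * result is written to every channel enabled in write_mask (bit i enables
 * channel i); channels that are not enabled keep their old contents.
 */
static void
fdot3_splat(vec4_reg *dst, unsigned write_mask,
            const vec4_reg *a, const vec4_reg *b)
{
   assert(write_mask <= 0xf);

   float d = a->c[0] * b->c[0] + a->c[1] * b->c[1] + a->c[2] * b->c[2];

   for (unsigned i = 0; i < 4; i++) {
      if (write_mask & (1u << i))
         dst->c[i] = d;
   }
}
```

Under this model a consumer may read the result from any enabled channel. That is why the destination write-mask can be changed without touching the source swizzles.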

One solution to doing this "properly" would be to add a set of nir_op_fdotN_replicated opcodes that do the splat and somehow lower to those at the end of optimizations. I don't think we want to have the nir_op_fdot opcodes splat in SSA form because that could hurt our opportunity for CSE. However, the shader-db results below show that we might want to have it splat for the purposes of coalescing.

Shader-db results for vec4 programs on Haswell:

   total instructions in shared programs: 1778849 -> 1751223 (-1.55%)
   instructions in affected programs:     763104 -> 735478 (-3.62%)
   helped:                                7067
   HURT:                                  26

It turns out that dot-products matter...

Cc: Eduardo Lima Mitev <el...@igalia.com>
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 47 ++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c
index 0ebf3e3..1aa6add 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -84,6 +84,14 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
    return mov->dest.write_mask;
 }
 
+static bool
+is_fdot(nir_alu_instr *alu)
+{
+   return alu->op == nir_op_fdot2 ||
+          alu->op == nir_op_fdot3 ||
+          alu->op == nir_op_fdot4;
+}
+
 /* Attempts to coalesce the "move" from the given source of the vec to the
  * destination of the instruction generating the value. If, for whatever
  * reason, we cannot coalesce the mmove, it does nothing and returns 0. We
@@ -121,19 +129,28 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
    nir_alu_instr *src_alu =
       nir_instr_as_alu(vec->src[start_idx].src.ssa->parent_instr);
 
-   /* We only care about being able to re-swizzle the instruction if it is
-    * something that we can reswizzle. It must be per-component.
-    */
-   if (nir_op_infos[src_alu->op].output_size != 0)
-      return 0;
-
-   /* If we are going to reswizzle the instruction, we can't have any
-    * non-per-component sources either.
-    */
-   for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
-      if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+   if (is_fdot(src_alu)) {
+      /* The fdot instruction is special: It splats its result to all
+       * components. This means that we can always rewrite its destination
+       * and we don't need to swizzle anything.
+       */
+   } else {
+      /* We only care about being able to re-swizzle the instruction if it is
+       * something that we can reswizzle. It must be per-component. The one
+       * exception to this is the fdotN instructions which implicitly splat
+       * their result out to all channels.
+       */
+      if (nir_op_infos[src_alu->op].output_size != 0)
          return 0;
 
+      /* If we are going to reswizzle the instruction, we can't have any
+       * non-per-component sources either.
+       */
+      for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+         if (nir_op_infos[src_alu->op].input_sizes[j] != 0)
+            return 0;
+   }
+
    /* Stash off all of the ALU instruction's swizzles. */
    uint8_t swizzles[4][4];
    for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
@@ -153,8 +170,12 @@ try_coalesce(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader)
           * instruction so we can re-swizzle that component to match.
           */
          write_mask |= 1 << i;
-         for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
-            src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+         if (is_fdot(src_alu)) {
+            /* Since fdot splats, we don't need to do any reswizzling */
+         } else {
+            for (unsigned j = 0; j < nir_op_infos[src_alu->op].num_inputs; j++)
+               src_alu->src[j].swizzle[i] = swizzles[j][vec->src[i].swizzle[0]];
+         }
 
          /* Clear the no longer needed vec source */
          nir_instr_rewrite_src(&vec->instr, &vec->src[i].src, NIR_SRC_INIT);
-- 
2.5.0.400.gff86faf
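
For readers who want the per-channel logic of the last hunk in isolation, the following is a rough sketch using toy types. struct toy_alu and toy_retarget are hypothetical and much simpler than the real NIR structures; they assume at most two sources and only model the decision between "reswizzle the sources" and "just widen the write-mask".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an ALU instruction; not the NIR structs. */
struct toy_alu {
   bool is_fdot;           /* plays the role of is_fdot() in the patch */
   unsigned num_inputs;    /* assumed to be at most 2 in this sketch */
   uint8_t swizzle[2][4];  /* swizzle[j][i]: component read by source j for channel i */
   unsigned write_mask;
};

/* Make the producer write destination channel `chan`, where the vec wanted
 * component `wanted` of the producer's original result.  `saved` is a copy
 * of the producer taken before any swizzles were modified.
 */
static void
toy_retarget(struct toy_alu *alu, const struct toy_alu *saved,
             unsigned chan, unsigned wanted)
{
   assert(chan < 4 && wanted < 4);

   alu->write_mask |= 1u << chan;

   if (alu->is_fdot)
      return; /* splat: every written channel already holds the result */

   /* Per-component op: channel `chan` must now compute what channel
    * `wanted` used to compute, so copy that channel's source swizzles.
    */
   for (unsigned j = 0; j < alu->num_inputs; j++)
      alu->swizzle[j][chan] = saved->swizzle[j][wanted];
}
```

The copy passed as `saved` corresponds to the "stash off all of the ALU instruction's swizzles" step in the patch: without it, an earlier retarget could clobber a swizzle that a later channel still needs.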