On Fri, Feb 26, 2016 at 10:02 PM, Francisco Jerez <curroje...@riseup.net> wrote: > This simplifies the code that iterates over the per-component values > found in the matching copy_entry struct and checks whether the > register regions that were copied to each component are similar enough > to be treated as a single (reswizzled) value which can be propagated > into the current instruction. > > Aside from being scattered between opt_copy_propagation(), > try_copy_propagate(), and try_constant_propagate(), what I found > terribly confusing about the preexisting logic was that > opt_copy_propagation() tried to reorder the array of values according > to the swizzle of the instruction source, which meant one would have > had to invert the reordering applied at the top level in order to find > out which component to take from each value (we were just taking the > i-th component from the i-th value, which is not correct in general). > The saturate mask was also being swizzled incorrectly. > > This consolidates the logic for matching multiple components of a > copy_entry into a single function which returns the result as a > regular src_reg on success, as if the copy had been performed with a > single MOV instruction copying all components of the src_reg into the > destination. > > Fixes several ARB_vertex_program MOV test-cases from: > https://cgit.freedesktop.org/~kwg/piglit/log/?h=arb_program
Thanks a ton for fixing this. I tried a few times before to simply understand how the code worked -- even before I knew there was a bug -- and each time gave up. I still don't fully understand the issue. Too much or too little swizzling. Doesn't matter. Fixes the bug and is a net reduction in LOC. Nice work. A few code formatting comments below, but this patch gets an Acked-by: Matt Turner <matts...@gmail.com> > --- > .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 123 > ++++++++++----------- > 1 file changed, 57 insertions(+), 66 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > index b4a150a..fc8b721 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > @@ -85,21 +85,65 @@ is_logic_op(enum opcode opcode) > opcode == BRW_OPCODE_NOT); > } > > +/** > + * Get the origin of a copy as a single register if all components present in > + * the given readmask originate from the same register and have compatible > + * regions, otherwise return a BAD_FILE register. > + */ > +static src_reg > +get_copy_value(const copy_entry &entry, unsigned readmask) > +{ > + unsigned swz[4] = {}; > + src_reg value; > + > + for (unsigned i = 0; i < 4; i++) { > + if (readmask & (1 << i)) { > + if (entry.value[i]) { > + src_reg src = *entry.value[i]; > + > + if (src.file == IMM) { > + swz[i] = i; > + } else { > + swz[i] = BRW_GET_SWZ(src.swizzle, i); > + /* Overwrite the original swizzle so the src_reg::equals call > + * below doesn't care about it, the correct swizzle will be > + * calculated once the swizzles of all components are known. > + */ > + src.swizzle = BRW_SWIZZLE_XYZW; > + } > + > + if (value.file == BAD_FILE) > + value = src; > + > + else if (!value.equals(src)) > + return src_reg(); Parentheses required in nested if statements. With that, there's no need for the blank lines after them. > + > + } else { > + return src_reg(); > + } > + } > + } > + > + return swizzle(value, brw_compose_swizzle( > + brw_swizzle_for_mask(readmask), > + BRW_SWIZZLE4(swz[0], swz[1], swz[2], swz[3]))); Lets wrap this as return swizzle(value, brw_compose_swizzle(brw_swizzle_for_mask(readmask), BRW_SWIZZLE4(swz[0], swz[1], swz[2], swz[3]))); (GMail might mess that up, so just align each argument with the open parenthesis) > +} > + > static bool > try_constant_propagate(const struct brw_device_info *devinfo, > vec4_instruction *inst, > - int arg, struct copy_entry *entry) > + int arg, const copy_entry *entry) > { > /* For constant propagation, we only handle the same constant > * across all 4 channels. Some day, we should handle the 8-bit > * float vector format, which would let us constant propagate > * vectors better. > + * We could be more aggressive here -- some channels might not get used > + * based on the destination writemask. > */ > - src_reg value = *entry->value[0]; > - for (int i = 1; i < 4; i++) { > - if (!value.equals(*entry->value[i])) > - return false; > - } > + src_reg value = get_copy_value( > + *entry, > + brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle, WRITEMASK_XYZW)); Similarly here: src_reg value = get_copy_value(*entry, brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle, WRITEMASK_XYZW)); > > if (value.file != IMM) > return false; > @@ -238,38 +282,14 @@ try_constant_propagate(const struct brw_device_info > *devinfo, > static bool > try_copy_propagate(const struct brw_device_info *devinfo, > vec4_instruction *inst, int arg, > - struct copy_entry *entry, int attributes_per_reg) > + const copy_entry *entry, int attributes_per_reg) > { > /* Build up the value we are propagating as if it were the source of a > * single MOV > */ > - /* For constant propagation, we only handle the same constant > - * across all 4 channels. Some day, we should handle the 8-bit > - * float vector format, which would let us constant propagate > - * vectors better. > - */ > - src_reg value = *entry->value[0]; > - for (int i = 1; i < 4; i++) { > - /* This is equals() except we don't care about the swizzle. */ > - if (value.file != entry->value[i]->file || > - value.nr != entry->value[i]->nr || > - value.reg_offset != entry->value[i]->reg_offset || > - value.type != entry->value[i]->type || > - value.negate != entry->value[i]->negate || > - value.abs != entry->value[i]->abs) { > - return false; > - } > - } > - > - /* Compute the swizzle of the original register by swizzling the > - * component loaded from each value according to the swizzle of > - * operand we're going to change. > - */ > - int s[4]; > - for (int i = 0; i < 4; i++) { > - s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i); > - } > - value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]); > + src_reg value = get_copy_value( > + *entry, > + brw_apply_inv_swizzle_to_mask(inst->src[arg].swizzle, WRITEMASK_XYZW)); Same thing here. > > /* Check that we can propagate that value */ > if (value.file != UNIFORM && > @@ -418,38 +438,9 @@ vec4_visitor::opt_copy_propagation(bool do_constant_prop) > if (inst->regs_read(i) != 1) > continue; > > - int reg = (alloc.offsets[inst->src[i].nr] + > - inst->src[i].reg_offset); > - > - /* Find the regs that each swizzle component came from. > - */ > - struct copy_entry entry; > - memset(&entry, 0, sizeof(copy_entry)); > - int c; > - for (c = 0; c < 4; c++) { > - int channel = BRW_GET_SWZ(inst->src[i].swizzle, c); > - entry.value[c] = entries[reg].value[channel]; > - > - /* If there's no available copy for this channel, bail. > - * We could be more aggressive here -- some channels might > - * not get used based on the destination writemask. > - */ > - if (!entry.value[c]) > - break; > - > - entry.saturatemask |= > - (entries[reg].saturatemask & (1 << channel) ? 1 : 0) << c; > - > - /* We'll only be able to copy propagate if the sources are > - * all from the same file -- there's no ability to swizzle > - * 0 or 1 constants in with source registers like in i915. > - */ > - if (c > 0 && entry.value[c - 1]->file != entry.value[c]->file) > - break; > - } > - > - if (c != 4) > - continue; > + const unsigned reg = (alloc.offsets[inst->src[i].nr] + > + inst->src[i].reg_offset); > + const copy_entry &entry = entries[reg]; > > if (do_constant_prop && try_constant_propagate(devinfo, inst, i, > &entry)) > progress = true; > -- > 2.7.0 > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev