On Fri, Nov 04, 2016 at 05:59:29PM -0700, Jason Ekstrand wrote:
> On Nov 4, 2016 2:30 AM, "Pohjolainen, Topi" <topi.pohjolai...@gmail.com> wrote:
> >
> > On Fri, Nov 04, 2016 at 11:17:13AM +0200, Pohjolainen, Topi wrote:
> > > On Fri, Oct 28, 2016 at 02:17:12AM -0700, Jason Ekstrand wrote:
> > > > Previously, blorp copy operations were CCS-unaware so you had to perform
> > > > resolves on the source and destination before performing the copy.  This
> > > > commit makes blorp_copy capable of handling CCS-compressed images without
> > > > any resolves.
> > > >
> > > > Signed-off-by: Jason Ekstrand <ja...@jlekstrand.net>
> > > > ---
> > > >  src/intel/blorp/blorp_blit.c | 173 ++++++++++++++++++++++++++++++++++++++++++-
> > > >  src/intel/blorp/blorp_priv.h |   6 ++
> > > >  2 files changed, 177 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
> > > > index 07bb181..598b7c9 100644
> > > > --- a/src/intel/blorp/blorp_blit.c
> > > > +++ b/src/intel/blorp/blorp_blit.c
> > > > @@ -851,6 +851,66 @@ blorp_nir_manual_blend_bilinear(nir_builder *b, nir_ssa_def *pos,
> > > >                                 frac_y);
> > > >  }
> > > >
> > > > +/** Perform a color bit-cast operation
> > > > + *
> > > > + * For copy operations involving CCS, we may need to use different formats for
> > > > + * the source and destination surfaces.  The two formats must both be UINT
> > > > + * formats and must have the same size but may have different bit layouts.
> > > > + * For instance, we may be copying from R8G8B8A8_UINT to R32_UINT or R32_UINT
> > > > + * to R16G16_UINT.  This function generates code to shuffle bits around to get
> > > > + * us from one to the other.
> > > > + */
> > > > +static nir_ssa_def *
> > > > +bit_cast_color(struct nir_builder *b, nir_ssa_def *color,
> > > > +               const struct brw_blorp_blit_prog_key *key)
> > > > +{
> > > > +   assert(key->texture_data_type == nir_type_uint);
> > > > +
> > > > +   if (key->dst_bpc > key->src_bpc) {
> > > > +      nir_ssa_def *u = nir_ssa_undef(b, 1, 32);
> > > > +      nir_ssa_def *dst_chan[2] = { u, u };
> > > > +      unsigned shift = 0;
> > > > +      unsigned dst_idx = 0;
> > > > +      for (unsigned i = 0; i < 4; i++) {
> > > > +         nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, i),
> > > > +                                         nir_imm_int(b, shift));
> > > > +         if (shift == 0) {
> > > > +            dst_chan[dst_idx] = shifted;
> > > > +         } else {
> > > > +            dst_chan[dst_idx] = nir_ior(b, dst_chan[dst_idx], shifted);
> > > > +         }
> > > > +
> > > > +         shift += key->src_bpc;
> > > > +         if (shift >= key->dst_bpc) {
> > > > +            dst_idx++;
> > > > +            shift = 0;
> > > > +         }
> > > > +      }
> > >
> > > This iterates unconditionally over 4 source components regardless of how many
> > > there are in the source, right?
> > > Let's assume we have dst == R32_UINT and src == R16G16_UINT.  Don't we
> > > get the following, where the iterations with i = 2 and i = 3 try to
> > > access the non-existent z and w components:
> > >
> > >    i = 0:
> > >       nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 0),
> > >                                       nir_imm_int(b, 0));
> > >       dst_chan[0] = shifted;
> > >
> > >    i = 1:
> > >       nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 1),
> > >                                       nir_imm_int(b, 16));
> > >       dst_chan[0] = nir_ior(b, dst_chan[0], shifted);
> > >
> > >    i = 2:
> > >       nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 2),
> > >                                       nir_imm_int(b, 0));
> > >       dst_chan[1] = shifted;
> > >
> > >    i = 3:
> > >       nir_ssa_def *shifted = nir_ishl(b, nir_channel(b, color, 3),
> > >                                       nir_imm_int(b, 16));
> > >       dst_chan[1] = nir_ior(b, dst_chan[1], shifted);
> >
> > If this is intentional and the idea is to rely on optimization passes to
> > remove the unnecessary writes to "dst_chan[1]", then a comment would be nice.

> It is intentional, but instead of relying on any compiler magic, we're
> just relying on the fact that the texturing instruction always returns
> a vec4 and the render target write always takes a vec4.  I could have
> included the number of components in the key and used that to avoid
> generating pointless conversion code in a few places, but that would
> have ended up making it more complicated, and we would end up generating
> more distinct shader programs for only marginal benefit.
Ah, okay, I hadn't thought of that.  Thanks for fixing this (I can drop one
of my patches):

Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com>

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev