Thanks for adding the comments. Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com>
On Thu, Aug 6, 2015 at 7:19 AM, Francisco Jerez <curroje...@riseup.net> wrote: > Francisco Jerez <curroje...@riseup.net> writes: > >> "Pohjolainen, Topi" <topi.pohjolai...@intel.com> writes: >> >>> On Wed, Aug 05, 2015 at 12:11:02PM +0300, Pohjolainen, Topi wrote: >>>> On Mon, Jul 20, 2015 at 07:17:48PM +0300, Francisco Jerez wrote: >>>> > This will be used to pass image meta-data to the shader when we cannot >>>> > use typed surface reads and writes. All entries except surface_idx >>>> > and size are otherwise unused and will get eliminated by the uniform >>>> > packing pass. size will be used for bounds checking with some image >>>> > formats and will be useful for ARB_shader_image_size too. surface_idx >>>> > is always used. >>>> > >>>> > v2: Add CS support. Move the image_params array back to >>>> > brw_stage_prog_data. >>>> > --- >>>> > I'm resending this (and also patches 9 and 10) because I had to make >>>> > some rather intrusive changes during one of my last rebases -- The >>>> > image_param array is now part of brw_stage_prog_data again instead of >>>> > brw_stage_state (ironically as it was in my very first submission of >>>> > these patches) because the compiler no longer has access to >>>> > brw_stage_state since the brw_context pointer was removed from the >>>> > visitors. >>>> > >>>> > src/mesa/drivers/dri/i965/brw_context.h | 54 ++++++++++++++++ >>>> > src/mesa/drivers/dri/i965/brw_cs.cpp | 3 + >>>> > src/mesa/drivers/dri/i965/brw_gs.c | 3 + >>>> > src/mesa/drivers/dri/i965/brw_vs.c | 5 +- >>>> > src/mesa/drivers/dri/i965/brw_wm.c | 4 ++ >>>> > src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 82 >>>> > ++++++++++++++++++++++++ >>>> > 6 files changed, 150 insertions(+), 1 deletion(-) >>>> > >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_context.h >>>> > b/src/mesa/drivers/dri/i965/brw_context.h >>>> > index e16ad10..9ebad5b 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_context.h >>>> > +++ b/src/mesa/drivers/dri/i965/brw_context.h >>>> > @@ -361,6 +361,7 @@ struct brw_stage_prog_data { >>>> > >>>> > GLuint nr_params; /**< number of float params/constants */ >>>> > GLuint nr_pull_params; >>>> > + unsigned nr_image_params; >>>> > >>>> > unsigned curb_read_length; >>>> > unsigned total_scratch; >>>> > @@ -381,6 +382,59 @@ struct brw_stage_prog_data { >>>> > */ >>>> > const gl_constant_value **param; >>>> > const gl_constant_value **pull_param; >>>> > + >>>> > + /** >>>> > + * Image metadata passed to the shader as uniforms. This is >>>> > deliberately >>>> > + * ignored by brw_stage_prog_data_compare() because its contents >>>> > don't have >>>> > + * any influence on program compilation. >>>> > + */ >>>> > + struct brw_image_param *image_param; >>>> > +}; >>>> > + >>>> > +/* >>>> > + * Image metadata structure as laid out in the shader parameter >>>> > + * buffer. Entries have to be 16B-aligned for the vec4 back-end to be >>>> > + * able to use them. That's okay because the padding and any unused >>>> > + * entries [most of them except when we're doing untyped surface >>>> > + * access] will be removed by the uniform packing pass. >>>> > + */ >>>> > +#define BRW_IMAGE_PARAM_SURFACE_IDX_OFFSET 0 >>>> > +#define BRW_IMAGE_PARAM_OFFSET_OFFSET 4 >>>> > +#define BRW_IMAGE_PARAM_SIZE_OFFSET 8 >>>> > +#define BRW_IMAGE_PARAM_STRIDE_OFFSET 12 >>>> > +#define BRW_IMAGE_PARAM_TILING_OFFSET 16 >>>> > +#define BRW_IMAGE_PARAM_SWIZZLING_OFFSET 20 >>>> > +#define BRW_IMAGE_PARAM_SIZE 24 >>>> > + >>>> > +struct brw_image_param { >>>> > + /** Surface binding table index. */ >>>> > + uint32_t surface_idx; >>>> > + >>>> > + /** Surface X, Y and Z dimensions. */ >>>> > + uint32_t size[3]; >>>> >>>> Like I mentioned in one of the subsequent patches, it would clearer if >>>> "size" would follow "offset". That way it matches the order of >>>> BRW_IMAGE_PARAM_*_OFFSET defines and later on the layout in the uniform >>>> buffer object. >>>> >>>> > + >>>> > + /** Offset applied to the X and Y surface coordinates. */ >>>> > + uint32_t offset[2]; >>>> > + >>>> > + /** X-stride in bytes, Y-stride in bytes, horizontal slice stride in >>>> > + * pixels, vertical slice stride in pixels. >>>> > + */ >>>> > + uint32_t stride[4]; >>>> > + >>>> > + /** Log2 of the tiling modulus in the X, Y and Z dimension. */ >>>> > + uint32_t tiling[3]; >>>> > + >>>> > + /** >>>> > + * Right shift to apply for bit 6 address swizzling. Two different >>>> > + * swizzles can be specified and will be applied one after the >>>> > other. The >>>> > + * resulting address will be: >>>> > + * >>>> > + * addr' = addr ^ ((1 << 6) & ((addr >> swizzling[0]) ^ >>>> > + * (addr >> swizzling[1]))) >>>> > + * >>>> > + * Use \c 0xff if any of the swizzles is not required. >>>> > + */ >>>> > + uint32_t swizzling[2]; >>>> > }; >>>> > >>>> > /* Data about a particular attempt to compile a program. Note that >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_cs.cpp >>>> > b/src/mesa/drivers/dri/i965/brw_cs.cpp >>>> > index d61bba0..144aa27 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_cs.cpp >>>> > +++ b/src/mesa/drivers/dri/i965/brw_cs.cpp >>>> > @@ -190,7 +190,10 @@ brw_codegen_cs_prog(struct brw_context *brw, >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > prog_data.base.pull_param = >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > + prog_data.base.image_param = >>>> > + rzalloc_array(NULL, struct brw_image_param, cs->NumImages); >>>> > prog_data.base.nr_params = param_count; >>>> > + prog_data.base.nr_image_params = cs->NumImages; >>>> > >>>> > program = brw_cs_emit(brw, mem_ctx, key, &prog_data, >>>> > &cp->program, prog, &program_size); >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_gs.c >>>> > b/src/mesa/drivers/dri/i965/brw_gs.c >>>> > index 9c59c8a..d1a955a 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_gs.c >>>> > +++ b/src/mesa/drivers/dri/i965/brw_gs.c >>>> > @@ -69,7 +69,10 @@ brw_codegen_gs_prog(struct brw_context *brw, >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > c.prog_data.base.base.pull_param = >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > + c.prog_data.base.base.image_param = >>>> > + rzalloc_array(NULL, struct brw_image_param, gs->NumImages); >>>> > c.prog_data.base.base.nr_params = param_count; >>>> > + c.prog_data.base.base.nr_image_params = gs->NumImages; >>>> > >>>> > if (brw->gen >= 7) { >>>> > if (gp->program.OutputType == GL_POINTS) { >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_vs.c >>>> > b/src/mesa/drivers/dri/i965/brw_vs.c >>>> > index 2b9b005..20bc7a9 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_vs.c >>>> > +++ b/src/mesa/drivers/dri/i965/brw_vs.c >>>> > @@ -122,7 +122,7 @@ brw_codegen_vs_prog(struct brw_context *brw, >>>> > * conservative here. >>>> > */ >>>> > param_count = vs->num_uniform_components * 4; >>>> > - >>>> > + stage_prog_data->nr_image_params = vs->NumImages; >>>> > } else { >>>> > param_count = vp->program.Base.Parameters->NumParameters * 4; >>>> > } >>>> > @@ -135,6 +135,9 @@ brw_codegen_vs_prog(struct brw_context *brw, >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > stage_prog_data->pull_param = >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > + stage_prog_data->image_param = >>>> > + rzalloc_array(NULL, struct brw_image_param, >>>> > + stage_prog_data->nr_image_params); >>>> > stage_prog_data->nr_params = param_count; >>>> > >>>> > GLbitfield64 outputs_written = vp->program.Base.OutputsWritten; >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c >>>> > b/src/mesa/drivers/dri/i965/brw_wm.c >>>> > index b590b17..e0e0bb7 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_wm.c >>>> > +++ b/src/mesa/drivers/dri/i965/brw_wm.c >>>> > @@ -196,6 +196,7 @@ brw_codegen_wm_prog(struct brw_context *brw, >>>> > int param_count; >>>> > if (fs) { >>>> > param_count = fs->num_uniform_components; >>>> > + prog_data.base.nr_image_params = fs->NumImages; >>>> > } else { >>>> > param_count = fp->program.Base.Parameters->NumParameters * 4; >>>> > } >>>> > @@ -205,6 +206,9 @@ brw_codegen_wm_prog(struct brw_context *brw, >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > prog_data.base.pull_param = >>>> > rzalloc_array(NULL, const gl_constant_value *, param_count); >>>> > + prog_data.base.image_param = >>>> > + rzalloc_array(NULL, struct brw_image_param, >>>> > + prog_data.base.nr_image_params); >>>> > prog_data.base.nr_params = param_count; >>>> > >>>> > prog_data.barycentric_interp_modes = >>>> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >>>> > b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >>>> > index 33e045f..cc09c04 100644 >>>> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >>>> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c >>>> > @@ -1045,6 +1045,83 @@ get_image_format(struct brw_context *brw, >>>> > mesa_format format, GLenum access) >>>> > } >>>> > >>>> > static void >>>> > +update_default_image_param(struct brw_context *brw, >>>> > + struct gl_image_unit *u, >>>> > + unsigned surface_idx, >>>> > + struct brw_image_param *param) >>>> > +{ >>>> > + memset(param, 0, sizeof(*param)); >>>> > + param->surface_idx = surface_idx; >>>> > + param->swizzling[0] = 0xff; >>>> > + param->swizzling[1] = 0xff; >>>> > +} >>>> > + >>>> > +static void >>>> > +update_buffer_image_param(struct brw_context *brw, >>>> > + struct gl_image_unit *u, >>>> > + unsigned surface_idx, >>>> > + struct brw_image_param *param) >>>> > +{ >>>> > + struct gl_buffer_object *obj = u->TexObj->BufferObject; >>>> > + >>>> > + update_default_image_param(brw, u, surface_idx, param); >>>> > + >>>> > + param->size[0] = obj->Size / >>>> > _mesa_get_format_bytes(u->_ActualFormat); >>>> > + param->stride[0] = _mesa_get_format_bytes(u->_ActualFormat); >>>> > +} >>>> > + >>>> > +static void >>>> > +update_texture_image_param(struct brw_context *brw, >>>> > + struct gl_image_unit *u, >>>> > + unsigned surface_idx, >>>> > + struct brw_image_param *param) >>>> > +{ >>>> > + struct intel_mipmap_tree *mt = intel_texture_object(u->TexObj)->mt; >>>> > + >>>> > + update_default_image_param(brw, u, surface_idx, param); >>>> > + >>>> > + param->size[0] = minify(mt->logical_width0, u->Level); >>>> > + param->size[1] = minify(mt->logical_height0, u->Level); >>>> > + param->size[2] = (!u->Layered ? 1 : >>>> > + u->TexObj->Target == GL_TEXTURE_CUBE_MAP ? 6 : >>>> > + u->TexObj->Target == GL_TEXTURE_3D ? >>>> > + minify(mt->logical_depth0, u->Level) : >>>> > + mt->logical_depth0); >>>> > + >>>> > + intel_miptree_get_image_offset(mt, u->Level, u->Layer, >>>> > + ¶m->offset[0], >>>> > + ¶m->offset[1]); >>>> > + >>>> > + param->stride[0] = mt->cpp; >>>> > + param->stride[1] = mt->pitch / mt->cpp; >>>> > + param->stride[2] = >>>> > + brw_miptree_get_horizontal_slice_pitch(brw, mt, u->Level); >>>> > + param->stride[3] = >>>> > + brw_miptree_get_vertical_slice_pitch(brw, mt, u->Level); >>>> > + >>>> > + if (mt->tiling == I915_TILING_X) { >>>> > + param->tiling[0] = ffs(512 / mt->cpp) - 1; >>>> > + param->tiling[1] = 3; >>>> >>>> Would it be clearer to use ffs(8) - 1 here also? Or add a comment >>>> >>>> param->tiling[1] = 3; /* log2(8) */ >>>> >>>> And the same further down for the vertical Y-tiling dimension. >>>> >>>> > + >>>> > + if (brw->has_swizzling) { >>>> > + param->swizzling[0] = 3; >>>> > + param->swizzling[1] = 4; >>>> > + } >>>> > + } else if (mt->tiling == I915_TILING_Y) { >>>> > + param->tiling[0] = ffs(16 / mt->cpp) - 1; >>>> >>>> Above you use the full tile width of 512 (in bytes) for X-tile, but here >>>> only 16 for Y-tile instead of the 128. I don't really understand this. >>>> >>>> > + param->tiling[1] = 5; >>>> > + >>>> > + if (brw->has_swizzling) >>>> > + param->swizzling[0] = 3; >>>> > + } >>>> > + >>>> > + if (u->TexObj->Target == GL_TEXTURE_3D) >>>> > + param->tiling[2] = u->Level; >>>> > + else >>>> > + param->tiling[2] = 0; >>>> >>>> Here you set the value directly to the z-dimension but in the documentation >>>> claims that all three tiling values are log2. >>> >>> And looking further emit_address_calculation() I now understand the >>> documentation - this is log2 of the number of slices in particular level. >>> >>> This patch is: >>> >>> Reviewed-by: Topi Pohjolainen <topi.pohjolai...@intel.com> >> >> Thanks! > > I've added a couple of comments more to this function for the case > someone asks himself the same questions in the future, and replaced the > ffs() calculations with equivalent calls to _mesa_logbase2 for clarity, > see [1]. > > [1] > http://cgit.freedesktop.org/~currojerez/mesa/commit/?h=image-load-store-lower&id=a002ca6f0c8ccd7e687ff9a7928515278f2bf3e6 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev