On 2015-04-27 19:02:38, Kenneth Graunke wrote: > On Friday, April 24, 2015 04:33:43 PM Jordan Justen wrote: > > Tested on Ivybridge, Haswell and Broadwell. > > > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > > --- > > src/mesa/drivers/dri/i965/brw_compute.c | 39 > > ++++++++++++++++++++++++++++++++- > > src/mesa/drivers/dri/i965/brw_defines.h | 1 + > > 2 files changed, 39 insertions(+), 1 deletion(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c > > b/src/mesa/drivers/dri/i965/brw_compute.c > > index baed701..06ef448 100644 > > --- a/src/mesa/drivers/dri/i965/brw_compute.c > > +++ b/src/mesa/drivers/dri/i965/brw_compute.c > > @@ -31,12 +31,49 @@ > > #include "brw_draw.h" > > #include "brw_state.h" > > #include "intel_batchbuffer.h" > > +#include "brw_defines.h" > > > > > > static void > > brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) > > { > > - _mesa_problem(&brw->ctx, "TODO: implement brw_emit_gpgpu_walker"); > > + const struct brw_cs_prog_data *prog_data = brw->cs.prog_data; > > + > > + const unsigned simd_size = prog_data->simd_size; > > + unsigned group_size = prog_data->local_size[0] * > > + prog_data->local_size[1] * prog_data->local_size[2]; > > + unsigned thread_width_max = > > + (group_size + simd_size - 1) / simd_size; > > + > > + uint32_t right_mask = (1u << simd_size) - 1; > > + const unsigned right_non_aligned = group_size & (simd_size - 1); > > + if (right_non_aligned != 0) > > + right_mask >>= (simd_size - right_non_aligned); > > I think this is equvalent to: > > uint32_t right_mask = (1u << (simd_size - (group_size % simd_size))) - 1; > > which might be a bit simpler... > > > + BEGIN_BATCH(dwords); > > + OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2)); > > I was going to suggest splitting this into separate Gen8+ and Gen7 > blocks, but now that I look at the code...these two are slightly > different indirect handling, and the later one is just a DWord of MBZ, > so...it's not really that different. I think what you have is fine :) > > > + uint32_t dwords = brw->gen < 8 ? 11 : 15; > > + OUT_BATCH(0); > > + if (brw->gen >= 8) { > > + OUT_BATCH(0); > > + OUT_BATCH(0); > > + } > > + assert(thread_width_max <= brw->max_cs_threads); > > + OUT_BATCH(((simd_size == 8) ? 0 : 1) << 30 | > > You might want to write this as ((simd_size / 8) - 1). That will work > for SIMD8/16/32.
Good idea, but I think simd_size / 16 will be needed, since we need 2 for SIMD32. > Topi would probably suggest using SET_FIELD, i.e. > > #define BRW_GPGPU_SIMD_SIZE_SHIFT 30 > #define BRW_GPGPU_SIMD_SIZE_MASK INTEL_MASK(31, 30) > > SET_FIELD((simd_size / 8) - 1, BRW_GPGPU_SIMD_SIZE) > > It's probably a good idea here too. Will do. > > + (thread_width_max - 1)); > > Don't you need to set the thread height/depth maximums as well? > I'm not really sure how this works. We flatten the 3-dims out above in group_size, and then thread_width_max. So, this basically focuses getting it to execute the correct number of times. When height is not used, we can set bottom_mask to all 1's, and only use the right_mask. In terms of GLSL's gl_LocalInvocationID, that is a whole separate matter. (And a whole separate patch series! :) > > + OUT_BATCH(0); > > It'd be nice to label the 0's, i.e. Will do. Thanks for the review! -Jordan > OUT_BATCH(0); /* Thread Group ID Starting X */ > OUT_BATCH(num_groups[0]); /* Thread Group ID X Dimension */ > > With those changes, the whole series is: > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > > I haven't verified that these execution masks are really what you want. > You know more about this than I do. :) > > > + if (brw->gen >= 8) > > + OUT_BATCH(0); > > + OUT_BATCH(num_groups[0]); > > + OUT_BATCH(0); > > + if (brw->gen >= 8) > > + OUT_BATCH(0); > > + OUT_BATCH(num_groups[1]); > > + OUT_BATCH(0); > > + OUT_BATCH(num_groups[2]); > > + OUT_BATCH(right_mask); > > + OUT_BATCH(0xffffffff); > > + ADVANCE_BATCH(); > > } > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > > b/src/mesa/drivers/dri/i965/brw_defines.h > > index 36f46af..cd25511 100644 > > --- a/src/mesa/drivers/dri/i965/brw_defines.h > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > > @@ -2451,5 +2451,6 @@ enum brw_wm_barycentric_interp_mode { > > > > #define MEDIA_VFE_STATE 0x7000 > > #define MEDIA_INTERFACE_DESCRIPTOR_LOAD 0x7002 > > +#define GPGPU_WALKER 0x7105 > > > > #endif > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev