Kenneth Graunke <kenn...@whitecape.org> writes:

> We were programming the number of threads per subslice, when we should
> have been programming the total number of threads on the GPU as a whole.
>
> Thanks to Curro and Jordan for helping track this down!
>
> On Skylake GT3e:
> - Improves performance in Unreal's Elemental Demo by roughly 1.5-1.7x.
> - Improves performance in Synmark's Gl43CSDof by roughly 3.7x.
> - Improves performance in Synmark's Gl43GSCloth by roughly 1.18x.
>
> On Broadwell GT2:
> - Improves performance in Unreal's Elemental Demo by roughly 1.23x.
> - Improves performance in Synmark's Gl43CSDof by roughly 2.0x.
> - Improves performance in Synmark's Gl43GSCloth by 1.47035% +/-
>   0.255654% (n=25).
>
> On Haswell GT3e:
> - Improves performance in Unreal's Elemental Demo (in GL 4.3 mode)
>   by roughly 1.18x.
> - Decreases performance in Gl43CSCloth by -2.88315% +/- 2.54785%?
> - Gl43CSDof is still broken.
>

Does it work if you overallocate the scratch BO size by 128/70 on HSW?
(which is roughly the amount of padding introduced by the shared function
in the scratch space to account for non-existent EUs).

> Cc: "12.0" <mesa-sta...@lists.freedesktop.org>
> Signed-off-by: Kenneth Graunke <kenn...@whitecape.org>
> ---
>  src/mesa/drivers/dri/i965/brw_cs.c        | 4 +++-
>  src/mesa/drivers/dri/i965/gen7_cs_state.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> (Note that if we drop the previous patch, Haswell will be unchanged.)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
> index 2a25584..c8598d6 100644
> --- a/src/mesa/drivers/dri/i965/brw_cs.c
> +++ b/src/mesa/drivers/dri/i965/brw_cs.c
> @@ -149,8 +149,10 @@ brw_codegen_cs_prog(struct brw_context *brw,
>     }
>
>     if (prog_data.base.total_scratch) {
> +      const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
>        brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
> -                         prog_data.base.total_scratch * brw->max_cs_threads);
> +                         prog_data.base.total_scratch *
> +                         brw->max_cs_threads * subslices);
>     }
>
>     if (unlikely(INTEL_DEBUG & DEBUG_CS))
> diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> index aff1f4e..0eca651 100644
> --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> @@ -80,7 +80,9 @@ brw_upload_cs_state(struct brw_context *brw)
>     const uint32_t vfe_num_urb_entries = brw->gen >= 8 ? 2 : 0;
>     const uint32_t vfe_gpgpu_mode =
>        brw->gen == 7 ? SET_FIELD(1, GEN7_MEDIA_VFE_STATE_GPGPU_MODE) : 0;
> -   OUT_BATCH(SET_FIELD(brw->max_cs_threads - 1, MEDIA_VFE_STATE_MAX_THREADS) |
> +   const uint32_t subslices = MAX2(brw->intelScreen->subslice_total, 1);
> +   OUT_BATCH(SET_FIELD(brw->max_cs_threads * subslices - 1,
> +                       MEDIA_VFE_STATE_MAX_THREADS) |
>              SET_FIELD(vfe_num_urb_entries, MEDIA_VFE_STATE_URB_ENTRIES) |
>              SET_FIELD(1, MEDIA_VFE_STATE_RESET_GTW_TIMER) |
>              SET_FIELD(1, MEDIA_VFE_STATE_BYPASS_GTW) |
> --
> 2.8.3
>
> _______________________________________________
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
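
To make the arithmetic in the patch and in the question above concrete, here is a minimal standalone sketch, not driver code. The helper names (total_cs_threads, cs_scratch_size, hsw_padded_scratch_size) are hypothetical and do not exist in Mesa. The first two mirror the `max_cs_threads * MAX2(subslice_total, 1)` expression from the diff; the third only illustrates the suggested 128/70 overallocation, rounded up, and whether that is the right fix on Haswell is exactly what is being asked.

```c
#include <assert.h>
#include <stdint.h>

/* Same semantics as the MAX2() macro used in the diff. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: total compute threads on the whole GPU, given the
 * per-subslice thread limit and the subslice count. A subslice count of 0
 * (unknown) is treated as 1, as in the patch. */
static unsigned
total_cs_threads(unsigned max_cs_threads_per_subslice, unsigned subslice_total)
{
   assert(max_cs_threads_per_subslice > 0);
   return max_cs_threads_per_subslice * MAX2(subslice_total, 1u);
}

/* Hypothetical helper: scratch space needed so every thread that may run
 * concurrently gets its own per-thread region. */
static uint64_t
cs_scratch_size(uint32_t per_thread_scratch, unsigned threads)
{
   return (uint64_t)per_thread_scratch * threads;
}

/* Hypothetical helper: overallocate by a factor of 128/70, rounding up.
 * This is only the experiment suggested in the reply, not a known fix. */
static uint64_t
hsw_padded_scratch_size(uint64_t size)
{
   return (size * 128 + 69) / 70;
}
```

With made-up numbers, 56 threads per subslice and 3 subslices give 168 threads, so the value programmed into the MAX_THREADS field would be 167 (the field holds the count minus one). With 1024 bytes of scratch per thread that is 172032 bytes, or 314573 bytes after the 128/70 padding.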
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev