On 12/21/2014 09:11 PM, Kenneth Graunke wrote: > On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote: >> From: Ian Romanick <ian.d.roman...@intel.com> >> >> Instead of having an extra pointer indirection in one of the hottest >> loops in the driver. >> >> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic >> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects >> Gl32Batch7: >> >> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) >> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) >> >> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >> --- >> src/mesa/drivers/dri/i965/brw_context.h | 2 +- >> src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ++++++++++++++++--- >> 2 files changed, 17 insertions(+), 4 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_context.h >> b/src/mesa/drivers/dri/i965/brw_context.h >> index 92eb022..5f5f807 100644 >> --- a/src/mesa/drivers/dri/i965/brw_context.h >> +++ b/src/mesa/drivers/dri/i965/brw_context.h >> @@ -1384,7 +1384,7 @@ struct brw_context >> } perfmon; >> >> int num_atoms; >> - const struct brw_tracked_state **atoms; >> + const struct brw_tracked_state atoms[64]; > > Making this atoms[57] would save memory, and the static asserts make it easy > to remember to increase the size if we ever add additional atoms.
I can change that easily enough. > Honestly, it seems like there should be a better way to do this. Currently, > the atoms list only depends on the generation number, and thus is identical > for all GL contexts. Storing an additional 1.5 kB of redundant data in each > GL context seems a bit wasteful. That's an interesting idea that hadn't occurred to me. :) > Would it help to make brw_upload_state cache the array and size? > > void > brw_upload_state(struct brw_context *brw) > { > static const struct brw_tracked_state *atoms = brw->atoms; > static const int num_atoms = brw->num_atoms; > > ... > > for (i = 0; i < num_atoms; i++) { > ...use atoms[i]... > } > > ... > } > > It seems like that would achieve the same effect - one indirection. > > One downside to that approach is that we couldn't have separate atom > lists for API_CORE vs. API_COMPAT (while we could using your approach). > (We could skip stippling for core contexts, and skip GS/HS/DS for legacy > contexts, which might help a tiny bit...) > >> /* If (INTEL_DEBUG & DEBUG_BATCH) */ >> struct { >> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c >> b/src/mesa/drivers/dri/i965/brw_state_upload.c >> index a579781..9a5d49a 100644 >> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c >> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c >> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw ) >> const struct brw_tracked_state **atoms; >> int num_atoms; >> >> + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms)); >> + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms)); >> + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms)); >> + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms)); > > I do like the static asserts. > >> + >> brw_init_caches(brw); >> >> if (brw->gen >= 8) { >> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw ) >> num_atoms = ARRAY_SIZE(gen4_atoms); >> } >> >> - brw->atoms = atoms; >> brw->num_atoms = num_atoms; >> >> + /* This is to work around brw_context::atoms being declared const. We >> want >> + * it to be const, but it needs to be initialized somehow! >> + */ >> + struct brw_tracked_state *context_atoms = >> + (struct brw_tracked_state *) &brw->atoms[0]; > > Ugly, but... not sure what else to suggest. I was going to s/context_atoms/brw->atoms[i]/, but I wanted the patch to be easier to review. There was also a follow-on patch that uses some SSE3 instructions in the loop, but it was giving mixed results. I'm holding that back to send out later. > I suppose this patch is fine. It does offer a small CPU overhead reduction in > the hottest path, for a small memory usage penalty. I think we can do better, > but I don't know that it makes sense to block your patch over nebulous ideas > for future improvements. We can do those in the future... > > Reviewed-by: Kenneth Graunke <kenn...@whitecape.org> > >> + >> + for (int i = 0; i < num_atoms; i++) >> + context_atoms[i] = *atoms[i]; >> + >> while (num_atoms--) { >> assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw); >> assert((*atoms)->emit); >> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw) >> prev = *state; >> >> for (i = 0; i < brw->num_atoms; i++) { >> - const struct brw_tracked_state *atom = brw->atoms[i]; >> + const struct brw_tracked_state *atom = &brw->atoms[i]; >> struct brw_state_flags generated; >> >> if (check_state(state, &atom->dirty)) { >> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw) >> } >> else { >> for (i = 0; i < brw->num_atoms; i++) { >> - const struct brw_tracked_state *atom = brw->atoms[i]; >> + const struct brw_tracked_state *atom = &brw->atoms[i]; >> >> if (check_state(state, &atom->dirty)) { >> atom->emit(brw);
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev