On 12/21/2014 09:11 PM, Kenneth Graunke wrote: > On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote: >> From: Ian Romanick <[email protected]> >> >> Instead of having an extra pointer indirection in one of the hottest >> loops in the driver. >> >> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic >> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects >> Gl32Batch7: >> >> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40) >> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60) >> >> Signed-off-by: Ian Romanick <[email protected]> >> --- >> src/mesa/drivers/dri/i965/brw_context.h | 2 +- >> src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ++++++++++++++++--- >> 2 files changed, 17 insertions(+), 4 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_context.h >> b/src/mesa/drivers/dri/i965/brw_context.h >> index 92eb022..5f5f807 100644 >> --- a/src/mesa/drivers/dri/i965/brw_context.h >> +++ b/src/mesa/drivers/dri/i965/brw_context.h >> @@ -1384,7 +1384,7 @@ struct brw_context >> } perfmon; >> >> int num_atoms; >> - const struct brw_tracked_state **atoms; >> + const struct brw_tracked_state atoms[64]; > > Making this atoms[57] would save memory, and the static asserts make it easy > to remember to increase the size if we ever add additional atoms.
I can change that easily enough.
> Honestly, it seems like there should be a better way to do this. Currently,
> the atoms list only depends on the generation number, and thus is identical
> for all GL contexts. Storing an additional 1.5 kB of redundant data in each
> GL context seems a bit wasteful.
That's an interesting idea that hadn't occurred to me. :)
> Would it help to make brw_upload_state cache the array and size?
>
> void
> brw_upload_state(struct brw_context *brw)
> {
> static const struct brw_tracked_state *atoms = brw->atoms;
> static const int num_atoms = brw->num_atoms;
>
> ...
>
> for (i = 0; i < num_atoms; i++) {
> ...use atoms[i]...
> }
>
> ...
> }
>
> It seems like that would achieve the same effect - one indirection.
>
> One downside to that approach is that we couldn't have separate atom
> lists for API_CORE vs. API_COMPAT (while we could using your approach).
> (We could skip stippling for core contexts, and skip GS/HS/DS for legacy
> contexts, which might help a tiny bit...)
>
>> /* If (INTEL_DEBUG & DEBUG_BATCH) */
>> struct {
>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> index a579781..9a5d49a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )
>> const struct brw_tracked_state **atoms;
>> int num_atoms;
>>
>> + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));
>
> I do like the static asserts.
>
>> +
>> brw_init_caches(brw);
>>
>> if (brw->gen >= 8) {
>> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )
>> num_atoms = ARRAY_SIZE(gen4_atoms);
>> }
>>
>> - brw->atoms = atoms;
>> brw->num_atoms = num_atoms;
>>
>> + /* This is to work around brw_context::atoms being declared const. We
>> want
>> + * it to be const, but it needs to be initialized somehow!
>> + */
>> + struct brw_tracked_state *context_atoms =
>> + (struct brw_tracked_state *) &brw->atoms[0];
>
> Ugly, but... not sure what else to suggest.
I was going to s/context_atoms/brw->atoms[i]/, but I wanted the patch to
be easier to review. There was also a follow-on patch that uses some
SSE3 instructions in the loop, but it was giving mixed results. I'm
holding that back to send out later.
> I suppose this patch is fine. It does offer a small CPU overhead reduction in
> the hottest path, for a small memory usage penalty. I think we can do better,
> but I don't know that it makes sense to block your patch over nebulous ideas
> for future improvements. We can do those in the future...
>
> Reviewed-by: Kenneth Graunke <[email protected]>
>
>> +
>> + for (int i = 0; i < num_atoms; i++)
>> + context_atoms[i] = *atoms[i];
>> +
>> while (num_atoms--) {
>> assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);
>> assert((*atoms)->emit);
>> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw)
>> prev = *state;
>>
>> for (i = 0; i < brw->num_atoms; i++) {
>> - const struct brw_tracked_state *atom = brw->atoms[i];
>> + const struct brw_tracked_state *atom = &brw->atoms[i];
>> struct brw_state_flags generated;
>>
>> if (check_state(state, &atom->dirty)) {
>> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw)
>> }
>> else {
>> for (i = 0; i < brw->num_atoms; i++) {
>> - const struct brw_tracked_state *atom = brw->atoms[i];
>> + const struct brw_tracked_state *atom = &brw->atoms[i];
>>
>> if (check_state(state, &atom->dirty)) {
>> atom->emit(brw);
signature.asc
Description: OpenPGP digital signature
_______________________________________________ mesa-dev mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/mesa-dev
