Kenneth, > -----Original Message----- > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf > Of Kenneth Graunke > Sent: Friday, July 14, 2017 10:05 AM > To: mesa-dev@lists.freedesktop.org > Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com> > Subject: Re: [Mesa-dev] [PATCH] i965 : Performance Improvement > > On Thursday, July 13, 2017 9:09:09 PM PDT aravindan.muthuku...@intel.com > wrote: > > From: Aravindan M <aravindan.muthuku...@intel.com> > > The commit title should be something like, "i965: Optimize atom state flag > checks" rather than a generic "Performance Improvement" > > > This patch improves CPI Rate(Cycles per Instruction) and CPU time > > utilization for i965. The functions check_state and > > brw_pipeline_state_finished was found poor CPU utilization from > > performance analysis. > > Need actual data here, or show assembly quality improvements. > > > Change-Id: I17c7e719a16e222764217a0e67b4482748537b67 > > Signed-off-by: Aravindan M <aravindan.muthuku...@intel.com> > > Reviewed-by: Yogesh M <yogesh.mara...@intel.com> > > Tested-by: Asish <as...@intel.com> > > --- > > src/mesa/drivers/dri/i965/brw_defines.h | 3 +++ > > src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++--- > > 2 files changed, 14 insertions(+), 3 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > > b/src/mesa/drivers/dri/i965/brw_defines.h > > index a4794c6..60f88ca 100644 > > --- a/src/mesa/drivers/dri/i965/brw_defines.h > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > > @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode { > > # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25) > > > > #endif > > + > > +/* Checking the state of mesa and brw before emitting atoms */ > > +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw)) > > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > > b/src/mesa/drivers/dri/i965/brw_state_upload.c > > index 5e82c1b..434decf 100644 > > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > > @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw, > > const struct brw_tracked_state *atom = &atoms[i]; > > struct brw_state_flags generated; > > > > - check_and_emit_atom(brw, &state, atom); > > + /* Checking the state and emitting the atoms */ > > + if (CHECK_BRW_STATE(state, atom->dirty)) { > > + check_and_emit_atom(brw, &state, atom); > > + } > > > > accumulate_state(&examined, &atom->dirty); > > > > @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw, > > for (i = 0; i < num_atoms; i++) { > > const struct brw_tracked_state *atom = &atoms[i]; > > > > - check_and_emit_atom(brw, &state, atom); > > + /* Checking the state and emitting the atoms */ > > + if (CHECK_BRW_STATE(state, atom->dirty)) { > > + check_and_emit_atom(brw, &state, atom); > > + } > > This doesn't make any sense...the very first thing check_and_emit_atom() does > is call check_state(), which does the exact thing your CHECK_BRW_STATE macro > does. So you're just needlessly checking the same thing twice. >
Sorry Kenneth, This is incomplete patch. The original patch that I reviewed also had if check removed as below --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -417,10 +417,8 @@ check_and_emit_atom(struct brw_context *brw, struct brw_state_flags *state, const struct brw_tracked_state *atom) { - if (check_state(state, &atom->dirty)) { atom->emit(brw); merge_ctx_state(brw, state); - } } Aravindan will push another set. > The only reason I could see this helping is if check_state() wasn't inlined, > but a > release build with -O2 definitely inlines both check_and_emit_atom() and > check_state(). > > Are you using GCC? What are your CFLAGS? -O2? I hope you're not trying to > optimize a debug build... > Yes we are using O2 and its clang on android and it's not debug. > > } > > } > > > > @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw, > > brw->state.pipelines[i].mesa |= brw->NewGLState; > > brw->state.pipelines[i].brw |= brw->ctx.NewDriverState; > > } else { > > - memset(&brw->state.pipelines[i], 0, sizeof(struct > > brw_state_flags)); > > + /* Avoiding the memset with initialization */ > > + brw->state.pipelines[i].mesa = 0; > > + brw->state.pipelines[i].brw = 0ull; > > } > > } > > This is a separate change. > > I'm also not seeing why it's useful: > > Assembly before (GCC 7.1.1, x86_64, -O2 -fno-omit-frame-pointer): > > 00000000003e0380 <brw_render_state_finished>: > 3e0380: 66 0f ef c0 pxor %xmm0,%xmm0 > 3e0384: 55 push %rbp > 3e0385: 8b 87 20 52 02 00 mov 0x25220(%rdi),%eax > 3e038b: c7 87 20 52 02 00 00 movl $0x0,0x25220(%rdi) > 3e0392: 00 00 00 > 3e0395: 48 89 e5 mov %rsp,%rbp > 3e0398: 09 87 38 52 02 00 or %eax,0x25238(%rdi) > 3e039e: 48 8b 87 38 4d 02 00 mov 0x24d38(%rdi),%rax > 3e03a5: 5d pop %rbp > 3e03a6: 48 09 87 40 52 02 00 or %rax,0x25240(%rdi) > 3e03ad: 48 c7 87 38 4d 02 00 movq $0x0,0x24d38(%rdi) > 3e03b4: 00 00 00 00 > 3e03b8: 0f 11 87 28 52 02 00 movups %xmm0,0x25228(%rdi) > 3e03bf: c3 retq > > Assembly after: > > 00000000003e0380 <brw_render_state_finished>: > 3e0380: 55 push %rbp > 3e0381: 8b 87 20 52 02 00 mov 0x25220(%rdi),%eax > 3e0387: c7 87 28 52 02 00 00 movl $0x0,0x25228(%rdi) > 3e038e: 00 00 00 > 3e0391: 09 87 38 52 02 00 or %eax,0x25238(%rdi) > 3e0397: 48 89 e5 mov %rsp,%rbp > 3e039a: 48 8b 87 38 4d 02 00 mov 0x24d38(%rdi),%rax > 3e03a1: 48 c7 87 30 52 02 00 movq $0x0,0x25230(%rdi) > 3e03a8: 00 00 00 00 > 3e03ac: 48 09 87 40 52 02 00 or %rax,0x25240(%rdi) > 3e03b3: c7 87 20 52 02 00 00 movl $0x0,0x25220(%rdi) > 3e03ba: 00 00 00 > 3e03bd: 48 c7 87 38 4d 02 00 movq $0x0,0x24d38(%rdi) > 3e03c4: 00 00 00 00 > 3e03c8: 5d pop %rbp > 3e03c9: c3 retq > 3e03ca: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) > > That looks like more instructions to me... > > Thanks for looking at ways to optimize our CPU overhead. I'm sure there are > some improvements we can make... _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev