On Thursday, July 13, 2017 9:49:40 PM PDT Marathe, Yogesh wrote: > Kenneth, > > > -----Original Message----- > > From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf > > Of Kenneth Graunke > > Sent: Friday, July 14, 2017 10:05 AM > > To: mesa-dev@lists.freedesktop.org > > Cc: Muthukumar, Aravindan <aravindan.muthuku...@intel.com> > > Subject: Re: [Mesa-dev] [PATCH] i965 : Performance Improvement > > > > On Thursday, July 13, 2017 9:09:09 PM PDT aravindan.muthuku...@intel.com > > wrote: > > > From: Aravindan M <aravindan.muthuku...@intel.com> > > > > The commit title should be something like, "i965: Optimize atom state flag > > checks" rather than a generic "Performance Improvement" > > > > > This patch improves CPI Rate(Cycles per Instruction) and CPU time > > > utilization for i965. The functions check_state and > > > brw_pipeline_state_finished was found poor CPU utilization from > > > performance analysis. > > > > Need actual data here, or show assembly quality improvements. > > > > > Change-Id: I17c7e719a16e222764217a0e67b4482748537b67 > > > Signed-off-by: Aravindan M <aravindan.muthuku...@intel.com> > > > Reviewed-by: Yogesh M <yogesh.mara...@intel.com> > > > Tested-by: Asish <as...@intel.com> > > > --- > > > src/mesa/drivers/dri/i965/brw_defines.h | 3 +++ > > > src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++--- > > > 2 files changed, 14 insertions(+), 3 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > > > b/src/mesa/drivers/dri/i965/brw_defines.h > > > index a4794c6..60f88ca 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_defines.h > > > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > > > @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode { > > > # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25) > > > > > > #endif > > > + > > > +/* Checking the state of mesa and brw before emitting atoms */ > > > +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw)) > > > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c > > > b/src/mesa/drivers/dri/i965/brw_state_upload.c > > > index 5e82c1b..434decf 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > > > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > > > @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw, > > > const struct brw_tracked_state *atom = &atoms[i]; > > > struct brw_state_flags generated; > > > > > > - check_and_emit_atom(brw, &state, atom); > > > + /* Checking the state and emitting the atoms */ > > > + if (CHECK_BRW_STATE(state, atom->dirty)) { > > > + check_and_emit_atom(brw, &state, atom); > > > + } > > > > > > accumulate_state(&examined, &atom->dirty); > > > > > > @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw, > > > for (i = 0; i < num_atoms; i++) { > > > const struct brw_tracked_state *atom = &atoms[i]; > > > > > > - check_and_emit_atom(brw, &state, atom); > > > + /* Checking the state and emitting the atoms */ > > > + if (CHECK_BRW_STATE(state, atom->dirty)) { > > > + check_and_emit_atom(brw, &state, atom); > > > + } > > > > This doesn't make any sense...the very first thing check_and_emit_atom() > > does > > is call check_state(), which does the exact thing your CHECK_BRW_STATE macro > > does. So you're just needlessly checking the same thing twice. > > > > Sorry Kenneth, This is incomplete patch. The original patch that I reviewed > also had > if check removed as below > > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c > @@ -417,10 +417,8 @@ check_and_emit_atom(struct brw_context *brw, > struct brw_state_flags *state, > const struct brw_tracked_state *atom) > { > - if (check_state(state, &atom->dirty)) { > atom->emit(brw); > merge_ctx_state(brw, state); > - } > } > > Aravindan will push another set. > > > The only reason I could see this helping is if check_state() wasn't > > inlined, but a > > release build with -O2 definitely inlines both check_and_emit_atom() and > > check_state(). > > > > Are you using GCC? What are your CFLAGS? -O2? I hope you're not trying to > > optimize a debug build... > > > > Yes we are using O2 and its clang on android and it's not debug.
Okay. I just built with Clang 4.0.1 and -O2 and both check_state and check_and_emit_atom() are inlined into the atom loop in brw_upload_pipeline_state(). So I'm still not sure how this would improve anything. > > > } > > > } > > > > > > @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw, > > > brw->state.pipelines[i].mesa |= brw->NewGLState; > > > brw->state.pipelines[i].brw |= brw->ctx.NewDriverState; > > > } else { > > > - memset(&brw->state.pipelines[i], 0, sizeof(struct > > > brw_state_flags)); > > > + /* Avoiding the memset with initialization */ > > > + brw->state.pipelines[i].mesa = 0; > > > + brw->state.pipelines[i].brw = 0ull; > > > } > > > } > > Clang 4.0.1 optimizes away the memset as well.
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev