Quoting aravindan.muthuku...@intel.com (2017-07-14 05:09:09)
> From: Aravindan M <aravindan.muthuku...@intel.com>
> 
> This patch improves CPI Rate(Cycles per Instruction)
> and CPU time utilization for i965. The functions
> check_state and brw_pipeline_state_finished was found
> poor CPU utilization from performance analysis.
> 
> Change-Id: I17c7e719a16e222764217a0e67b4482748537b67
> Signed-off-by: Aravindan M <aravindan.muthuku...@intel.com>
> Reviewed-by: Yogesh M <yogesh.mara...@intel.com>
> Tested-by: Asish <as...@intel.com>
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h      |  3 +++
>  src/mesa/drivers/dri/i965/brw_state_upload.c | 14 +++++++++++---
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index a4794c6..60f88ca 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -1681,3 +1681,6 @@ enum brw_pixel_shader_coverage_mask_mode {
>  # define GEN8_L3CNTLREG_ALL_ALLOC_MASK INTEL_MASK(31, 25)
>  
>  #endif
> +
> +/* Checking the state of mesa and brw before emitting atoms */
> +#define CHECK_BRW_STATE(a,b) ((a.mesa & b.mesa) | (a.brw & b.brw))
> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c
> b/src/mesa/drivers/dri/i965/brw_state_upload.c
> index 5e82c1b..434decf 100644
> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> @@ -515,7 +515,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>           const struct brw_tracked_state *atom = &atoms[i];
>           struct brw_state_flags generated;
>  
> -         check_and_emit_atom(brw, &state, atom);
> +         /* Checking the state and emitting the atoms */
> +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> +            check_and_emit_atom(brw, &state, atom);
> +         }
>  
>           accumulate_state(&examined, &atom->dirty);
>  
> @@ -532,7 +535,10 @@ brw_upload_pipeline_state(struct brw_context *brw,
>        for (i = 0; i < num_atoms; i++) {
>           const struct brw_tracked_state *atom = &atoms[i];
>  
> -         check_and_emit_atom(brw, &state, atom);
> +         /* Checking the state and emitting the atoms */
> +         if (CHECK_BRW_STATE(state, atom->dirty)) {
> +            check_and_emit_atom(brw, &state, atom);
> +         }
>        }
>     }
>  
> @@ -567,7 +573,9 @@ brw_pipeline_state_finished(struct brw_context *brw,
>           brw->state.pipelines[i].mesa |= brw->NewGLState;
>           brw->state.pipelines[i].brw |= brw->ctx.NewDriverState;
>        } else {
> -         memset(&brw->state.pipelines[i], 0, sizeof(struct brw_state_flags));
> +         /* Avoiding the memset with initialization */
> +         brw->state.pipelines[i].mesa = 0;
> +         brw->state.pipelines[i].brw = 0ull;

Is your compiler broken? It is apparently neither inlining the simple
function check_and_emit_atom (which may be a candidate for always inline
instead of the manual duplication) nor converting the fixed-size memset
into a few inline instructions. Or are you optimising a debug build?
-Chris
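
For illustration only, here is a minimal standalone sketch of the two points raised above. It is not the driver's code: `struct state_flags`, `flags_overlap`, `clear_with_memset` and `clear_by_field` are hypothetical names, and the field widths (32-bit `mesa`, 64-bit `brw`) are an assumption suggested by the `0ull` in the patch. It shows the dirty-bit test written as a small inline function rather than a macro duplicated at each call site, and the two ways of clearing the flags side by side. An optimising build would normally be expected to inline the first and to lower the fixed-size memset to a few stores, but that depends on the compiler and flags in use.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the driver's flag struct (field widths assumed). */
struct state_flags {
   uint32_t mesa;
   uint64_t brw;
};

/* Same test as the CHECK_BRW_STATE macro in the patch, as a small
 * function: non-zero when any bit is set in both flag sets.
 */
static inline int
flags_overlap(const struct state_flags *a, const struct state_flags *b)
{
   return ((a->mesa & b->mesa) | (a->brw & b->brw)) != 0;
}

/* Clearing via a fixed-size memset, as in the code before the patch. */
static inline void
clear_with_memset(struct state_flags *f)
{
   memset(f, 0, sizeof(*f));
}

/* Clearing field by field, as in the patch. */
static inline void
clear_by_field(struct state_flags *f)
{
   f->mesa = 0;
   f->brw = 0ull;
}
```

Comparing the generated assembly for the two clear functions at -O2 against -O0 is one way to check whether a debug build is what is being measured.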