On 26/11/15 15:44, Francisco Jerez wrote: > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > >> On 18/11/15 06:54, Jordan Justen wrote: >>> From: Francisco Jerez <curroje...@riseup.net> >>> >>> Improves performance of the arb_shader_image_load_store-atomicity >>> piglit test by over 25x (which isn't a real benchmark it's just heavy >>> on atomics -- the improvement in a microbenchmark I wrote a while ago >>> seemed to be even greater). The drawback is one needs to be >>> extra-careful not to hang the GPU (in fact the whole system). A DC >>> partition must have been allocated on L3, the "convert L3 cycle for DC >>> to UC" bit may not be set, the MOCS L3 cacheability bit must be set >>> for all surfaces accessed using DC atomics, and the SCRATCH1 and >>> ROW_CHICKEN3 bits must be kept in sync. >>> >>> A fairly recent kernel is required for the command parser to allow >>> writes to these registers. >>> --- >>> src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++++++++++++++ >>> 1 file changed, 14 insertions(+) >>> >>> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c >>> b/src/mesa/drivers/dri/i965/gen7_l3_state.c >>> index 48bca29..c863b7f 100644 >>> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c >>> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c >>> @@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct >>> brw_l3_config *cfg) >>> SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC)); >>> >>> ADVANCE_BATCH(); >>> + >>> + if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) { >>> + /* Enable L3 atomics on HSW if we have a DC partition, otherwise >>> keep >>> + * them disabled to avoid crashing the system hard. >>> + */ >>> + BEGIN_BATCH(5); >>> + OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2)); >>> + OUT_BATCH(HSW_SCRATCH1); >>> + OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE); >>> + OUT_BATCH(HSW_ROW_CHICKEN3); >>> + OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 | >>> + (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE)); >> >> >> I have not found references to ROW_CHICKEN3 nor register with 0xe49c >> address offset in HSW PRMs, so these could be stupid questions: >> > > Yeah, these chicken registers are not part of the public PRMs. > >> Why you need to set the L3 atomic disable flag in two different places >> in ROW_CHICKEN3 register? Also, why the first flag is set >> unconditionally while the second one only if we don't have a DC >> partition? This is what you want? > > It's a masked register as many other MMIO registers on Gen hardware: the > upper 16 bits control which of the lower 16 bits are modified in the > register. > >> >> >> Also, if the "HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16" is really >> needed, it could be defined as a constant in the first patch of the series. >> > If you think that would be more readable I guess we could define a > REG_MASK() macro and use it wherever we write a masked register. See > attachment (applies on top of this patch). >
OK, thanks for the explanation. Please add my R-b to this patch and to the attached one: Reviewed-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> Sam >> Sam >> >>> + ADVANCE_BATCH(); >>> + } >>> } >>> } >>> > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev