Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > On 18/11/15 06:54, Jordan Justen wrote: >> From: Francisco Jerez <curroje...@riseup.net> >> >> Improves performance of the arb_shader_image_load_store-atomicity >> piglit test by over 25x (which isn't a real benchmark it's just heavy >> on atomics -- the improvement in a microbenchmark I wrote a while ago >> seemed to be even greater). The drawback is one needs to be >> extra-careful not to hang the GPU (in fact the whole system). A DC >> partition must have been allocated on L3, the "convert L3 cycle for DC >> to UC" bit may not be set, the MOCS L3 cacheability bit must be set >> for all surfaces accessed using DC atomics, and the SCRATCH1 and >> ROW_CHICKEN3 bits must be kept in sync. >> >> A fairly recent kernel is required for the command parser to allow >> writes to these registers. >> --- >> src/mesa/drivers/dri/i965/gen7_l3_state.c | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c >> b/src/mesa/drivers/dri/i965/gen7_l3_state.c >> index 48bca29..c863b7f 100644 >> --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c >> +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c >> @@ -254,5 +254,19 @@ setup_l3_config(struct brw_context *brw, const struct >> brw_l3_config *cfg) >> SET_FIELD(cfg->n[L3P_T], GEN7_L3CNTLREG3_T_ALLOC)); >> >> ADVANCE_BATCH(); >> + >> + if (brw->is_haswell && brw->intelScreen->cmd_parser_version >= 4) { >> + /* Enable L3 atomics on HSW if we have a DC partition, otherwise >> keep >> + * them disabled to avoid crashing the system hard. >> + */ >> + BEGIN_BATCH(5); >> + OUT_BATCH(MI_LOAD_REGISTER_IMM | (5 - 2)); >> + OUT_BATCH(HSW_SCRATCH1); >> + OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE); >> + OUT_BATCH(HSW_ROW_CHICKEN3); >> + OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 | >> + (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE)); > > > I have not found references to ROW_CHICKEN3 nor register with 0xe49c > address offset in HSW PRMs, so these could be stupid questions: >
Yeah, these chicken registers are not part of the public PRMs. > Why you need to set the L3 atomic disable flag in two different places > in ROW_CHICKEN3 register? Also, why the first flag is set > unconditionally while the second one only if we don't have a DC > partition? This is what you want? It's a masked register as many other MMIO registers on Gen hardware: the upper 16 bits control which of the lower 16 bits are modified in the register. > > > Also, if the "HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16" is really > needed, it could be defined as a constant in the first patch of the series. > If you think that would be more readable I guess we could define a REG_MASK() macro and use it wherever we write a masked register. See attachment (applies on top of this patch). > Sam > >> + ADVANCE_BATCH(); >> + } >> } >> } >>
From 4a3af3cf92bc45401242a676e0178e408eeafbb8 Mon Sep 17 00:00:00 2001 From: Francisco Jerez <curroje...@riseup.net> Date: Thu, 26 Nov 2015 16:42:43 +0200 Subject: [PATCH] i965: Define and use REG_MASK macro to make masked MMIO writes slightly more readable. --- src/mesa/drivers/dri/i965/brw_defines.h | 6 ++++++ src/mesa/drivers/dri/i965/brw_state_upload.c | 2 +- src/mesa/drivers/dri/i965/gen7_l3_state.c | 2 +- src/mesa/drivers/dri/i965/intel_reg.h | 2 +- 4 files changed, 9 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h index ade3ede..e08d385 100644 --- a/src/mesa/drivers/dri/i965/brw_defines.h +++ b/src/mesa/drivers/dri/i965/brw_defines.h @@ -41,6 +41,12 @@ #define GET_BITS(data, high, low) ((data & INTEL_MASK((high), (low))) >> (low)) #define GET_FIELD(word, field) (((word) & field ## _MASK) >> field ## _SHIFT) +/** + * For use with masked MMIO registers where the upper 16 bits control which + * of the lower bits are committed to the register. + */ +#define REG_MASK(value) ((value) << 16) + #ifndef BRW_DEFINES_H #define BRW_DEFINES_H diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c index cd1fbee..9ad40d2 100644 --- a/src/mesa/drivers/dri/i965/brw_state_upload.c +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c @@ -386,7 +386,7 @@ brw_upload_initial_gpu_state(struct brw_context *brw) BEGIN_BATCH(3); OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2)); OUT_BATCH(GEN7_CACHE_MODE_1); - OUT_BATCH((GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC << 16) | + OUT_BATCH(REG_MASK(GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC) | GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC); ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/gen7_l3_state.c b/src/mesa/drivers/dri/i965/gen7_l3_state.c index 239e5bd..022c01a 100644 --- a/src/mesa/drivers/dri/i965/gen7_l3_state.c +++ b/src/mesa/drivers/dri/i965/gen7_l3_state.c @@ -412,7 +412,7 @@ setup_l3_config(struct brw_context *brw, const struct brw_l3_config *cfg) OUT_BATCH(HSW_SCRATCH1); OUT_BATCH(has_dc ? 0 : HSW_SCRATCH1_L3_ATOMIC_DISABLE); OUT_BATCH(HSW_ROW_CHICKEN3); - OUT_BATCH(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE << 16 | + OUT_BATCH(REG_MASK(HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE) | (has_dc ? 0 : HSW_ROW_CHICKEN3_L3_ATOMIC_DISABLE)); ADVANCE_BATCH(); } diff --git a/src/mesa/drivers/dri/i965/intel_reg.h b/src/mesa/drivers/dri/i965/intel_reg.h index 0b167d5..8888d6f 100644 --- a/src/mesa/drivers/dri/i965/intel_reg.h +++ b/src/mesa/drivers/dri/i965/intel_reg.h @@ -183,7 +183,7 @@ # define GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE (1 << 13) # define GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC (1 << 1) # define GEN8_HIZ_PMA_MASK_BITS \ - ((GEN8_HIZ_NP_PMA_FIX_ENABLE | GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE) << 16) + REG_MASK(GEN8_HIZ_NP_PMA_FIX_ENABLE | GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE) /* Predicate registers */ #define MI_PREDICATE_SRC0 0x2400 -- 2.5.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev