On Mon, Oct 28, 2013 at 7:00 PM, Paul Berry <stereotype...@gmail.com> wrote:
> On 28 October 2013 18:14, Paul Berry <stereotype...@gmail.com> wrote: > >> On 25 October 2013 16:45, Anuj Phogat <anuj.pho...@gmail.com> wrote: >> >>> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE, >>> GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification. >>> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or >>> GEN6_WM_16_DISPATCH_ENABLE >>> when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled. >>> Refer SNB PRM Vol. 2, Part 1, Page 279 for details. >>> >>> V2: - Add a comment explaining why only SIMD8 mode is enabled with >>> MSDISPMODE_PERSAMPLE. >>> - Use shared function _mesa_get_min_invocations_per_fragment(). >>> - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask. >>> >>> Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com> >>> --- >>> src/mesa/drivers/dri/i965/gen6_wm_state.c | 52 >>> +++++++++++++++++++++++++++++-- >>> 1 file changed, 49 insertions(+), 3 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> b/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> index e3395ce..25ecc11 100644 >>> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> @@ -30,6 +30,7 @@ >>> #include "brw_defines.h" >>> #include "brw_util.h" >>> #include "brw_wm.h" >>> +#include "program/program.h" >>> #include "program/prog_parameter.h" >>> #include "program/prog_statevars.h" >>> #include "intel_batchbuffer.h" >>> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw) >>> dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT; >>> >>> /* CACHE_NEW_WM_PROG */ >>> + >>> + /* In case of non 1x (i.e 4x, 8x) multisampling with >>> MSDISPMODE_PERSAMPLE, >>> + * only one of SIMD8 and SIMD16 should be enabled. So, we have two >>> options >>> + * in above mentioned case: >>> + * 'SIMD8 only' dispatch: allowed on gen6. >>> + * 'SIMD16 only' dispatch: not allowed on gen6. >>> + * >>> + * So, we enable 'SIMD8 only' dispatch in above case. >>> >> >> I went back and re-read the docs, and I think it's fine to enable just >> SIMD8, but don't think it's correct to say that "SIMD16 only" is not >> allowed on gen6. >> >> The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages >> > Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size) >> Control" shows: >> >> SIMD8 only with validity criterion A >> SIMD16 only with validity criterion B >> SIMD8+SIMD16 with validity criterion D >> ...skipping some irrelevant ones... >> SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't >> support SIMD32) >> ...skipping some more irrelevant ones... >> >> (note: the SNB and IVB PRMs show SIMD16 only with validity criterion F, >> but the end result is the same.) >> >> In the SNB and IVB PRM, Criteria A-F are defined right next to the >> table. In the current bspec, they're defined in 3DSTATE_WM and >> 3DSTATE_PS. They say: >> >> A: Valid on all products. >> B: Valid on [DevCTG+]. Not valid on [DevSNB] if 4x PERPIXEL mode with >> pixel shader computed depth. >> C: Valid only on [DevCTG] and [DevIL] >> D: Valid on all products, except when in non-1x PERSAMPLE mode (applies >> to [DevSNB] only). Not valid on [DevSNB] if 4x PERPIXEL mode with pixel >> shader computed depth. >> E: Valid on all products, except when in PERSAMPLE mode with number of >> multisamples >= 8 (applies to [DevIVB+] only). Not valid on [DevSNB] if 4x >> PERPIXEL mode with pixel shader computed depth. >> F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL >> mode with pixel shader computed depth. >> >> >> So the upshot of all this is that in non-1x PERSAMPLE mode: >> >> - On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one >> or the other. >> - On gen7, we can enable either SIMD8 or SIMD16 or both. >> >> (previously, we thought that 8x MSAA had to be SIMD8 only, but I believe >> that was a misunderstanding. The restriction on 8x MSAA is only in >> validity criterion E, and that criterion only takes effect if we're doing >> SIMD16+SIMD32, which we never do.) >> >> Please correct me if my logic is wrong. I want to make sure we get to >> the bottom of this. >> >> Assuming your analysis agrees with mine, then the only reason we aren't >> doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a >> chance to get it working yet. At a minimum, we need to update the comment >> to reflect that. It might be worth adding some of the above analysis to >> the comment so that the next person to work on this code doesn't get as >> confused as we did. >> > > Two other notes about this: > > 1. Currently we aren't respecting the restrictions that say "Not valid on > [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth." I'm happy > to take care of this once Anuj's series lands. > > 2. Once we've verified that SIMD16 code generation works properly on Gen7 > for gl_SampleID, gl_SamplePosition, and gl_SampleMask[], I believe we > should switch our Sandy Bridge logic so that in non-1x PERSAMPLE mode, we > do "SIMD16 only" dispatch if a SIMD16 shader was successfully compiled. I > believe that in the vast majority of cases that will be more performant > than "SIMD8 only" dispatch. > I agree. This applies to both gen6 and gen7. gl_SampleMask[] doesn't require MSDISPMODE_PERSAMPLE. So, both SIMD8 and SIMD16 are already enabled in case of gl_SampleMask[]. > >> >>> + */ >>> dw5 |= GEN6_WM_8_DISPATCH_ENABLE; >>> - if (brw->wm.prog_data->prog_offset_16) >>> + >>> + if (brw->wm.prog_data->prog_offset_16 && >>> + !(_mesa_get_min_invocations_per_fragment(ctx, >>> + >>> brw->fragment_program) > 1)) >>> >> >> This "if" statement is overly indented. Also, >> !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing. Let's >> just do _mesa_get_min_invocations_per_fragment(...) == 1. >> >> (Note: this is assuming you agreed with me that we need to add "MAX2" in >> patch 4. If we don't do that then "== 1" needs to change to "<= 1".) >> >> >>> dw5 |= GEN6_WM_16_DISPATCH_ENABLE; >>> >>> /* CACHE_NEW_WM_PROG | _NEW_COLOR */ >>> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw) >>> >>> /* _NEW_COLOR, _NEW_MULTISAMPLE */ >>> if (fp->program.UsesKill || ctx->Color.AlphaEnabled || >>> - ctx->Multisample.SampleAlphaToCoverage) >>> + ctx->Multisample.SampleAlphaToCoverage || >>> + brw->wm.prog_data->uses_omask) >>> dw5 |= GEN6_WM_KILL_ENABLE; >>> >>> if (brw_color_buffer_write_enabled(brw) || >>> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw) >>> dw5 |= GEN6_WM_DISPATCH_ENABLE; >>> } >>> >>> + /* From the SNB PRM, volume 2 part 1, page 278: >>> + * "This bit is inserted in the PS payload header and made available >>> to >>> + * the DataPort (either via the message header or via header bypass) >>> to >>> + * indicate that oMask data (one or two phases) is included in Render >>> + * Target Write messages. If present, the oMask data is used to mask >>> off >>> + * samples." >>> + */ >>> + if(brw->wm.prog_data->uses_omask) >>> + dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET; >>> + >>> /* CACHE_NEW_WM_PROG */ >>> dw6 |= brw->wm.prog_data->num_varying_inputs << >>> GEN6_WM_NUM_SF_OUTPUTS_SHIFT; >>> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw) >>> dw6 |= GEN6_WM_MSRAST_ON_PATTERN; >>> else >>> dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; >>> - dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; >>> + >>> + if (_mesa_get_min_invocations_per_fragment(ctx, >>> brw->fragment_program) > 1) >>> + dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; >>> + else >>> + dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; >>> } else { >>> dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; >>> dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; >>> } >>> >>> + /* _NEW_BUFFERS, _NEW_MULTISAMPLE */ >>> + /* From the SNB PRM, volume 2 part 1, page 281: >>> + * "If the PS kernel does not need the Position XY Offsets >>> + * to compute a Position XY value, then this field should be >>> + * programmed to POSOFFSET_NONE." >>> + * >>> + * "SW Recommendation: If the PS kernel needs the Position Offsets >>> + * to compute a Position XY value, this field should match Position >>> + * ZW Interpolation Mode to ensure a consistent position.xyzw >>> + * computation." >>> + * We only require XY sample offsets. So, this recommendation doesn't >>> + * look useful at the moment. We might need this in future. >>> + */ >>> + if (brw->wm.prog_data->uses_pos_offset) >>> + dw6 |= GEN6_WM_POSOFFSET_SAMPLE; >>> + else >>> + dw6 |= GEN6_WM_POSOFFSET_NONE; >>> + >>> BEGIN_BATCH(9); >>> OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2)); >>> OUT_BATCH(brw->wm.base.prog_offset); >>> -- >>> 1.8.1.4 >>> >>> >> With those changes, the patch is: >> >> Reviewed-by: Paul Berry <stereotype...@gmail.com> >> > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev