On 28 October 2013 19:50, Anuj Phogat <anuj.pho...@gmail.com> wrote: > > > > On Mon, Oct 28, 2013 at 6:14 PM, Paul Berry <stereotype...@gmail.com>wrote: > >> On 25 October 2013 16:45, Anuj Phogat <anuj.pho...@gmail.com> wrote: >> >>> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE, >>> GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification. >>> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or >>> GEN6_WM_16_DISPATCH_ENABLE >>> when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled. >>> Refer SNB PRM Vol. 2, Part 1, Page 279 for details. >>> >>> V2: - Add a comment explaining why only SIMD8 mode is enabled with >>> MSDISPMODE_PERSAMPLE. >>> - Use shared function _mesa_get_min_invocations_per_fragment(). >>> - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask. >>> >>> Signed-off-by: Anuj Phogat <anuj.pho...@gmail.com> >>> --- >>> src/mesa/drivers/dri/i965/gen6_wm_state.c | 52 >>> +++++++++++++++++++++++++++++-- >>> 1 file changed, 49 insertions(+), 3 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> b/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> index e3395ce..25ecc11 100644 >>> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c >>> @@ -30,6 +30,7 @@ >>> #include "brw_defines.h" >>> #include "brw_util.h" >>> #include "brw_wm.h" >>> +#include "program/program.h" >>> #include "program/prog_parameter.h" >>> #include "program/prog_statevars.h" >>> #include "intel_batchbuffer.h" >>> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw) >>> dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT; >>> >>> /* CACHE_NEW_WM_PROG */ >>> + >>> + /* In case of non 1x (i.e 4x, 8x) multisampling with >>> MSDISPMODE_PERSAMPLE, >>> + * only one of SIMD8 and SIMD16 should be enabled. So, we have two >>> options >>> + * in above mentioned case: >>> + * 'SIMD8 only' dispatch: allowed on gen6. >>> + * 'SIMD16 only' dispatch: not allowed on gen6. >>> + * >>> + * So, we enable 'SIMD8 only' dispatch in above case. >>> >> >> I went back and re-read the docs, and I think it's fine to enable just >> SIMD8, but don't think it's correct to say that "SIMD16 only" is not >> allowed on gen6. >> > Yeah, I confused criteria E to be a 'SIMD16 only' dispatch which is > applicable for [DevIVB+] only. It's the criteria B which applies here and > allowed on SNB as well. > I'll update this comment. > >> >> The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages >> > Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size) >> Control" shows: >> >> SIMD8 only with validity criterion A >> SIMD16 only with validity criterion B >> SIMD8+SIMD16 with validity criterion D >> ...skipping some irrelevant ones... >> SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't >> support SIMD32) >> ...skipping some more irrelevant ones... >> >> (note: the SNB and IVB PRMs show SIMD16 only with validity criterion F, >> but the end result is the same.) >> >> In the SNB and IVB PRM, Criteria A-F are defined right next to the >> table. In the current bspec, they're defined in 3DSTATE_WM and >> 3DSTATE_PS. They say: >> >> A: Valid on all products. >> B: Valid on [DevCTG+]. Not valid on [DevSNB] if 4x PERPIXEL mode with >> pixel shader computed depth. >> C: Valid only on [DevCTG] and [DevIL] >> D: Valid on all products, except when in non-1x PERSAMPLE mode (applies >> to [DevSNB] only). Not valid on [DevSNB] if 4x PERPIXEL mode with pixel >> shader computed depth. >> > This is what I see in the SNB and IVB spec. Notice "[DevSNB+]": > D: Valid on all products, except when in non-1x PERSAMPLE mode (applies > to [DevSNB+] only). Not valid on [DevSNB]if 4x PERPIXEL mode with pixel > shader computed depth. >
Well shoot, you're right again. Good thing I've only been reviewing patches today and not writing code--I've been making more than my usual number of mistakes. > > >> E: Valid on all products, except when in PERSAMPLE mode with number of >> multisamples >= 8 (applies to [DevIVB+] only). Not valid on [DevSNB] if 4x >> PERPIXEL mode with pixel shader computed depth. >> F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL >> mode with pixel shader computed depth. >> >> >> So the upshot of all this is that in non-1x PERSAMPLE mode: >> >> - On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one >> or the other. >> > right. > > - On gen7, we can enable either SIMD8 or SIMD16 or both. >> > No, we cannot enable both SIMD8 and SIMD16, but we can enable one or the > other. I get this error in simulator on IVB if I attempt to enable both: > If MSAA 4x with MDISPMODE_PERSAMPLE, SIMD8 should not be enabled with > SIMD16 or SIMD32 > > It hangs the gpu with 8x MSAA. > Ok, you're right. Now that I'm reading the "[DevSNB+]" correctly I agree. Thanks for double-checking my work. > >> (previously, we thought that 8x MSAA had to be SIMD8 only, but I believe >> that was a misunderstanding. The restriction on 8x MSAA is only in >> validity criterion E, and that criterion only takes effect if we're doing >> SIMD16+SIMD32, which we never do.) >> > Right. We can enable 'SIMD16 only' dispatch for both 4x and 8x > multisampling. I'll update my 'TODO' comment in gen7_wm_state.c. > Thanks for finding this Paul. > >> >> Please correct me if my logic is wrong. I want to make sure we get to >> the bottom of this. >> >> Assuming your analysis agrees with mine, then the only reason we aren't >> doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a >> chance to get it working yet. At a minimum, we need to update the comment >> to reflect that. It might be worth adding some of the above analysis to >> the comment so that the next person to work on this code doesn't get as >> confused as we did. >> > I'll add more details in comment. > >> >> >>> + */ >>> dw5 |= GEN6_WM_8_DISPATCH_ENABLE; >>> - if (brw->wm.prog_data->prog_offset_16) >>> + >>> + if (brw->wm.prog_data->prog_offset_16 && >>> + !(_mesa_get_min_invocations_per_fragment(ctx, >>> + >>> brw->fragment_program) > 1)) >>> >> >> This "if" statement is overly indented. Also, >> !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing. Let's >> just do _mesa_get_min_invocations_per_fragment(...) == 1. >> >> (Note: this is assuming you agreed with me that we need to add "MAX2" in >> patch 4. If we don't do that then "== 1" needs to change to "<= 1".) >> > Yes, this definitely looks better. > > >> >>> dw5 |= GEN6_WM_16_DISPATCH_ENABLE; >>> >>> /* CACHE_NEW_WM_PROG | _NEW_COLOR */ >>> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw) >>> >>> /* _NEW_COLOR, _NEW_MULTISAMPLE */ >>> if (fp->program.UsesKill || ctx->Color.AlphaEnabled || >>> - ctx->Multisample.SampleAlphaToCoverage) >>> + ctx->Multisample.SampleAlphaToCoverage || >>> + brw->wm.prog_data->uses_omask) >>> dw5 |= GEN6_WM_KILL_ENABLE; >>> >>> if (brw_color_buffer_write_enabled(brw) || >>> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw) >>> dw5 |= GEN6_WM_DISPATCH_ENABLE; >>> } >>> >>> + /* From the SNB PRM, volume 2 part 1, page 278: >>> + * "This bit is inserted in the PS payload header and made available >>> to >>> + * the DataPort (either via the message header or via header bypass) >>> to >>> + * indicate that oMask data (one or two phases) is included in Render >>> + * Target Write messages. If present, the oMask data is used to mask >>> off >>> + * samples." >>> + */ >>> + if(brw->wm.prog_data->uses_omask) >>> + dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET; >>> + >>> /* CACHE_NEW_WM_PROG */ >>> dw6 |= brw->wm.prog_data->num_varying_inputs << >>> GEN6_WM_NUM_SF_OUTPUTS_SHIFT; >>> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw) >>> dw6 |= GEN6_WM_MSRAST_ON_PATTERN; >>> else >>> dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; >>> - dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; >>> + >>> + if (_mesa_get_min_invocations_per_fragment(ctx, >>> brw->fragment_program) > 1) >>> + dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; >>> + else >>> + dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL; >>> } else { >>> dw6 |= GEN6_WM_MSRAST_OFF_PIXEL; >>> dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE; >>> } >>> >>> + /* _NEW_BUFFERS, _NEW_MULTISAMPLE */ >>> + /* From the SNB PRM, volume 2 part 1, page 281: >>> + * "If the PS kernel does not need the Position XY Offsets >>> + * to compute a Position XY value, then this field should be >>> + * programmed to POSOFFSET_NONE." >>> + * >>> + * "SW Recommendation: If the PS kernel needs the Position Offsets >>> + * to compute a Position XY value, this field should match Position >>> + * ZW Interpolation Mode to ensure a consistent position.xyzw >>> + * computation." >>> + * We only require XY sample offsets. So, this recommendation doesn't >>> + * look useful at the moment. We might need this in future. >>> + */ >>> + if (brw->wm.prog_data->uses_pos_offset) >>> + dw6 |= GEN6_WM_POSOFFSET_SAMPLE; >>> + else >>> + dw6 |= GEN6_WM_POSOFFSET_NONE; >>> + >>> BEGIN_BATCH(9); >>> OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2)); >>> OUT_BATCH(brw->wm.base.prog_offset); >>> -- >>> 1.8.1.4 >>> >>> >> With those changes, the patch is: >> >> Reviewed-by: Paul Berry <stereotype...@gmail.com> >> > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev