On 07.09.2017 16:56, Marek Olšák wrote:
+static bool si_out_of_order_rasterization(struct si_context *sctx)
+{
+ struct si_state_blend *blend = sctx->queued.named.blend;
+ struct si_state_dsa *dsa = sctx->queued.named.dsa;
+
+ if (!sctx->screen->has_out_of_order_rast)
+ return false;
+
+ /* PS with memory stores can't run out-of-order. */
+ if (sctx->ps_shader.cso &&
+ sctx->ps_shader.cso->info.writes_memory)
+ return false;
I'm actually not sure this is necessary. The spec is quite relaxed about the
order of pixel shader invocations and whether they happen at all.
If DEPTH_BEFORE_SHADER is 1, depth tests have to be done in-order. If
they are done out-of-order, pixel shader invocations that would
normally be rejected can be executed needlessly. Does the spec allow
that?
Good point.
+
+ unsigned colormask = sctx->framebuffer.colorbuf_enabled_4bit &
+ blend->cb_target_enabled_4bit;
+
+ /* No logic op. */
+ if (colormask && blend->logicop_enable)
+ return false;
+
+ struct r600_texture *zstex =
+ (struct r600_texture*)sctx->framebuffer.state.zsbuf->texture;
+ bool has_stencil = sctx->framebuffer.state.zsbuf &&
+ zstex->surface.flags & RADEON_SURF_SBUFFER;
+ bool blend_enabled = (colormask & blend->blend_enable_4bit) != 0;
+
+ /* Out-of-order rasterization can be enabled for these cases:
+ *
+ * - color-only rendering:
+ * + blending must be enabled and commutative
+ * + only when inexact behavior due to rounding is allowed
+ *
+ * - depth-only rendering:
+ * + depth must force ordering
+ *
+ * - stencil-only rendering:
+ * + never --- can we do better here?
+ *
+ * - color rendering with read-only depth:
+ * + blending must be disabled
+ * + depth must force ordering
+ *
+ * - color rendering with read-only stencil:
+ * + blending must be disabled
+ *
+ * - color+depth rendering:
+ * + blending must be disabled
+ * + depth must force ordering
+ * + only when Z-fighting is allowed to result in inexact behavior
+ *
+ * - color+stencil rendering:
+ * + never --- can we do better here?
+ *
+ * - color+depth+stencil rendering:
+ * + never --- can we do better here?
+ */
I can't quite wrap my head around the logic here.
Here's a suggestion for cleaning it up conceptually:
- Record in DSA whether DSA *by itself* can run out-of-order or not, meaning
that the final result in Z/S is unaffected by out-of-order
-- This is trivially the case when there are no Z/S writes
-- It is also the case when stencil writes are disabled and Zfunc is NEVER
or one of the ordered ones ("depth_forces_ordering", currently)
-- It is also the case when depth writes are disabled and either Sfunc is
ALWAYS and zpass_op/zfail_op are each one of KEEP, ZERO, REPLACE, INVERT,
INCR_WRAP, DECR_WRAP, or Sfunc is NEVER and the same applies to fail_op [I
think this allows out-of-order to be enabled for stencil shadow passes]
- Record in DSA whether the set of fragments passing DSA is unaffected by
out-of-order
-- This is trivially the case when there are no Z/S writes
-- It is the case when stencil writes are disabled and Zfunc is ALWAYS or
NEVER
-- It is the case when depth writes are disabled and Sfunc is ALWAYS or
NEVER
- Record in DSA whether the *last* fragment passing DSA for each sample is
unaffected by out-of-order
-- This is *never* the case if we're being honest, but we can enable it in
an optional "aggressive" mode when stencil writes are disabled, Z writes are
enabled and Z func is one of the ordered functions
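A minimal sketch of how those three bits might be computed, assuming a
simplified description of the DSA state. The names dsa_desc, dsa_order_flags
and compute_dsa_order_flags are hypothetical and do not come from the patch.
The stencil-op case for stencil shadow passes is left out, so any stencil
write is conservatively treated as order-dependent, and the question of
whether Z/S buffers are actually bound is ignored:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified names; not the driver's real types. */
enum cmp_func {
    FUNC_NEVER, FUNC_LESS, FUNC_EQUAL, FUNC_LEQUAL,
    FUNC_GREATER, FUNC_NOTEQUAL, FUNC_GEQUAL, FUNC_ALWAYS
};

struct dsa_desc {
    bool depth_write;    /* depth test on and depth writes on */
    bool stencil_write;  /* stencil test on and some op can modify stencil */
    enum cmp_func zfunc; /* assumed FUNC_ALWAYS when the depth test is off */
    enum cmp_func sfunc; /* assumed FUNC_ALWAYS when the stencil test is off */
};

struct dsa_order_flags {
    bool zs_invariant;        /* final Z/S contents are order-independent */
    bool pass_set_invariant;  /* set of passing fragments is order-independent */
    bool pass_last_invariant; /* last passing fragment is order-independent */
};

/* The "ordered" functions: the surviving Z is a min or max over fragments. */
static bool zfunc_is_ordered(enum cmp_func f)
{
    return f == FUNC_LESS || f == FUNC_LEQUAL ||
           f == FUNC_GREATER || f == FUNC_GEQUAL;
}

/* ALWAYS and NEVER give the same result whatever the buffer contains. */
static bool func_ignores_buffer(enum cmp_func f)
{
    return f == FUNC_ALWAYS || f == FUNC_NEVER;
}

static struct dsa_order_flags
compute_dsa_order_flags(const struct dsa_desc *d, bool aggressive)
{
    struct dsa_order_flags f;
    bool no_zs_write;

    assert(d != NULL);
    no_zs_write = !d->depth_write && !d->stencil_write;

    f.zs_invariant = no_zs_write ||
        (!d->stencil_write &&
         (d->zfunc == FUNC_NEVER || zfunc_is_ordered(d->zfunc)));

    f.pass_set_invariant = no_zs_write ||
        (!d->stencil_write && func_ignores_buffer(d->zfunc)) ||
        (!d->depth_write && func_ignores_buffer(d->sfunc));

    /* Strictly never true; only granted in the optional aggressive mode. */
    f.pass_last_invariant = aggressive && !d->stencil_write &&
        d->depth_write && zfunc_is_ordered(d->zfunc);

    return f;
}
```

With depth writes on and Zfunc == LESS, this yields zs_invariant = true and
pass_set_invariant = false, and pass_last_invariant is true only when the
aggressive mode is requested.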
The overall out-of-order enable is then:
- if DSA by itself cannot run out-of-order, disable
- if color writes are disabled, enable
- if logic op is enabled, disable
- if blending is enabled:
o disable if non-commutative
o enable if commutative and the set of fragments passing DSA is unaffected
by out-of-order
- if blending is disabled, enable iff the *last* fragment passing DSA is
unaffected
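The decision list above can be sketched roughly as follows. The struct
ooo_inputs and the function out_of_order_allowed are hypothetical names used
only for illustration, with the three DSA bits assumed to be precomputed
elsewhere. The list does not say what happens when blending is commutative but
the set of passing fragments is affected by ordering; this sketch assumes the
conservative answer (disable):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs; the three DSA bits are assumed to be precomputed. */
struct ooo_inputs {
    bool zs_invariant;        /* DSA by itself can run out-of-order */
    bool pass_set_invariant;  /* set of passing fragments is unaffected */
    bool pass_last_invariant; /* last passing fragment is unaffected */
    bool color_writes;        /* any color channel is actually written */
    bool logicop_enable;
    bool blend_enable;
    bool blend_commutative;   /* e.g. additive blending; decided elsewhere */
};

static bool out_of_order_allowed(const struct ooo_inputs *in)
{
    assert(in != NULL);

    if (!in->zs_invariant)
        return false;
    if (!in->color_writes)
        return true;
    if (in->logicop_enable)
        return false;
    if (in->blend_enable) {
        /* Assumption: commutative blending with an order-dependent
         * pass set is treated as "disable". */
        return in->blend_commutative && in->pass_set_invariant;
    }
    /* No blending: the last passing fragment determines the color. */
    return in->pass_last_invariant;
}
```

The checks are ordered so that the Z/S result is protected first and color is
only considered once color writes are known to be enabled.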
Thinking it through in this way, I believe I discovered at least one bug in
the patch as-is, in the following configuration
- blending disabled and no stencil
- depth is enabled, depth writes are disabled, and Zfunc == LESS
In this case, dsa->depth_forces_ordering_color_on will be true and
out-of-order will be enabled. But that's not correct, because there may be
multiple triangles with Z-values less than whatever's in the depth buffer.
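To make that concrete, here is a small software model of a single sample in
this configuration. It is only an illustration: struct fragment and
resolve_sample are hypothetical names, and the model ignores everything except
the depth test and the color write:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one fragment hitting one sample. */
struct fragment {
    float z;
    unsigned color;
};

/* Zfunc == LESS, depth writes disabled, blending disabled, no stencil:
 * every fragment is tested against the same unchanged depth value, and
 * each passing fragment simply overwrites the color. */
static unsigned resolve_sample(float depth_buf, unsigned color_buf,
                               const struct fragment *frags, size_t n)
{
    assert(frags != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (frags[i].z < depth_buf)
            color_buf = frags[i].color; /* depth_buf is never updated */
    }
    return color_buf;
}
```

With a depth buffer value of 1.0 and two fragments at Z = 0.3 and Z = 0.5,
both pass the test, so the final color is whichever fragment was processed
last. Swapping the processing order changes the result.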
On second thought, the whole "record in DSA" thing gets a bit more
complicated because it interacts with whether Z/S buffers are actually
present. The no-Z/S case is easy (the first two bits are "Yes", the last one
is "No"), but we need to distinguish whether stencil is present or not.
Maybe both of these can be pre-calculated and stored in DSA.
You seem to be much better at it than I am. If you wanna take a stab
at it, here are both patches in the reversed order:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=out-of-order-rast
I'll take a stab at it.
Cheers,
Nicolai
Marek
--
Learn how the world really is,
But never forget how it should be.