On 09/01/2013 07:45 AM, Paul Berry wrote:
On 29 August 2013 21:31, Kenneth Graunke wrote:
[snip]
> > I definitely don't want to share portions of 3DSTATE_VS.
In an effort to help us make a more informed decision about this, I've
investigated the following questions:
1. Are there any historical commits that would have been simpler if we
had shared code between VS and FS in the way that we're currently
discussing sharing code between VS and GS? Are there any that would
have been more complex?
These are good questions to be asking - thanks for looking into this.
Are there any historical commits where we made
mistakes because the code wasn't shared (e.g. changed one function but
forgot to change the other)? I looked just at gen7, and I looked at
patches since the beginning of 2012. I considered both the code sharing
that I've proposed in this patch as well as the counter-proposal to just
share the code for constant emission.
These two commits would have been helped by the code sharing I've
proposed; they would also have been helped if we just shared code for
constant emission:
e6893b9 (Set MOCS L3 cacheability for IVB/BYT (v2))
2273b65 (Change L3 MOCS of 3DSTATE_CONSTANT_VS/PS)
>
These two commits would have been helped by the code sharing I've
proposed, but not by sharing code for constant emission.
decc708 (Upload separate per-stage sampler state tables)
f5a690c (i965: Split sampler count variable to be per-stage)
In all of the above cases the savings would have been small--one diff
hunk instead of two.
In the context of the whole MOCS series, the savings are fairly minimal.
We had to set MOCS fields in 10 different places:
brw_emit_vertices
gen6_blorp_emit_vertices
gen7_blorp_emit_surface_state
gen7_blorp_emit_constant_ps
gen7_blorp_emit_depth_stencil_config
gen7_emit_depth_stencil_hiz
gen7_vs_state (1 line)
gen7_wm_state
gen7_update_texture_surface
gen7_update_renderbuffer_surface
If geometry shaders had landed first, we would have had to add MOCS to
3DSTATE_GS as well. This would have meant adding 1 additional line to a
+32 -9 series.
If these are the only patches that really would have been affected in
the last year, and the savings are 1-2 lines of code, I think that shows
that the sharing is not significantly useful.
I didn't find any commits that would have been more complex under either
my or Eric's proposal. I didn't find any instances of past mistakes
where we modified one function but not the other.
2. If we start sharing the code to populate the first 4 DWORDs of
3DSTATE_{VS,GS} now, is it likely that in the future we'll be able to
extend the code sharing to 3DSTATE_{HS,DS,PS}? To answer this question,
I looked at the documentation for all of these commands in Gen7 (IVB and
HSW). I found these differences:
- 3DSTATE_HS uses a completely different layout from the others; for
example its kernel start pointer isn't until dw3, whereas the other
stages use dw1.
- dw2[26] is "denormal mode" for PS; it's MBZ for VS, DS, and GS.
- dw2[17] is "thread priority" for HSW only on VS, DS, and PS; it's
"thread priority" for both HSW and IVB on GS. I suspect this is a
documentation bug, and it's really meant to be HSW only for all stages.
- dw2[15:14] are "rounding mode" on PS only.
- "Accesses UAV" (a HSW-only bit) is on dw2[12] for VS and GS, dw2[14]
for DS, and dw4[5] for PS. I suspect this bit is related to
ARB_shader_image_load_store functionality, but I'm not certain.
- dw2[11] is "mask stack exception enable" on GS and PS; it's MBZ on VS
and DS.
At the moment we never set any of the bits that differ between VS, DS,
GS, and PS, so we could share the code to populate dw0-3 between those 4
stages without having to make stage-specific exceptions. It's possible
that we might have to make stage-specific exceptions in the future if we
ever decide to set some of those bits, but the only likely candidate
seems to be "Accesses UAV". It's pretty clear that we couldn't share
any code to populate dw0-3 of 3DSTATE_HS.
We will need to support UAVs eventually (they're an upcoming GL
feature), but I don't know when or on what platforms.
Since 3DSTATE_HS isn't shareable, this means that we need to add
3DSTATE_GS today, and 3DSTATE_DS in a little while.
I'm still not convinced that the added complexity is worth the benefit.
I would still like to see 3DSTATE_VS, 3DSTATE_GS emitted within a
single function, in one single BEGIN...OUT...ADVANCE block.
--Ken
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev