On Fri, Aug 21, 2015 at 10:14 PM, Ben Widawsky <b...@bwidawsk.net> wrote: > top-post to start from fresh... > > Okay, so after a serious attempt at attempting to write something which I > think > is "more" correct, I failed. I'm planning to just push your patches (with my > r-b and t-b) after I finish a jenkins run. Unless anyone has arguments. > > My biggest confusion is the use of the dummy element - I can't make sense of > why > it's needed, but it clearly is.
The HW is kind of a mess in this corner case, but Neils patch is handling it correctly. Normally we don't need the VE (VERTEX_ELEMENT) entry for the SGVS, since we don't need to fetch anything into a VUE entry to be able to write the SGVS into it. So when we place the SGVS last, we can skip the VE. When we use an edgeflag VE, however, the HW requires we put it last. The VEs describe a contiguous block of entries in the VUE. To make sure the edgeflag VE is placed after the SGVS, we now need a placeholder VE to make room for the SGVS and "push out" the edgeflag VE. Reviewed-by: Kristian Høgsberg <k...@bitplanet.net> > On Tue, Aug 18, 2015 at 06:27:42PM -0700, Ben Widawsky wrote: >> On Mon, Jul 13, 2015 at 06:01:14PM +0100, Neil Roberts wrote: >> > The edge flag data on Gen6+ is passed through the fixed function hardware >> > as >> > an extra attribute. According to the PRM it must be the last valid >> > VERTEX_ELEMENT structure. However if the vertex ID is also used then >> > another >> > extra element is added to source the VID. This made it so the vertex ID is >> > in >> > the wrong register in the vertex shader and the edge attribute is no >> > longer in >> > the last element. >> >> It took me forever to find the docs which stated the edge flag must be >> last... >> could you maybe add a PRM location to the code comment, and/or commit >> message? >> >> > >> > v2: Also implement for BDW+ >> > >> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84677 >> > Cc: "10.6 10.5" <mesa-sta...@lists.freedesktop.org> >> > --- >> > src/mesa/drivers/dri/i965/brw_draw_upload.c | 30 +++++++-------- >> > src/mesa/drivers/dri/i965/gen8_draw_upload.c | 56 >> > +++++++++++++++++++++------- >> > 2 files changed, 57 insertions(+), 29 deletions(-) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_draw_upload.c >> > b/src/mesa/drivers/dri/i965/brw_draw_upload.c >> > index 320e40e..67a8dfd 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_draw_upload.c >> > +++ b/src/mesa/drivers/dri/i965/brw_draw_upload.c >> > @@ -787,21 +787,6 @@ static void brw_emit_vertices(struct brw_context *brw) >> > ((i * 4) << BRW_VE1_DST_OFFSET_SHIFT)); >> > } >> > >> > - if (brw->gen >= 6 && gen6_edgeflag_input) { >> > - uint32_t format = >> > - brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); >> > - >> > - OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | >> > - GEN6_VE0_VALID | >> > - GEN6_VE0_EDGE_FLAG_ENABLE | >> > - (format << BRW_VE0_FORMAT_SHIFT) | >> > - (gen6_edgeflag_input->offset << >> > BRW_VE0_SRC_OFFSET_SHIFT)); >> > - OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << >> > BRW_VE1_COMPONENT_0_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); >> > - } >> > - >> > if (brw->vs.prog_data->uses_vertexid || >> > brw->vs.prog_data->uses_instanceid) { >> > uint32_t dw0 = 0, dw1 = 0; >> > uint32_t comp0 = BRW_VE1_COMPONENT_STORE_0; >> > @@ -842,6 +827,21 @@ static void brw_emit_vertices(struct brw_context *brw) >> > OUT_BATCH(dw1); >> > } >> > >> > + if (brw->gen >= 6 && gen6_edgeflag_input) { >> > + uint32_t format = >> > + brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); >> > + >> > + OUT_BATCH((gen6_edgeflag_input->buffer << GEN6_VE0_INDEX_SHIFT) | >> > + GEN6_VE0_VALID | >> > + GEN6_VE0_EDGE_FLAG_ENABLE | >> > + (format << BRW_VE0_FORMAT_SHIFT) | >> > + (gen6_edgeflag_input->offset << >> > BRW_VE0_SRC_OFFSET_SHIFT)); >> > + OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << >> > BRW_VE1_COMPONENT_0_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); >> > + } >> > + >> > ADVANCE_BATCH(); >> > } >> > >> > diff --git a/src/mesa/drivers/dri/i965/gen8_draw_upload.c >> > b/src/mesa/drivers/dri/i965/gen8_draw_upload.c >> > index f7d9952..2bac5ff 100644 >> > --- a/src/mesa/drivers/dri/i965/gen8_draw_upload.c >> > +++ b/src/mesa/drivers/dri/i965/gen8_draw_upload.c >> > @@ -40,16 +40,25 @@ gen8_emit_vertices(struct brw_context *brw) >> > { >> > struct gl_context *ctx = &brw->ctx; >> > uint32_t mocs_wb = brw->gen >= 9 ? SKL_MOCS_WB : BDW_MOCS_WB; >> > + bool uses_edge_flag; >> > >> > brw_prepare_vertices(brw); >> > brw_prepare_shader_draw_parameters(brw); >> > >> > + uses_edge_flag = (ctx->Polygon.FrontMode != GL_FILL || >> > + ctx->Polygon.BackMode != GL_FILL); >> > + >> > if (brw->vs.prog_data->uses_vertexid || >> > brw->vs.prog_data->uses_instanceid) { >> > unsigned vue = brw->vb.nr_enabled; >> > >> > - WARN_ONCE(brw->vs.prog_data->inputs_read & VERT_BIT_EDGEFLAG, >> > - "Using VID/IID with edgeflags, need to reorder the " >> > - "vertex attributes"); >> > + /* The element for the edge flags must always be last, so we have to >> > + * insert the SGVS before it in that case. >> > + */ >> > + if (uses_edge_flag) { >> > + assert(vue > 0); >> > + vue--; >> > + } >> > + >> >> I don't get this. Shouldn't we be pushing the element with the edge flag to >> the >> end? As an example, suppose nr_enabled = 3. Won't this make us write the SGVs >> out to vue 2, which presumable had something of value before? >> >> Also, I just noticed: >> "It is INVALID to store both the VertexID and InstanceID in the same >> element/component location within the VUE." >> >> Aren't we potentially doing that today (and still with your patch)? >> >> > WARN_ONCE(vue >= 33, >> > "Trying to insert VID/IID past 33rd vertex element, " >> > "need to reorder the vertex attrbutes."); >> > @@ -138,7 +147,18 @@ gen8_emit_vertices(struct brw_context *brw) >> > ADVANCE_BATCH(); >> > } >> > >> > - unsigned nr_elements = brw->vb.nr_enabled + >> > brw->vs.prog_data->uses_vertexid; >> > + /* Normally we don't need an element for the SGVS attribute because the >> > + * 3DSTATE_VF_SGVS instruction lets you store the generated attribute >> > in an >> > + * element that is past the list in 3DSTATE_VERTEX_ELEMENTS. However >> > if the >> > + * vertex ID is used then it needs an element for the base vertex >> > buffer. >> > + * Additionally if there is an edge flag element then the SGVS can't be >> > + * inserted past that so we need a dummy element to ensure that the >> > edge >> > + * flag is the last one. >> > + */ >> > + bool needs_sgvs_element = (brw->vs.prog_data->uses_vertexid || >> > + (brw->vs.prog_data->uses_instanceid && >> > + uses_edge_flag)); >> > + unsigned nr_elements = brw->vb.nr_enabled + needs_sgvs_element; >> > >> > /* The hardware allows one more VERTEX_ELEMENTS than VERTEX_BUFFERS, >> > * presumably for VertexID/InstanceID. >> > @@ -192,6 +212,24 @@ gen8_emit_vertices(struct brw_context *brw) >> > (comp3 << BRW_VE1_COMPONENT_3_SHIFT)); >> > } >> > >> > + if (needs_sgvs_element) { >> > + if (brw->vs.prog_data->uses_vertexid) { >> > + OUT_BATCH(GEN6_VE0_VALID | >> > + brw->vb.nr_buffers << GEN6_VE0_INDEX_SHIFT | >> > + BRW_SURFACEFORMAT_R32_UINT << BRW_VE0_FORMAT_SHIFT); >> > + OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << >> > BRW_VE1_COMPONENT_0_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_1_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_2_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_3_SHIFT)); >> > + } else { >> > + OUT_BATCH(GEN6_VE0_VALID); >> > + OUT_BATCH((BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_0_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_1_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_2_SHIFT) | >> > + (BRW_VE1_COMPONENT_STORE_0 << >> > BRW_VE1_COMPONENT_3_SHIFT)); >> > + } >> > + } >> > + >> > if (gen6_edgeflag_input) { >> > uint32_t format = >> > brw_get_vertex_surface_type(brw, gen6_edgeflag_input->glarray); >> > @@ -206,16 +244,6 @@ gen8_emit_vertices(struct brw_context *brw) >> > (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | >> > (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); >> > } >> > - >> > - if (brw->vs.prog_data->uses_vertexid) { >> > - OUT_BATCH(GEN6_VE0_VALID | >> > - brw->vb.nr_buffers << GEN6_VE0_INDEX_SHIFT | >> > - BRW_SURFACEFORMAT_R32_UINT << BRW_VE0_FORMAT_SHIFT); >> > - OUT_BATCH((BRW_VE1_COMPONENT_STORE_SRC << >> > BRW_VE1_COMPONENT_0_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_1_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_2_SHIFT) | >> > - (BRW_VE1_COMPONENT_STORE_0 << BRW_VE1_COMPONENT_3_SHIFT)); >> > - } >> > ADVANCE_BATCH(); >> > >> > for (unsigned i = 0; i < brw->vb.nr_enabled; i++) { >> >> Empirically, this patch is needed, since without it, I cannot get the test to >> pass. However, since I don't understand it well enough, I am going to push >> 1/3, >> and wait until you get back from vacation to continue with this. >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev