On Thu, Jan 5, 2017 at 6:56 AM, Juan A. Suarez Romero <jasua...@igalia.com> wrote:
> On Thu, 2017-01-05 at 06:41 -0800, Jason Ekstrand wrote: > > On Jan 5, 2017 3:11 AM, "Juan A. Suarez Romero" <jasua...@igalia.com> > wrote: > > On Wed, 2017-01-04 at 07:06 -0800, Jason Ekstrand wrote: > > On Jan 4, 2017 5:46 AM, "Juan A. Suarez Romero" <jasua...@igalia.com> > wrote: > > On Tue, 2017-01-03 at 14:41 -0800, Jason Ekstrand wrote: > > I made a few pretty trivial comments. With those addressed, > > Reviewed-by: Jason Ekstrand <ja...@jlekstrand.net> > > On Dec 16, 2016 8:55 AM, "Juan A. Suarez Romero" <jasua...@igalia.com> > wrote: > > So far, input_reads was a bitmap tracking which vertex input locations > were being used. > > In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4) > consumes just one location, any other small attribute. So we mark the > proper bit in inputs_read, and also the same bit in double_inputs_read > if the attribute is a dvec3/dvec4. > > But in Vulkan, this is slightly different: a dvec3/dvec4 attribute > consumes two locations, not just one. And hence two bits would be marked > in inputs_read for the same vertex input attribute. > > To avoid handling two different situations in NIR, we just choose the > latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4 > vertex input attribute is marked with two bits in the inputs_read bitmap > (and also in the double_inputs_read), and following attributes are > adjusted accordingly. > > As example, if in our GLSL/IR shader we have three attributes: > > layout(location = 0) vec3 attr0; > layout(location = 1) dvec4 attr1; > layout(location = 2) dvec3 attr2; > > then in our NIR shader we put attr0 in location 0, attr1 in locations 1 > and 2, and attr2 in location 3. > > > attr2 goes in locations 3 *and* 4, correct? > > Checking carefully, basically we are using slots rather than locations > in NIR. > > When emitting the vertices, we do a inverse map to know the > corresponding location for each slot. > > v2 (Jason): > - use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR. > --- > src/compiler/glsl/glsl_to_nir.cpp | 28 +++++++++++++ > src/compiler/nir/nir_gather_info.c | 48 ++++++++++----------- > src/intel/vulkan/genX_pipeline.c | 63 > +++++++++++++++++----------- > src/mesa/drivers/dri/i965/brw_draw_upload.c | 11 +++-- > src/mesa/drivers/dri/i965/brw_fs.cpp | 13 ------ > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 3 +- > src/mesa/drivers/dri/i965/brw_nir.c | 6 +-- > src/mesa/drivers/dri/i965/brw_nir.h | 1 - > src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +++-- > 9 files changed, 106 insertions(+), 78 deletions(-) > > diff --git a/src/compiler/glsl/glsl_to_nir.cpp > b/src/compiler/glsl/glsl_to_nir.cpp > index 4debc37..0814dad 100644 > --- a/src/compiler/glsl/glsl_to_nir.cpp > +++ b/src/compiler/glsl/glsl_to_nir.cpp > @@ -129,6 +129,19 @@ private: > > } /* end of anonymous namespace */ > > +static void > +nir_remap_attributes(nir_shader *shader) > +{ > + nir_foreach_variable(var, &shader->inputs) { > + var->data.location += > _mesa_bitcount_64(shader->info->double_inputs_read > & > + > BITFIELD64_MASK(var->data.location)); > + } > + > + /* Once the remap is done, reset double_inputs_read, so later it will > have > + * which location/slots are doubles */ > + shader->info->double_inputs_read = 0; > +} > + > nir_shader * > glsl_to_nir(const struct gl_shader_program *shader_prog, > gl_shader_stage stage, > @@ -146,6 +159,13 @@ glsl_to_nir(const struct gl_shader_program > *shader_prog, > > nir_lower_constant_initializers(shader, (nir_variable_mode)~0); > > + /* Remap the locations to slots so those requiring two slots will > occupy > + * two locations. For instance, if we have in the IR code a dvec3 > attr0 in > + * location 0 and vec4 attr1 in location 1, in NIR attr0 will use > + * locations/slots 0 and 1, and attr1 will use location/slot 2 */ > + if (shader->stage == MESA_SHADER_VERTEX) > + nir_remap_attributes(shader); > + > shader->info->name = ralloc_asprintf(shader, "GLSL%d", > shader_prog->Name); > if (shader_prog->Label) > shader->info->label = ralloc_strdup(shader, shader_prog->Label); > @@ -315,6 +335,14 @@ nir_visitor::visit(ir_variable *ir) > } else { > var->data.mode = nir_var_shader_in; > } > + > + /* Mark all the locations that require two slots */ > + if (glsl_type_is_dual_slot(glsl_without_array(var->type))) { > + for (uint i = 0; i < glsl_count_attribute_slots(var->type, > true); i++) { > + uint64_t bitfield = BITFIELD64_BIT(var->data.location + i); > + shader->info->double_inputs_read |= bitfield; > + } > + } > break; > > case ir_var_shader_out: > diff --git a/src/compiler/nir/nir_gather_info.c > b/src/compiler/nir/nir_gather_info.c > index 07c9949..35a1ce4 100644 > --- a/src/compiler/nir/nir_gather_info.c > +++ b/src/compiler/nir/nir_gather_info.c > @@ -53,11 +53,6 @@ set_io_mask(nir_shader *shader, nir_variable *var, int > offset, int len) > else > shader->info->inputs_read |= bitfield; > > - /* double inputs read is only for vertex inputs */ > - if (shader->stage == MESA_SHADER_VERTEX && > - glsl_type_is_dual_slot(glsl_without_array(var->type))) > - shader->info->double_inputs_read |= bitfield; > - > if (shader->stage == MESA_SHADER_FRAGMENT) { > shader->info->fs.uses_sample_qualifier |= var->data.sample; > } > @@ -83,26 +78,21 @@ static void > mark_whole_variable(nir_shader *shader, nir_variable *var) > { > const struct glsl_type *type = var->type; > - bool is_vertex_input = false; > > if (nir_is_per_vertex_io(var, shader->stage)) { > assert(glsl_type_is_array(type)); > type = glsl_get_array_element(type); > } > > - if (shader->stage == MESA_SHADER_VERTEX && > - var->data.mode == nir_var_shader_in) > - is_vertex_input = true; > - > const unsigned slots = > var->data.compact ? DIV_ROUND_UP(glsl_get_length(type), 4) > - : glsl_count_attribute_slots(type, > is_vertex_input); > + : glsl_count_attribute_slots(type, false); > > set_io_mask(shader, var, 0, slots); > } > > static unsigned > -get_io_offset(nir_deref_var *deref, bool is_vertex_input) > +get_io_offset(nir_deref_var *deref) > { > unsigned offset = 0; > > @@ -117,7 +107,7 @@ get_io_offset(nir_deref_var *deref, bool > is_vertex_input) > return -1; > } > > - offset += glsl_count_attribute_slots(tail->type, > is_vertex_input) * > + offset += glsl_count_attribute_slots(tail->type, false) * > deref_array->base_offset; > } > /* TODO: we can get the offset for structs here see nir_lower_io() > */ > @@ -163,12 +153,7 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var > *deref) > return false; > } > > - bool is_vertex_input = false; > - if (shader->stage == MESA_SHADER_VERTEX && > - var->data.mode == nir_var_shader_in) > - is_vertex_input = true; > - > - unsigned offset = get_io_offset(deref, is_vertex_input); > + unsigned offset = get_io_offset(deref); > if (offset == -1) > return false; > > @@ -184,8 +169,7 @@ try_mask_partial_io(nir_shader *shader, nir_deref_var > *deref) > } > > /* double element width for double types that takes two slots */ > - if (!is_vertex_input && > - glsl_type_is_dual_slot(glsl_without_array(type))) { > + if (glsl_type_is_dual_slot(glsl_without_array(type))) { > elem_width *= 2; > } > > @@ -220,13 +204,27 @@ gather_intrinsic_info(nir_intrinsic_instr *instr, > nir_shader *shader) > case nir_intrinsic_interp_var_at_sample: > case nir_intrinsic_interp_var_at_offset: > case nir_intrinsic_load_var: > - case nir_intrinsic_store_var: > - if (instr->variables[0]->var->data.mode == nir_var_shader_in || > - instr->variables[0]->var->data.mode == nir_var_shader_out) { > + case nir_intrinsic_store_var: { > + nir_variable *var = instr->variables[0]->var; > + > + if (var->data.mode == nir_var_shader_in || > + var->data.mode == nir_var_shader_out) { > if (!try_mask_partial_io(shader, instr->variables[0])) > - mark_whole_variable(shader, instr->variables[0]->var); > + mark_whole_variable(shader, var); > + > + /* We need to track which input_reads bits correspond to a > + * dvec3/dvec4 input attribute */ > + if (shader->stage == MESA_SHADER_VERTEX && > + var->data.mode == nir_var_shader_in && > + glsl_type_is_dual_slot(glsl_without_array(var->type))) { > + for (uint i = 0; i < glsl_count_attribute_slots(var->type, > false); i++) { > + int idx = var->data.location + i; > + shader->info->double_inputs_read |= BITFIELD64_BIT(idx); > + } > + } > } > break; > + } > > case nir_intrinsic_load_draw_id: > case nir_intrinsic_load_front_face: > diff --git a/src/intel/vulkan/genX_pipeline.c > b/src/intel/vulkan/genX_pipeline.c > index 845d020..7b94959 100644 > --- a/src/intel/vulkan/genX_pipeline.c > +++ b/src/intel/vulkan/genX_pipeline.c > @@ -33,26 +33,33 @@ > static uint32_t > vertex_element_comp_control(enum isl_format format, unsigned comp) > { > - uint8_t bits; > switch (comp) { > - case 0: bits = isl_format_layouts[format].channels.r.bits; break; > - case 1: bits = isl_format_layouts[format].channels.g.bits; break; > - case 2: bits = isl_format_layouts[format].channels.b.bits; break; > - case 3: bits = isl_format_layouts[format].channels.a.bits; break; > - default: unreachable("Invalid component"); > - } > - > - if (bits) { > - return VFCOMP_STORE_SRC; > - } else if (comp < 3) { > - return VFCOMP_STORE_0; > - } else if (isl_format_layouts[format].channels.r.type == ISL_UINT || > - isl_format_layouts[format].channels.r.type == ISL_SINT) { > - assert(comp == 3); > - return VFCOMP_STORE_1_INT; > - } else { > - assert(comp == 3); > - return VFCOMP_STORE_1_FP; > + case 0: > + return isl_format_layouts[format].channels.r.bits ? > + VFCOMP_STORE_SRC : VFCOMP_STORE_0; > + case 1: > + return isl_format_layouts[format].channels.g.bits ? > + VFCOMP_STORE_SRC : VFCOMP_STORE_0; > + case 2: > + return isl_format_layouts[format].channels.b.bits ? > + VFCOMP_STORE_SRC : ((isl_format_layouts[format].channels.r.type > == ISL_RAW) ? > + VFCOMP_NOSTORE : VFCOMP_STORE_0); > > > Given all the line wrapping, I think it would be clearer to just have an > if ladder here > > + case 3: > + if (isl_format_layouts[format].channels.a.bits) > + return VFCOMP_STORE_SRC; > + else > > > Please use braces when one side of the if/else is multiple lines. > > + switch (isl_format_layouts[format].channels.r.type) { > + case ISL_RAW: > + return isl_format_layouts[format].channels.b.bits ? > + VFCOMP_STORE_0 : VFCOMP_NOSTORE; > > > This seems a bit odd. Mind explaining what's going on here? > > > Yes. When emitting 64-bit components, we either write 128 or 256 bits, > using VFCOMP_STORE_0 to pad output properly. We write 256 bits if we need > to emit Blue and/or Alpha components. > > If we only need to write 128bits then we use VFCOMP_NOSTORE for the 3rd > and 4th components. > > In above code, we already know we are not emitting Alpha (the first > condition in "case 3" is false). If we neither need to emit Blue component, > then we only need to write 128 bits, so we return NOSTORE. > > But if Blue is required, then we need to write 256bits, so we return > VFCOMP_STORE_0 to pad the output. > > > Maybe this would be clearer: leave it structured the way it was with the > "bits" temporary and add a new case to the if ladder right after the first > one that is > > } else if (comp >= 2 && !isl_format_layouts[format].channels.b.bits && > isl_format_layouts[format].channels.r.type == ISL_RAW) { > /* comment about writing 128-bit chunks */ > return VFCOMP_NOSTORE; > > > Sounds good for me. This also requires to change the next branch to > > } else if(comp < 3 || (comp == 3 && > isl_format_layouts[format].channels.r.type == ISL_RAW)) { > return VFCOMP_STORE_0; > > > Otherwise, it would return the wrong value for 4th component when format > is R64G64B64. > > > Why are we defaulting the 4th component to zero for 64-bit inputs? I > guess the format wouldn't register as integer or float so neither of the > other two makes sense. What does the spec day the 4th component should be > defaulted to? I'm a bit concerned we may find ourselves needing to inspect > the shader to get this 100% right. :-( > > > Because in the example above, we need do not have a value for the 4th > component, but we wrote 3*64 = 192 bits. And as we need to write 256, we > pad the remaining 64 bits with 0, as required by spec. > > From Broadwell spec, command reference structures, page 586: > > "When SourceElementFormat is set to one of the *64*_PASSTHRU formats, > 64-bit components are stored > in the URB without any conversion. In this case, vertex elements must be > written as 128 or 256 bits, with > VFCOMP_STORE_0 being used to pad the output as required. E.g., if > R64_PASSTHRU is used to copy a > 64-bit Red component into the URB, Component 1 must be specified as > VFCOMP_STORE_0 (with > Components 2,3 set to VFCOMP_NOSTORE) in order to output a 128-bit vertex > element, or Components > 1-3 must be specified as VFCOMP_STORE_0 in order to output a 256-bit > vertex element. Likewise, use of > R64G64B64_PASSTHRU requires Component 3 to be specified as VFCOMP_STORE_0 > in order to output a > 256-bit vertex element." > Ok, please add a spec citation. to explain why we're not using 1 for the 4th component of passthrough formats. With that, I think it should be good-to-go. I didn't realize we were working with a hardware restriction. > > > Note this is not specific for the 4th component. For instance, if the > input element is a 64-bit single-component value (R64), we would be writing > 128bits: the first component is the entire Red value (VFCOMP_STORE_SRC), > which would be > using the first 64bits; and for the second component we would use 0 > (VCOMP_STORE_0), to pad the remaining 64bits. In this case, as we do not > have neither 3rd nor 4th components, we would > use VFCOMP_NOSTORE, so only 128bits are written. > > > I also tried to test what happens when using other value rathen than 0, > and it doesn't work. > > > J.A. > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev