On Thu, Oct 12, 2017 at 11:38 AM, Jose Maria Casanova Crespo < jmcasan...@igalia.com> wrote:
> We enable the use of 16-bit values in push constants > modifying the assign_constant_locations function to work > with 16-bit types. > > The API to access buffers in Vulkan use multiples of 4-byte for > offsets and sizes. Current accountability of uniforms based on 4-byte > slots will work for 16-bit values if they are allowed to use 32-bit > slots. For that, we replace the division by 4 by a DIV_ROUND_UP, so > 2-byte elements will use 1 slot instead of 0. > I'm fairly sure this doesn't actually work correctly. That said, I'm also fairly sure the current code is broken for 64 bits. In particular, let's suppose we have something like this: layout(push_constant) { struct { int i; int pad1; float f; double d; int pad2; } arr[2]; } main() { out_color = vec4(arr[arr[0].i].f, float(arr[arr[0].i].d), arr[arr[1].i].f, float(arr[arr[1].i].d)); } I'm pretty sure the current code will explode because it will see a single contiguous chunk that's neither 32 nor 64-bit. If that particular shader doesn't break it, I'm sure some permutation of it will. Things only get worse if we throw in 16-bit. Ultimately, I think the solution is to throw away our current scheme of trying to separate things out by bit size and move to a scheme where we work with everything in bytes in assign_constant_locations and trust in the contiguous[] array to determine where to split things up. We would probably want to continue only re-arranging things 4 bytes at a time. That said, I don't think this patch makes anything any worse... > We aligns the 16-bit locations after assigning the 32-bit > ones. > --- > src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++++++++------- > 1 file changed, 23 insertions(+), 7 deletions(-) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index a1d49a63be..8da16145dc 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -1909,8 +1909,9 @@ set_push_pull_constant_loc(unsigned uniform, int > *chunk_start, > if (!contiguous) { > /* If bitsize doesn't match the target one, skip it */ > if (*max_chunk_bitsize != target_bitsize) { > - /* FIXME: right now we only support 32 and 64-bit accesses */ > - assert(*max_chunk_bitsize == 4 || *max_chunk_bitsize == 8); > + assert(*max_chunk_bitsize == 4 || > + *max_chunk_bitsize == 8 || > + *max_chunk_bitsize == 2); > *max_chunk_bitsize = 0; > *chunk_start = -1; > return; > @@ -1987,8 +1988,9 @@ fs_visitor::assign_constant_locations() > int constant_nr = inst->src[i].nr + inst->src[i].offset / 4; > > if (inst->opcode == SHADER_OPCODE_MOV_INDIRECT && i == 0) { > - assert(inst->src[2].ud % 4 == 0); > - unsigned last = constant_nr + (inst->src[2].ud / 4) - 1; > + assert(type_sz(inst->src[i].type) == 2 ? > + (inst->src[2].ud % 2 == 0) : (inst->src[2].ud % 4 == > 0)); > + unsigned last = constant_nr + DIV_ROUND_UP(inst->src[2].ud, > 4) - 1; > assert(last < uniforms); > > for (unsigned j = constant_nr; j < last; j++) { > @@ -2000,8 +2002,8 @@ fs_visitor::assign_constant_locations() > bitsize_access[last] = MAX2(bitsize_access[last], > type_sz(inst->src[i].type)); > } else { > if (constant_nr >= 0 && constant_nr < (int) uniforms) { > - int regs_read = inst->components_read(i) * > - type_sz(inst->src[i].type) / 4; > + int regs_read = DIV_ROUND_UP(inst->components_read(i) * > + type_sz(inst->src[i].type), > 4); > for (int j = 0; j < regs_read; j++) { > is_live[constant_nr + j] = true; > bitsize_access[constant_nr + j] = > @@ -2062,7 +2064,7 @@ fs_visitor::assign_constant_locations() > > } > > - /* Then push the rest of uniforms */ > + /* Then push the 32-bit uniforms */ > const unsigned uniform_32_bit_size = type_sz(BRW_REGISTER_TYPE_F); > for (unsigned u = 0; u < uniforms; u++) { > if (!is_live[u]) > @@ -2081,6 +2083,20 @@ fs_visitor::assign_constant_locations() > stage_prog_data); > } > > + const unsigned uniform_16_bit_size = type_sz(BRW_REGISTER_TYPE_HF); > + for (unsigned u = 0; u < uniforms; u++) { > + if (!is_live[u]) > + continue; > + > + set_push_pull_constant_loc(u, &chunk_start, &max_chunk_bitsize, > + contiguous[u], bitsize_access[u], > + uniform_16_bit_size, > + push_constant_loc, pull_constant_loc, > + &num_push_constants, &num_pull_constants, > + max_push_components, max_chunk_size, > + stage_prog_data); > + } > + > /* Add the CS local thread ID uniform at the end of the push constants > */ > if (thread_local_id_index >= 0) > push_constant_loc[thread_local_id_index] = num_push_constants++; > -- > 2.13.6 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev