Paul Berry <stereotype...@gmail.com> writes: > On 15 January 2014 14:01, Francisco Jerez <curroje...@riseup.net> wrote: > >> Paul Berry <stereotype...@gmail.com> writes: >> >> > On 2 December 2013 11:31, Francisco Jerez <curroje...@riseup.net> wrote: >> > >> >> Until now it was only being taken into account in the VEC4 back-end >> >> but not in the FS back-end. Do it in both cases. >> >> --- >> >> src/mesa/drivers/dri/i965/brw_fs.h | 2 +- >> >> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 10 ++++++---- >> >> src/mesa/drivers/dri/i965/brw_shader.h | 7 ++++--- >> >> 3 files changed, 11 insertions(+), 8 deletions(-) >> >> >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h >> >> b/src/mesa/drivers/dri/i965/brw_fs.h >> >> index 2c36d9f..f918f7e 100644 >> >> --- a/src/mesa/drivers/dri/i965/brw_fs.h >> >> +++ b/src/mesa/drivers/dri/i965/brw_fs.h >> >> @@ -615,4 +615,4 @@ bool brw_do_channel_expressions(struct exec_list >> >> *instructions); >> >> bool brw_do_vector_splitting(struct exec_list *instructions); >> >> bool brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program >> >> *prog); >> >> >> >> -struct brw_reg brw_reg_from_fs_reg(fs_reg *reg); >> >> +struct brw_reg brw_reg_from_fs_reg(fs_reg *reg, unsigned >> dispatch_width); >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> >> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> >> index 8d310a1..1de59eb 100644 >> >> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> >> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> >> @@ -981,8 +981,9 @@ static uint32_t brw_file_from_reg(fs_reg *reg) >> >> } >> >> >> >> struct brw_reg >> >> -brw_reg_from_fs_reg(fs_reg *reg) >> >> +brw_reg_from_fs_reg(fs_reg *reg, unsigned dispatch_width) >> >> { >> >> + const int reg_size = 4 * dispatch_width; >> >> >> > >> > What happens when reg.type is UW and dispatch_width is 16? In that case, >> > we would compute reg_size == 64, but the correct value seems like it's >> > actually 32 in this case. >> > >> > Are we perhaps relying on reg.type being a 32-bit type? If so, maybe we >> > should add an assertion: >> > >> > assert(type_sz(reg.type) == 4); >> > >> >> Nope, "reg_size" is supposed to be the size in bytes of a ::reg_offset >> unit, i.e. one hardware register in SIMD8 mode and two hardware >> registers in SIMD16 mode as the comment at the definition of >> ::reg_offset explains. The fixed factor of four is intentional and >> correct no matter what the register type is. >> >> Thanks. >> > > Ok, I see. It appears that both this function *and* the comment above > reg_offset are assuming that the data type is 32 bit. The comment above > reg_offset says: > > * For pre-register-allocation GRFs and MRFs, this is in units of a > * float per pixel (1 hardware register for SIMD8 or SIMD4x2 mode, > * or 2 registers for SIMD16 mode). For uniforms, this is in units > * of 1 float. > > but when we get around to adding support for double-precision floats (a > feature of GL 4.0), this will no longer work; for double precision types > we'll need reg_offset to be measured in units of at least 4 hardware > registers in SIMD16 mode to avoid overlap. > > Similarly, for types that are 16 bits, if we consider reg_offset to be > measured in units of 2 hardware registers in SIMD16 mode, we're actually > wasting registers, since all 16 values actually fit in a single hardware > register. That's not really a big deal right now, since we use 16-bit > types so rarely in the FS back-end.
I see what you mean, but it seems rather problematic to me to have the unit of reg_offset depend on the register data type. E.g. bit-casting the contents of a register to a type of different size would involve non-trivial algebra on reg_offset. IMHO the ideal solution would be to settle on a fixed unit (e.g. bytes, and remove the subreg_offset field completely) and use a helper function to get the array indexing that seems to be the main use case of reg_offset (e.g. 'index(base_reg, i)' that would take into account the type size of 'base_reg' to calculate the byte offset of element 'i'). > In fact, I bet we never use a nonzero reg_offset on a 16-bit type. > Not sure if we do already, but my surface packing/unpacking code might in some situations. > It still seems to me that it would be worth putting an assertion here to > help alert us to what needs to change when we add double precision support > (or if someday we have hardware that supports half float computation). I'm > not 100% sure what the assertion could be. "assert(type_sz(reg->type) == > 4);" was an optimistic guess. We might have to do > "assert(type_sz(reg->type) == 4 || reg->reg_offset == 0);" in order to > avoid tripping on the rare cases where we currently use 16-bit types in > fragment shaders. I think that such an assertion would break my homogeneous packing/unpacking code if it ever gets an argument with reg_offset != 0, because it casts its argument into a smaller type (either 8 or 16 bits) and then uses 'subreg_offset' to select the individual components of the packed vector. P.S.: Sorry for the late reply.
pgpED3gpN8Ntu.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev