Jose Maria Casanova Crespo <jmcasan...@igalia.com> writes: > These new methods return for a instruction register source/destination > the read/write byte pattern of the 32-byte GRF as an unsigned int. > > The returned pattern takes into account the exec_size of the instruction, > the type bitsize, the register stride and a relative offset inside the > register. > > The motivation of this functions if to know the read/written bytes > of the instructions to improve the liveness analysis for partial > read/writes. > > We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL > and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the bitsize > parameter they have a different read pattern. > > v2: (Francisco Jerez) > - Split original register_byte_use_pattern into one read and other > write. > - Check for send like instructions using this->mlen != 0 > - Pass functions src number and offset. > - Use periodic_mask function with code written by Francisco Jerez > to simplify pattern generation. > - Avoid breaking silently if source straddles multiple GRFs. > > v3: (Francisco Jerez) > - A SEND could be this->mlen != 0 or this->is_send_from_grf > - We only assume that a periodic mask with offset could be applied > to reg_offset == 0. > - We can assure that for MOVs operations for any offset (Chema) > > v4: (Francisco Jerez) > - We return 0 mask for reg_offset out of the region definition. > - We return periodic masks when access is in bounds for ALU opcodes. > > v5: (Francisco Jerez) > - Mask can only be periodic when byte_offset < type_size * stride > when reg_offset > 0. > > Cc: Francisco Jerez <curroje...@riseup.net> > --- > src/intel/compiler/brw_fs.cpp | 121 +++++++++++++++++++++++++++++++++ > src/intel/compiler/brw_ir_fs.h | 2 + > 2 files changed, 123 insertions(+) > > diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp > index 7ddbd285fe2..d790b080e53 100644 > --- a/src/intel/compiler/brw_fs.cpp > +++ b/src/intel/compiler/brw_fs.cpp > @@ -39,6 +39,7 @@ > #include "compiler/glsl_types.h" > #include "compiler/nir/nir_builder.h" > #include "program/prog_parameter.h" > +#include <limits.h> >
I don't think you need to include this. > using namespace brw; > > @@ -687,6 +688,126 @@ fs_inst::is_partial_write() const > this->dst.offset % REG_SIZE != 0); > } > > +/** > + * Returns a periodic mask that is repeated "count" times with a "step" > + * size and consecutive "bits" finally shifted "offset" bits to the left. > + * With 'size' you probably mean periodicity. And with 'consecutive "bits"' you probably mean there are "bits" consecutive ones per period. > + * This helper is used to calculate the representations of byte read/write > + * register patterns > + * > + * Example: periodic_mask(8, 4, 2, 0) would return 0x33333333 > + * periodic_mask(8, 4, 2, 2) would return 0xcccccccc > + * periodic_masc(8, 2, 2, 16) would return 0xffff0000 > + */ > +static inline uint32_t > +periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset) > +{ > + uint32_t m = (count ? (1 << bits) - 1 : 0); > + const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT); > + > + for (unsigned shift = step; shift < max; shift *= 2) > + m |= m << shift; > + > + return m << offset; I don't think there's any benefit from passing the offset as argument of this function at this point. The comma is less self-documenting than a left-shift operator applied on the result. > +} > + > +/** > + * Returns a 32-bit uint whose bits represent if the associated register byte > + * has been written by the instruction. The returned pattern takes into > + * account the exec_size of the instruction, the type bitsize, the stride > + * of the destination register and the internal register byte offset. > + * > + * The objective of this function is to identify which parts of the register > + * are written for operations that don't write a full register. So we > + * we can identify in live range variable analysis if a partial write has > + * completelly defined the data used by a partial read. > + * > + * reg_offset identifies full registers starting at dst.reg with > + * reg_offset == 0. Last sentence seems misleading because the dst.offset also has an influence on which GRF is referred to by a given reg_offset. Maybe write something along the lines of "reg_offset selects the GRF the mask is calculated for, relative to the first GRF written by the instruction". > + */ > +unsigned > +fs_inst::dst_write_pattern(unsigned reg_offset) const > +{ > + assert(this->dst.file == VGRF); > + > + /* Instruction doesn't write out of bounds */ > + if (reg_offset >= regs_written(this)) > + return 0; > + > + /* We don't know what is written so we return the worst case */ > + if (this->predicate && this->opcode != BRW_OPCODE_SEL) > + return 0; > + > + /* We assume that send destinations are completelly defined */ completely > + if (this->is_send_from_grf() || this->mlen != 0) > + return ~0u; > + > + /* The byte pattern is calculated using a periodic mask for ALU > + * operations and reg_offset in bounds. > + */ > + unsigned step = this->dst.stride * type_sz(this->dst.type); > + unsigned byte_offset = this->dst.offset % REG_SIZE; constify. > + if (reg_offset == 0 || byte_offset < step) { > + return periodic_mask(this->exec_size, step, type_sz(this->dst.type), > + byte_offset); > + } > + > + /* We don't know what was written so return 0 as safest choice */ > + return 0; > +} > + > +/** > + * Returns a 32-bit uint whose bits represent if the associated register byte > + * has been read by the instruction. The returned pattern takes into > + * account the exec_size of the instruction, the type bitsize and stride of > + * a source register and the internal register byte offset. > + * > + * The objective of this function is to identify which parts of the register > + * are used for operations that don't read a full register. > + * > + * Parameter i identifies the instruction source number and reg_offset > + * identifies full registers starting at src[i].reg with reg_offset == 0. Same nit-pick as for the comment above. > + */ > +unsigned > +fs_inst::src_read_pattern(int i, unsigned reg_offset) const "i" should probably be unsigned. > +{ > + assert(src[i].file == VGRF); > + > + /* Instruction doesn't read out of bounds */ > + if (reg_offset >= regs_read(this, i)) > + return 0u; > + > + /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned > + * so the read pattern depends on the bitsize stored at src[4]. > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && i == 1) > + return periodic_mask(8, 4, this->src[4].ud / 8, 0); Maybe leave some space after the end of the conditional block for consistency, here and below. > + /* As for byte_scattered_write_logical but we need to take into account > + * that data written in the payload(src[0]) are now on reg_offset 1 on > SIMD8 > + * and reg_offset 2 and 3 on SIMD16. > + */ > + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && i == 0) { > + if (DIV_ROUND_UP(reg_offset, (this->exec_size / 8)) == 1) > + return periodic_mask(8, 4, this->src[2].ud / 8, 0); > + } > + /* We assume that send sources to be completelly used */ s/that// > + if (this->is_send_from_grf() || this->mlen != 0) > + return ~0u; > + > + /* The byte pattern is calculated using a periodic mask for ALU > + * operations and reg_offset in bounds. > + */ > + unsigned step = this->src[i].stride * type_sz(this->src[i].type); > + unsigned byte_offset = this->src[i].offset % REG_SIZE; > + if (reg_offset == 0 || byte_offset < step) { Weird indentation. > + return periodic_mask(this->exec_size, step, type_sz(this->src[i].type), > + byte_offset); > + } > + > + /* We don't know what was read so return ~0u as safest choice */ Weird indentation. > + return ~0u; > +} > + > unsigned > fs_inst::components_read(unsigned i) const > { > diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h > index 92dad269a34..dab776a3664 100644 > --- a/src/intel/compiler/brw_ir_fs.h > +++ b/src/intel/compiler/brw_ir_fs.h > @@ -350,6 +350,8 @@ public: > bool equals(fs_inst *inst) const; > bool is_send_from_grf() const; > bool is_partial_write() const; > + unsigned src_read_pattern(int src, unsigned reg_offset) const; > + unsigned dst_write_pattern(unsigned reg_offset) const; > bool is_copy_payload(const brw::simple_allocator &grf_alloc) const; > unsigned components_read(unsigned i) const; > unsigned size_read(int arg) const; > -- > 2.17.1
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev