These new methods return, for an instruction's register source/destination,
the read/write byte pattern of the 32-byte GRF as an unsigned int.

The returned pattern takes into account the exec_size of the instruction,
the type bitsize, the register stride and a relative offset inside the
register.

The motivation for these functions is to know which bytes each
instruction reads or writes, to improve the liveness analysis for
partial reads/writes.

We handle SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL and
SHADER_OPCODE_BYTE_SCATTERED_WRITE as special cases because their read
pattern depends on the bitsize parameter.

v2: (Francisco Jerez)
    - Split the original register_byte_use_pattern into one read and one
      write function.
    - Check for send-like instructions using this->mlen != 0.
    - Pass the src number and offset to the functions.
    - Use a periodic_mask function, with code written by Francisco Jerez,
      to simplify pattern generation.
    - Avoid breaking silently if the source straddles multiple GRFs.

v3: (Francisco Jerez)
    - A SEND could be this->mlen != 0 or this->is_send_from_grf.
    - We only assume that a periodic mask with offset can be applied
      when reg_offset == 0.
    - For MOV operations we can assume it for any offset. (Chema)

Cc: Francisco Jerez <curroje...@riseup.net>
---
 src/intel/compiler/brw_fs.cpp  | 119 +++++++++++++++++++++++++++++++++
 src/intel/compiler/brw_ir_fs.h |   2 +
 2 files changed, 121 insertions(+)

diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 7ddbd285fe2..4fa0f154c44 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -39,6 +39,7 @@
 #include "compiler/glsl_types.h"
 #include "compiler/nir/nir_builder.h"
 #include "program/prog_parameter.h"
+#include <limits.h>
 
 using namespace brw;
 
@@ -687,6 +688,124 @@ fs_inst::is_partial_write() const
            this->dst.offset % REG_SIZE != 0);
 }
 
+/**
+ * Returns a periodic mask that is repeated "count" times with a "step"
+ * size and consecutive "bits" finally shifted "offset" bits to the left.
+ *
+ * This helper is used to calculate the representations of byte read/write
+ * register patterns.
+ *
+ * Example: periodic_mask(8, 4, 2, 0)  would return 0x33333333
+ *          periodic_mask(8, 4, 2, 2)  would return 0xcccccccc
+ *          periodic_mask(8, 2, 2, 16) would return 0xffff0000
+ */
+static inline uint32_t
+periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
+{
+   uint32_t m = (count ? (1 << bits) - 1 : 0);
+   const unsigned max = MIN2(count * step, sizeof(m) * CHAR_BIT);
+
+   for (unsigned shift = step; shift < max; shift *= 2)
+      m |= m << shift;
+
+   return m << offset;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been written by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and the
+ * stride of the destination register.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are defined for operations that don't write a full register, so that
+ * we can identify in live range variable analysis if a partial write has
+ * completely defined the data used by a partial read.
+ */
+unsigned
+fs_inst::dst_write_pattern(unsigned reg_offset) const
+{
+   assert(this->dst.file == VGRF);
+   /* We don't know what is written so we return the worst case */
+   if (this->predicate && this->opcode != BRW_OPCODE_SEL)
+      return 0u;
+   /* We assume that send destinations are completely defined */
+   if (this->is_send_from_grf() || this->mlen != 0) {
+      return ~0u;
+   }
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+    * because the internal offset will match how the register is written.
+    *
+    * We can also do this for any reg_offset on MOV operations. We could add
+    * other opcodes in the future, but we don't include them until we have
+    * evidence of them being used in partial write situations that ensure
+    * the pattern is repeated for any reg_offset.
+    */
+   if (reg_offset == 0 || this->opcode == BRW_OPCODE_MOV) {
+      return periodic_mask(this->exec_size,
+                           this->dst.stride * type_sz(this->dst.type),
+                           type_sz(this->dst.type),
+                           this->dst.offset % REG_SIZE);
+   }
+   /* This shouldn't be reached during live range calculation, but if used
+    * in another context we know that a complete register is written.
+    */
+   if (!this->is_partial_write())
+      return ~0u;
+
+   /* By default we don't know what is written */
+   return 0u;
+}
+
+/**
+ * Returns a 32-bit uint whose bits represent if the associated register byte
+ * has been read by the instruction. The returned pattern takes into
+ * account the exec_size of the instruction, the type bitsize and stride of
+ * a source register and a register offset.
+ *
+ * The objective of this function is to identify which parts of the register
+ * are used for operations that don't read a full register.
+ */
+unsigned
+fs_inst::src_read_pattern(int i, unsigned reg_offset) const
+{
+   assert(src[i].file == VGRF);
+   /* byte_scattered_write_logical pattern of src[1] is 32-bit aligned
+    * so the read pattern depends on the bitsize stored at src[4].
+    */
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL && i == 1)
+      return periodic_mask(8, 4, this->src[4].ud / 8, 0);
+   /* Same as byte_scattered_write_logical, but we need to take into account
+    * that the data written in the payload (src[0]) is now at reg_offset 1 on
+    * SIMD8 and reg_offset 2 and 3 on SIMD16.
+    */
+   if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && i == 0) {
+      if (reg_offset / (this->exec_size / 8) == 1)
+         return periodic_mask(8, 4, this->src[2].ud / 8, 0);
+   }
+   /* We assume that send sources could be completely used */
+   if (this->is_send_from_grf() || this->mlen != 0)
+      return ~0u;
+
+   /* The byte pattern is calculated using a periodic mask for reg_offset == 0
+    * because the internal offset will match how the register is read.
+    *
+    * We can also do this for any reg_offset on MOV operations. We could add
+    * other opcodes in the future, but we don't include them until we have
+    * evidence of them being used in partial read situations that ensure
+    * the pattern is repeated for any reg_offset.
+    */
+   if (!reg_offset || this->opcode == BRW_OPCODE_MOV) {
+      return periodic_mask(this->exec_size,
+                           this->src[i].stride * type_sz(this->src[i].type),
+                           type_sz(this->src[i].type),
+                           this->src[i].offset % REG_SIZE);
+   }
+   /* By default we assume that any byte could be read */
+   return ~0u;
+}
+
 unsigned
 fs_inst::components_read(unsigned i) const
 {
diff --git a/src/intel/compiler/brw_ir_fs.h b/src/intel/compiler/brw_ir_fs.h
index 92dad269a34..dab776a3664 100644
--- a/src/intel/compiler/brw_ir_fs.h
+++ b/src/intel/compiler/brw_ir_fs.h
@@ -350,6 +350,8 @@ public:
    bool equals(fs_inst *inst) const;
    bool is_send_from_grf() const;
    bool is_partial_write() const;
+   unsigned src_read_pattern(int src, unsigned reg_offset) const;
+   unsigned dst_write_pattern(unsigned reg_offset) const;
    bool is_copy_payload(const brw::simple_allocator &grf_alloc) const;
    unsigned components_read(unsigned i) const;
    unsigned size_read(int arg) const;
-- 
2.17.1

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
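
For readers who want to try the byte patterns outside the Mesa tree, here is
a minimal standalone sketch of the periodic_mask() helper from this patch. It
assumes std::min in place of MIN2 and uses an unsigned literal for the shift;
the added assert is also an assumption of this sketch. write_covers_read() is
a hypothetical helper, not part of the patch, that illustrates the kind of
check the liveness analysis could perform with these patterns: a partial
write fully defines a partial read when no read byte is left unwritten.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>

// Standalone copy of periodic_mask() from the patch. MIN2 is replaced by
// std::min so that it builds without the Mesa headers.
static inline uint32_t
periodic_mask(unsigned count, unsigned step, unsigned bits, unsigned offset)
{
   // Shifting a 32-bit value by 32 or more is undefined, so guard it here
   // (this assert is an addition of the sketch, not of the patch).
   assert(bits < 32 && offset < 32);

   uint32_t m = (count ? (1u << bits) - 1 : 0);
   const unsigned max = std::min<unsigned>(count * step, sizeof(m) * CHAR_BIT);

   // Double the populated width on every iteration until the mask spans
   // "max" bits.
   for (unsigned shift = step; shift < max; shift *= 2)
      m |= m << shift;

   return m << offset;
}

// Hypothetical helper (not in the patch): true if every byte set in the
// read pattern is also set in the write pattern.
static inline bool
write_covers_read(uint32_t written, uint32_t read)
{
   return (read & ~written) == 0;
}
```

For instance, a SIMD8 write of 16-bit values with a stride of 2 produces
periodic_mask(8, 4, 2, 0), which on its own does not cover a full 32-bit
SIMD8 read, periodic_mask(8, 4, 4, 0). Combined with a second write at byte
offset 2, periodic_mask(8, 4, 2, 2), the two patterns OR together to cover
the whole register.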