Kenneth Graunke <kenn...@whitecape.org> writes: > On 03/20/2013 05:37 PM, Eric Anholt wrote: >> This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on >> my GM45 forced to load all uniforms through the varying-index path), but we >> get a whole vec4 at a time to reuse in the next commit. >> >> NOTE: This is a candidate for the 9.1 branch. >> --- >> src/mesa/drivers/dri/i965/brw_fs.cpp | 92 >> ++++++++++++++--------------- >> src/mesa/drivers/dri/i965/brw_fs.h | 3 +- >> src/mesa/drivers/dri/i965/brw_fs_cse.cpp | 1 + >> src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 55 +++++++++++------ >> 4 files changed, 84 insertions(+), 67 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp >> b/src/mesa/drivers/dri/i965/brw_fs.cpp >> index 5d83e50..e504e3a 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp >> @@ -238,57 +238,53 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(fs_reg dst, >> fs_reg surf_index, >> exec_list instructions; >> fs_inst *inst; >> >> - if (intel->gen >= 7) { >> - /* We have our constant surface use a pitch of 4 bytes, so our index >> can >> - * be any component of a vector, and then we load 4 contiguous >> - * components starting from that. >> - * >> - * We break down the const_offset to a portion added to the variable >> - * offset and a portion done using reg_offset, which means that if you >> - * have GLSL using something like "uniform vec4 a[20]; gl_FragColor = >> - * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, >> and >> - * CSE can later notice that those loads are all the same and >> eliminate >> - * the redundant ones. >> - */ >> - fs_reg vec4_offset = fs_reg(this, glsl_type::int_type); >> - instructions.push_tail(ADD(vec4_offset, >> - varying_offset, const_offset & ~3)); >> - >> - fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4), dst.type); >> - inst = new(mem_ctx) fs_inst(FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7, >> - vec4_result, surf_index, vec4_offset); >> - inst->regs_written = 4; >> - instructions.push_tail(inst); >> - >> - vec4_result.reg_offset += const_offset & 3; >> - instructions.push_tail(MOV(dst, vec4_result)); >> - } else { >> - fs_reg offset = fs_reg(this, glsl_type::uint_type); >> - instructions.push_tail(ADD(offset, varying_offset, >> fs_reg(const_offset))); >> - >> - int base_mrf = 13; >> - bool header_present = true; >> - >> - fs_reg mrf = fs_reg(MRF, base_mrf + header_present); >> - mrf.type = BRW_REGISTER_TYPE_D; >> - >> - /* On gen6+ we want the dword offset passed in, but on gen4/5 we need >> a >> - * dword-aligned byte offset. >> + /* We have our constant surface use a pitch of 4 bytes, so our index can >> + * be any component of a vector, and then we load 4 contiguous >> + * components starting from that. >> + * >> + * We break down the const_offset to a portion added to the variable >> + * offset and a portion done using reg_offset, which means that if you >> + * have GLSL using something like "uniform vec4 a[20]; gl_FragColor = >> + * a[i]", we'll temporarily generate 4 vec4 loads from offset i * 4, and >> + * CSE can later notice that those loads are all the same and eliminate >> + * the redundant ones. >> + */ >> + fs_reg vec4_offset = fs_reg(this, glsl_type::int_type); >> + instructions.push_tail(ADD(vec4_offset, >> + varying_offset, const_offset & ~3)); >> + >> + int scale = 1; >> + if (intel->gen == 4 && dispatch_width == 8) { >> + /* Pre-gen5, we can either use a SIMD8 message that requires (header, >> + * u, v, r) as parameters, or we can just use the SIMD16 message >> + * consisting of (header, u). We choose the second, at the cost of a >> + * longer return length. > > Did you try just using the SIMD8 message and not bothering to set up the > v and r components? Supposedly, they're ignored based on surface type. > Although, there's an errata for SURFTYPE_1D which says 'v' needs to > exist, which is concerning. But then again this is SURFTYPE_BUFFER... > > Then again, it's pre-Ironlake only, so I'm not super concerned...
Yeah, it's things like those errata that I was worried about. >> + enum opcode op; >> + if (intel->gen >= 7) >> + op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD_GEN7; >> + else >> + op = FS_OPCODE_VARYING_PULL_CONSTANT_LOAD; >> + fs_reg vec4_result = fs_reg(GRF, virtual_grf_alloc(4 * scale), dst.type); >> + inst = new(mem_ctx) fs_inst(op, vec4_result, surf_index, vec4_offset); >> + inst->regs_written = 4 * scale; >> + instructions.push_tail(inst); >> + >> + if (intel->gen < 7) { >> + inst->base_mrf = 13; >> + inst->header_present = true; > > Hmm? Why not go headerless on Gen5-6? Oh. Yeah, I'll try to remember to fix that later. >> + if (intel->gen >= 5) >> + msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD; >> + else { >> + /* We always use the SIMD16 message so that we only have to load U, >> and >> + * not V/R/LOD. > > "not V or R." The SIMD8 message doesn't actually have an LOD parameter > according to the G45 PRM. Fixed, thanks.
pgpOqROBLL63W.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev