On 09/30/2012 10:08 PM, Kenneth Graunke wrote: > On 09/20/2012 07:06 PM, Paul Berry wrote: >> Callers of emit_scratch_{read,write}() pass a base_offset that is >> measured in bytes, but emit_scratch_{read,write}() was assuming it was >> measured in multiples of a complete register. As a result, a shader >> using a scratch read/write offset greater than zero (in practice, a >> shader containing more than one variable in scratch) would use too >> large an offset, frequently exceeding the available scratch space. >> >> This patch corrects the mismatch by dividing base_offset by REG_SIZE >> at the beginning of emit_scratch_{read,write}(). >> >> NOTE: This is a candidate for stable release branches. >> --- >> >> No test cases were failing because of this bug, but it was getting in >> the way of Ken's efforts to implement register spilling for vertex >> shaders. >> >> Ken: I've done a full piglit run on master to verify that it doesn't >> regress anything, but I didn't check whether there are any remaining >> regressions when I apply it to your "spilling" branch. All I checked >> was that it fixed the 3 regressions you told me about earlier today. >> >> src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 ++++++++++++++ >> 1 file changed, 14 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp >> index 682837f..60fab54 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp >> @@ -2443,12 +2443,19 @@ >> vec4_visitor::get_pull_constant_offset(vec4_instruction *inst, >> /** >> * Emits an instruction before @inst to load the value named by @orig_src >> * from scratch space at @base_offset to @temp. >> + * >> + * @base_offset is measured in bytes. >> */ >> void >> vec4_visitor::emit_scratch_read(vec4_instruction *inst, >> dst_reg temp, src_reg orig_src, >> int base_offset) >> { >> + /* Convert base_offset from bytes to a register number, since that's what >> + * get_scratch_offset() expects. >> + */ >> + base_offset /= REG_SIZE; >> + >> int reg_offset = base_offset + orig_src.reg_offset; >> src_reg index = get_scratch_offset(inst, orig_src.reladdr, reg_offset); >> >> @@ -2458,12 +2465,19 @@ vec4_visitor::emit_scratch_read(vec4_instruction >> *inst, >> /** >> * Emits an instruction after @inst to store the value to be written >> * to @orig_dst to scratch space at @base_offset, from @temp. >> + * >> + * @base_offset is measured in bytes. >> */ >> void >> vec4_visitor::emit_scratch_write(vec4_instruction *inst, >> src_reg temp, dst_reg orig_dst, >> int base_offset) >> { >> + /* Convert base_offset from bytes to a register number, since that's what >> + * get_scratch_offset() expects. >> + */ >> + base_offset /= REG_SIZE; >> + >> int reg_offset = base_offset + orig_dst.reg_offset; >> src_reg index = get_scratch_offset(inst, orig_dst.reladdr, reg_offset); > > Okay. I've looked at this a bit more, and I think your patch is right, > but there might be a bug pre-Gen6. > > Here's my thinking. In move_grf_array_access_to_scratch(), I see: > > c->last_scratch += this->virtual_grf_sizes[src->reg] * 8 * 4; > > This is clearly in measured in bytes: 8 * 4 is the size of a vec8 (or > simd4x2) register. This is passed to emit_scratch_read/write. > > So the base_offset argument to emit_scratch_read/write is measured in > bytes. Your patch then causes it to be divided by REG_SIZE (8*4) > (ironically undoing the earlier scaling), at which point base_offset is > again the number of simd4x2 registers, or OWords. We add reg_offset, > which is also in OWords. > > We then pass that to get_scratch_offset. Everything's dandy for Gen6+, > we want OWords. But for earlier hardware, we hit: > > /* Pre-gen6, the message header uses byte offsets instead of vec4 > * (16-byte) offset units. > */ > if (intel->gen < 6) > message_header_scale *= 16; > > At which point I say...16 bytes??? We're converting from units of > OWords/SIMD4x2 registers, not units of vec4s. So I believe this should > be 32 bytes (REG_SIZE aka 8*4). > > Thoughts?
Ooops. Paul pointed out two problems with my analysis: - An OWord is the size of a vec4, not a vec8, since a float is actually a DWord. (Grumble...terminology...) - message_header_scale is actually initalized to 2, which scales from number of whole 4x2 registers (vec8s) to OWords (vec4s). This also should correctly scale by 32 on Gen4-5. So I believe Paul's patch is correct and the Gen4-5 bug I was concerned about does not actually exist. I respun a version that just leaves the variable in number of registers, rather than multiplying by 32 to get bytes and immediately dividing again. Neither of us have a strong preference. I'll send mine out momentarily. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev