Matt Turner <matts...@gmail.com> writes: > On Mon, Mar 14, 2016 at 9:32 PM, Kenneth Graunke <kenn...@whitecape.org> > wrote: >> On Monday, March 14, 2016 5:25:58 PM PDT Matt Turner wrote: >>> On Mon, Mar 14, 2016 at 5:22 PM, Matt Turner <matts...@gmail.com> wrote: >>> > On Mon, Mar 14, 2016 at 5:10 PM, Francisco Jerez <curroje...@riseup.net> >> wrote: >>> >> Matt Turner <matts...@gmail.com> writes: >>> >> >>> >>> Missing this causes an assertion failure in the scheduler with the next >>> >>> patch. >>> >>> --- >>> >>> src/mesa/drivers/dri/i965/brw_ir_vec4.h | 3 ++- >>> >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/ >> dri/i965/brw_ir_vec4.h >>> >>> index 660beca..7cedf8e 100644 >>> >>> --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h >>> >>> +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h >>> >>> @@ -201,7 +201,8 @@ public: >>> >>> { >>> >>> return (conditional_mod && (opcode != BRW_OPCODE_SEL && >>> >>> opcode != BRW_OPCODE_IF && >>> >>> - opcode != BRW_OPCODE_WHILE)); >>> >>> + opcode != BRW_OPCODE_WHILE)) || >>> >>> + opcode == TCS_OPCODE_SRC0_010_IS_ZERO; >>> >> >>> >> Meh... Any reason this weird instruction doesn't have the >>> >> conditional_mod set on creation? Having the generator set it implicitly >>> >> makes the instruction *less* useful and is the only reason we need to >>> >> introduce these hacks. >>> > >>> > None that I know of. I'd be happy with that fix as a replacement for this >> patch. >> >> Setting it in the visitor seems totally reasonable. I like that better. >> >>> >> AFAICT the TCS_OPCODE_SRC0_010_IS_ZERO opcode is actually redundant and >>> >> equivalent to: >>> >> >>> >> broadcast.nz.f0 null.xyzw, src.xxxx, 0UD >>> > >>> > I think there's a difference -- that broadcast instruction treats the >>> > two vec4 halves independently, reading a component from each of them. >>> > The SRC0_010_IS_ZERO reads a single component of the whole vec8 (I >>> > don't see calls that would set it, but that region implies it's an >>> > align1 instruction). Does that seem right? >> >> Yeah, that's the point - we want to make both SIMD4x2 halves read the >> same 32-bit scalar value. >> >> I didn't realize that broadcast existed in the vec4 backend. I don't >> think it does that, though...? >> >>> >>> Uh, this seems weird. That apparently generates: >>> >>> mov.z.f0(8) null<1>F g5<0,4,1>.xUD { align16 1Q >> }; >>> >>> in arb_tessellation_shader/execution/nop.shader_test. >>> >>> I wonder if Ken intended that instruction to be align1? >> >> I don't recall. Does it matter? Aren't >> >> mov.z.f0(8) null<1>F g5<0,4,1>.xUD { align16 1Q }; >> >> and >> >> mov.z.f0(8) null<1>F g5<0,1,0>UD { align1 1Q }; >> >> both equivalent, anyway? I think I named the opcode "010" because >> that's effectively what it does, even if I did it in align16 mode. > > Yes, they're equivalent. I was just confused by setting stride(src[0], > 0, 1, 0) in an instruction that isn't capable of a <0,1,0> stride. > Actually, I still don't understand how that produces what it does.
I guess it works by luck, brw_eu_emit.c (hopefully always) ignores the width and horizontal stride in Align16 mode so it ends up emitting the correct region <0,4,1>. We should probably be setting the source region to the correct value from the start.
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev