On Fri, Nov 1, 2013 at 10:31 AM, Paul Berry <stereotype...@gmail.com> wrote:
> On 29 October 2013 16:37, Francisco Jerez <curroje...@riseup.net> wrote:
>>
>> The latency information has been obtained empirically from
>> measurements taken on Haswell and Ivy Bridge.
>> ---
>>  .../drivers/dri/i965/brw_schedule_instructions.cpp | 41 ++++++++++++++++++++++
>>  1 file changed, 41 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> index 944b5c8..cbfaabe 100644
>> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
>> @@ -329,6 +329,47 @@ schedule_node::set_latency_gen7(bool is_haswell)
>>        latency = 200;
>>        break;
>>
>> +   case SHADER_OPCODE_UNTYPED_ATOMIC:
>> +      /* Test code:
>> +       *   mov(8)    g112<1>ud       0x00000000ud   { align1 WE_all 1Q };
>> +       *   mov(1)    g112.7<1>ud     g1.7<0,1,0>ud  { align1 WE_all };
>> +       *   mov(8)    g113<1>ud       0x00000000ud   { align1 WE_normal 1Q };
>> +       *   send(8)   g4<1>ud         g112<8,8,1>ud
>> +       *             data (38, 5, 6) mlen 2 rlen 1  { align1 WE_normal 1Q };
>> +       *
>> +       * Running it 100 times as fragment shader on a 128x128 quad
>> +       * gives an average latency of 13867 cycles per atomic op,
>> +       * standard deviation 3%.  Note that this is a rather
>> +       * pessimistic estimate, the actual latency in cases with few
>> +       * collisions between threads and favorable pipelining has been
>> +       * seen to be reduced by a factor of 100.
>> +       */
>> +      latency = 14000;
>
> Wow, that's a really huge latency.  Given your argument in the comment, I
> suspect that in practice, shaders that use atomic counters are going to be a
> lot closer to the "few collisions between threads and favorable pipelining"
> case than they are going to be to this pessimistic estimate.  Personally,
> I'd be inclined to make the latency the same as
> SHADER_OPCODE_UNTYPED_SURFACE_READ.
>
> But I'm not an expert on scheduling latencies so I'll defer to Eric and
> Matt.  Consider this patch:

That seems reasonable to me. Once the latency is an order of magnitude
more than any other instruction, it kind of stops mattering for
scheduling purposes.

Either way:

Reviewed-by: Matt Turner <matts...@gmail.com>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
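
For illustration only, here is a toy model of the point made in the reply: once one latency dwarfs all the others, making it even larger does not change the order a latency-driven list scheduler picks. This is a minimal sketch and not the i965 scheduler. The Node type, the delay() and schedule() helpers, the example block and every latency other than the 14000 and 200 quoted in the patch are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical toy model, NOT the i965 scheduler.
struct Node {
   int latency;                // cycles until the result is available
   std::vector<int> children;  // indices of instructions consuming the result
};

// Critical-path delay of node i: its own latency plus the longest delay
// among its consumers.  Plain recursion, fine for a tiny example DAG.
static int delay(const std::vector<Node> &g, int i)
{
   int longest = 0;
   for (int c : g[i].children)
      longest = std::max(longest, delay(g, c));
   return g[i].latency + longest;
}

// Greedy list scheduling: repeatedly issue the ready instruction with the
// largest critical-path delay (ties go to the lowest index).
static std::vector<int> schedule(const std::vector<Node> &g)
{
   const int n = static_cast<int>(g.size());
   std::vector<int> pending(n, 0);  // number of not yet scheduled parents
   for (const Node &node : g)
      for (int c : node.children)
         pending[c]++;

   std::vector<bool> done(n, false);
   std::vector<int> order;
   while (static_cast<int>(order.size()) < n) {
      int best = -1;
      for (int i = 0; i < n; i++) {
         if (done[i] || pending[i] != 0)
            continue;
         if (best < 0 || delay(g, i) > delay(g, best))
            best = i;
      }
      done[best] = true;
      order.push_back(best);
      for (int c : g[best].children)
         pending[c]--;
   }
   return order;
}

// Hypothetical basic block: one atomic op, one 200-cycle read, four ALU ops.
static std::vector<Node> make_block(int atomic_latency)
{
   return {
      { atomic_latency, {3} },  // 0: atomic op, result used by 3
      { 200, {4} },             // 1: long-latency read, result used by 4
      { 14, {4} },              // 2: ALU op (assumed latency), used by 4
      { 14, {5} },              // 3: ALU op consuming the atomic result
      { 14, {5} },              // 4: ALU op consuming 1 and 2
      { 14, {} },               // 5: final ALU op
   };
}
```

In this model schedule(make_block(2000)) and schedule(make_block(14000)) return the same order, because the atomic op already has the longest critical path in both cases. An atomic latency below that of the 200-cycle read, for example make_block(100), does change the order, which is the only regime where the exact number would matter here.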