On Mon, Oct 5, 2015 at 1:06 PM, Emil Velikov <emil.l.veli...@gmail.com> wrote:
> On 5 October 2015 at 17:11, Connor Abbott <cwabbo...@gmail.com> wrote:
>> On Mon, Oct 5, 2015 at 11:36 AM, Emil Velikov <emil.l.veli...@gmail.com>
>> wrote:
>>> Hi all,
>>>
>>> I am looking at ARB_shader_clock with i965 in mind.
>>>
>>> So far I've got the most of the infra/plumbing, and a fancy a new intrinsic
>>> :)
>>>
>>> On the hardware side, I was thinking about using the Observability
>>> Architecture (OA) counters. The fun part is that those tend to vary
>>> quite a bit based on the hardware generation. So far I'm leaning
>>> towards:
>>> - "Count of XXX threads dispatched to EUs" for BRW and later.
>>> - "XXX Shader Active Time" for earlier (SNB-HSW/VLV) hardware.
>>>
>>> Do there sound appropriate, or should we opt for the various knobs in
>>> 'Flexible EU event counters' ? Is there some alternative piece of
>>> hardware in i965, which I can use ?
>>>
>>>
>>> Going for OA has a small catch. Reading through the PRM, it is not
>>> obvious if one can track the same source twice (the
>>> GL_AMD_performance_monitor implementation comes to mind). I'm about to
>>> take a closer look into brw_performance_monitor.[ch] shortly, but if
>>> any gotchas/fancy interactions come to mind let me know.
>>>
>>> Thanks
>>> Emil
>>>
>>> P.S. Does anyone recall the consensus wrt adding the 2015 extensions
>>> to GL3.txt ?
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>> Hi Emil,
>>
>> I don't think you want to use the OA counters to implement
>> ARB_shader_clock. They're not exposed to the shader directly, AFAIK,
>> and they only measure things on a per-invocation granularity, whereas
>> the intent of ARB_shader_clock is to be able to measure the number of
>> cycles that individual operations take with very low latency. Instead,
>> you should read from the ARF performance register -- see page 822 of
>> Vol 7 ("3D Media GPGPU") of the Broadwell PRM (page 858 of the PDF)
>> for more details.
>>
> I knew that there should be nicer piece of hardware for this, but
> could not find it looking through the spec. The timestamp register
> looks exactly like the thing we need there.
>
>> Another interesting thing is that you can atomically read from that
>> register and also get a bit that say whether there was some event,
>> such as a context switch, since the last time you read it that would
>> make your measurement invalid. It might be useful to expose this
>> through a GLSL extension as another set of overloads:
>>
>>     uint64_t clockARB(out bool valid); //once we get int64 support
>>     uvec2 clock2x32ARB(out bool valid);
>>
> Are you thinking about writing up another extension, or should we just
> wire things internally as someone else does it for us ? Would you have
> any preference how to handle things when a context switch has occurred
> (for the official functions) ?

I was thinking about adding a new extension; AFAIK no one else has
exposed this in an extension before, but at least Intel HW has it. If
by "the official functions" you mean the ones in ARB_shader_clock,
then they should just read the register without worrying about whether
there was a context switch or not. They should still reset the "is
this valid" state though, since that's what the HW does and it makes
sense for cases like the example below.

>
>> and a corresponding NIR intrinsic that outputs an extra component
>> that's a boolean (i.e. 0 or ~0). That would help with implementing
>> something like INTEL_DEBUG=shader_time generically with less outliers
>> to throw away.
>>
> Exposing it via INTEL_DEBUG will be great, but first I'd stick getting
> the extension bits in place.

Oh no, I meant replacing it entirely. That is, we'd have something
above Mesa or in core Mesa that inserts code like:

    layout(binding = 0, std430) buffer {
        uint time[];
    };
    layout(binding = 1, offset = 0) uniform atomic_uint idx;

    void main()
    {
        uint64_t start = clockARB(); //using uint64 for brevity even if we don't support it now
        ... //the original shader
        bool valid;
        uint64_t end = clockARB(valid);
        if (valid && end > start) {
            time[atomicCounterIncrement(idx)] = end - start;
        }
    }

and we could rip out all the code inside i965 to implement
INTEL_DEBUG=shader_time, which is fragile and often in the way of
refactors.

>
> Thanks
> Emil
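
To make the intended semantics concrete, here is a minimal CPU-side sketch in C of the behaviour described in this thread: a timestamp read that also reports a "valid" bit, a plain read that ignores the bit but still resets it, and the filter that only records undisturbed samples. Everything here is an assumption made for illustration. The names (model_clock, model_clock_read, model_clock_read_plain, record_sample) are hypothetical, and the struct is a software model, not the actual ARF register layout or any existing Mesa code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a timestamp register that carries a
 * sticky "something disturbed the measurement" flag. This is NOT the
 * real hardware layout; it only mirrors the behaviour discussed above. */
struct model_clock {
    uint64_t ticks;     /* free-running cycle count */
    bool     disrupted; /* set by an event such as a context switch */
};

/* Models the proposed clockARB(out bool valid) overload: return the
 * counter and report whether anything disturbed it since the previous
 * read. The flag is cleared on every read. */
static uint64_t model_clock_read(struct model_clock *c, bool *valid)
{
    if (valid)
        *valid = !c->disrupted;
    c->disrupted = false;
    return c->ticks;
}

/* Models plain clockARB(): the caller does not see the flag, but the
 * read still resets it, so a later checked read only covers the
 * interval that started here. */
static uint64_t model_clock_read_plain(struct model_clock *c)
{
    return model_clock_read(c, NULL);
}

/* Mirrors the body of the instrumented shader: append a sample only if
 * the interval was undisturbed and the counter moved forward.
 * Returns the new sample count. */
static size_t record_sample(uint64_t *samples, size_t count,
                            uint64_t start, uint64_t end, bool valid)
{
    assert(samples != NULL);
    if (valid && end > start)
        samples[count++] = end - start;
    return count;
}
```

Under this model, an event that happens before the first (plain) read does not invalidate the sample, because that read clears the flag; only an event between the two reads causes the sample to be dropped. That is the reason for wanting the plain functions to reset the state as well.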