On 04/04/2019 16:20, Boris Brezillon wrote:
> Hello,
>
> This patch adds new ioctls to expose GPU counters to userspace.
> These will be used by the mesa driver (should be posted soon).
>
> A few words about the implementation: I followed the VC4/Etnaviv model
> where perf counters are retrieved on a per-job basis. This allows one
> to get accurate results when there are users using the GPU
> concurrently.
> AFAICT, the mali kbase is using a different approach where several
> users can register a performance monitor but with no way to have
> fine-grained control over what job/GPU-context to track.
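(For readers who don't know the VC4 model being referred to: the idea is
that userspace creates a perfmon object, attaches it to individual job
submissions, and reads the counters back once those jobs retire. A rough
sketch follows; the struct and field names are made up by analogy with
the VC4 perfmon uAPI and are not necessarily what this series defines in
panfrost_drm.h:)

/*
 * Illustrative only: a VC4-style per-job perfmon uAPI transplanted to
 * Panfrost naming. All names here are hypothetical.
 */
#include <linux/types.h>

struct drm_panfrost_perfmon_create {
	__u32 id;		/* out: perfmon handle */
	__u32 ncounters;	/* in: number of counters to track */
	__u8  events[8];	/* in: HW event selector per counter */
};

struct drm_panfrost_perfmon_get_values {
	__u32 id;		/* in: perfmon handle */
	__u32 pad;
	__u64 values_ptr;	/* in: user pointer to ncounters u64s */
};

/*
 * Userspace would attach the perfmon handle to each submit it wants
 * counted; the kernel snapshots the counters when that job retires,
 * which is what gives accurate per-client results under concurrency.
 */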
mali_kbase submits overlapping jobs. The jobs on slot 0 and slot 1 can be
from different contexts (address spaces), and mali_kbase also fully uses
the _NEXT registers. So there can be a job from one context executing on
slot 0 and a job from a different context waiting in the _NEXT registers
(and the same for slot 1). This means that there's no (visible) gap
between the first job finishing and the second job starting. Early
versions of the driver even had a throttle to avoid interrupt storms (see
JOB_IRQ_THROTTLE) which would further delay the IRQ - but thankfully
that's gone.

The upshot is that it's basically impossible to measure "per-job"
counters when running at full speed, because multiple jobs are running
and the driver doesn't actually know when one ends and the next starts
(see the sketch at the end of this mail). Since one of the primary use
cases is to draw pretty graphs of the system load [1], this "per-job"
information isn't all that relevant (and minimal performance overhead is
important). And if you want to monitor just one application, it is
usually easiest to ensure that it is the only thing running.

[1] https://developer.arm.com/tools-and-software/embedded/arm-development-studio/components/streamline-performance-analyzer

> This design choice comes at a cost: every time the perfmon context
> changes (the perfmon context is the list of currently active
> perfmons), the driver has to add a fence to prevent new jobs from
> corrupting counters that will be dumped by previous jobs.
>
> Let me know if that's an issue and if you think we should approach
> things differently.

It depends what you expect to do with the counters. Per-job counters are
certainly useful sometimes, but serialising all jobs can mess up the
thing you are trying to measure the performance of.

Steve

> Regards,
>
> Boris
>
> Boris Brezillon (3):
>   drm/panfrost: Move gpu_{write,read}() macros to panfrost_regs.h
>   drm/panfrost: Expose HW counters to userspace
>   panfrost/drm: Define T860 perf counters
>
>  drivers/gpu/drm/panfrost/Makefile           |   3 +-
>  drivers/gpu/drm/panfrost/panfrost_device.c  |   8 +
>  drivers/gpu/drm/panfrost/panfrost_device.h  |  11 +
>  drivers/gpu/drm/panfrost/panfrost_drv.c     |  22 +-
>  drivers/gpu/drm/panfrost/panfrost_gpu.c     |  46 +-
>  drivers/gpu/drm/panfrost/panfrost_job.c     |  24 +
>  drivers/gpu/drm/panfrost/panfrost_job.h     |   4 +
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.c | 954 ++++++++++++++++++++
>  drivers/gpu/drm/panfrost/panfrost_perfcnt.h |  59 ++
>  drivers/gpu/drm/panfrost/panfrost_regs.h    |  22 +
>  include/uapi/drm/panfrost_drm.h             | 122 +++
>  11 files changed, 1268 insertions(+), 7 deletions(-)
>  create mode 100644 drivers/gpu/drm/panfrost/panfrost_perfcnt.c
>  create mode 100644 drivers/gpu/drm/panfrost/panfrost_perfcnt.h
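P.S. To make the "no visible gap" point above concrete, here is a rough
sketch of what using the _NEXT registers looks like from the driver
side. The register names are the ones from panfrost_regs.h; the helper
name is made up, and locking and error handling are omitted:

#include "panfrost_device.h"
#include "panfrost_regs.h"

/*
 * Sketch: double-buffered submission on one job slot. While the current
 * job runs, the next one is fully programmed in the _NEXT registers; the
 * hardware promotes it the instant the current job retires, so software
 * never sees a gap in which per-job counters could be dumped.
 */
static void queue_next_job(struct panfrost_device *pfdev, int js,
			   u64 jc_head, u32 cfg)
{
	gpu_write(pfdev, JS_HEAD_NEXT_LO(js), lower_32_bits(jc_head));
	gpu_write(pfdev, JS_HEAD_NEXT_HI(js), upper_32_bits(jc_head));
	gpu_write(pfdev, JS_CONFIG_NEXT(js), cfg);
	/* Commit: from here on the hardware owns the job. */
	gpu_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
}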