On Sun, 2010-10-31 at 08:10 +0000, Chris Wilson wrote:
> On Sun, 31 Oct 2010 02:24:06 +0000, Peter Clifton <pc...@cam.ac.uk> wrote:
> > Hi guys,
> >
> > I thought I'd attach this, as it is now gone 2AM and I doubt I'm going
> > to finish it "tonight". I was hoping to elicit some initial review to
> > suggest whether the design was sane or not.
>
> Been here, done something similar and ripped it out.
Doh.. and I was feeling pleased with myself there ;)

It took me nearly a day, but it was the first serious kernel driver work I've done... I had to look up a lot of APIs!

FWIW, the patch (or the version I have here, which uses double-buffering for the samples) does produce vaguely useful-looking data, but I'm not sure I can get the hrtimer to fire fast enough. Spinning up a separate scheduled process would be one (nasty) way to go.

> What we want to do is integrate an additional set of sampling with perf.
> The last time I looked, it required a few extra lines to perf-core to
> allow devices to register their own counters, and then you get the
> advantage of a reasonable interface (plus the integration with CPU
> profiling and timecharts etc).

Sounds good. I had wondered about integration with tracing (hence the name of my code), so it could "somehow" be tied in with something like sysprof output.

I'm as yet undecided whether to attempt to stream events / data fast enough to capture live 0/1 data on whether a unit is busy or not, or whether to take a burst of samples periodically and compute a %busy figure within the kernel at each time step (there's a rough sketch of what I mean at the end of this mail), e.g.

| | | | | | | | | | | | | | | | |   <-- sampling raw data
0 1 1 0 0 1 0 1 0 1 1 1 0 0 0 1 1

OR

|||||      |||||      |||||      |||||   <-- sampling bursts
 20%        50%        45%        20%

Somehow the latter seems like it might reduce the IO down to the userspace app, but it does artificially blur the lines between units, and stops you seeing exactly when units synchronise being busy / idle etc. It would probably still produce pictures along the lines of the ones I posted to the list, which I think looked useful.

[snip]

> You can use the current trace points to get timings for submit +
> complete + retire. What's missing here is being able to mark individual
> batch buffers for profiling. I think adding a new TIMING_FENCE ioctl (it
> could just be a fence ;-) that captures various stats at the point of
> submission and completion and then fires off an event (to be read on the
> /dev/dri/card0 fd) would be the more flexible solution.

Stupid question.. what do you mean by "fence"? I vaguely understand the term for URB allocation boundaries, and for tiling boundaries (I think). Do you mean noting down some ring-buffer sequence IDs which we can pick up when they start / complete to enable tracing?

Could you just add a flag to the exec batchbuffer IOCTL to enable tracing, or do you want to pass more information to control how things are traced?

On the one hand, it would be really fun to see how individual batch buffers utilise the GPU, but in reality there are multiple batch buffers for a given rendering frame in some workloads. I assume it is entirely possible for some other client to slip a batch buffer in between my app's batch buffers, so you'd really want to see that (and see who it belongs to) in order to explain the resulting timings.

At present I'm uncertain whether the perf read-out needs to be from within the app we're trying to profile (so we know which batch buffers belong to it), or whether it works best as some external application. With the latter we can look at system-wide GPU usage, but if we wanted to narrow in on a frame of rendering from a particular application, we'd need some means to identify the appropriate batches / times.
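As promised above, here is a rough sketch of the "sampling bursts" idea, just to make it concrete: an hrtimer callback that grabs a short burst of raw busy/idle samples and collapses them into a single %busy figure before anything reaches userspace. The register offset, busy bit and struct here are placeholders of my own invention, not real i915 definitions, and locking / readout to userspace are omitted:

	#include <linux/kernel.h>
	#include <linux/hrtimer.h>
	#include <linux/ktime.h>
	#include <linux/io.h>

	#define SAMPLES_PER_BURST	32
	#define HYPOTHETICAL_UNIT_STAT	0x2350		/* placeholder register offset */
	#define HYPOTHETICAL_UNIT_BUSY	(1 << 0)	/* placeholder busy bit */

	struct unit_sampler {
		struct hrtimer timer;
		void __iomem *mmio;	/* mapped GPU registers */
		ktime_t period;		/* time step between bursts */
		unsigned int busy_pct;	/* latest %busy figure */
	};

	static enum hrtimer_restart unit_sample_burst(struct hrtimer *timer)
	{
		struct unit_sampler *s = container_of(timer, struct unit_sampler, timer);
		unsigned int i, busy = 0;

		/* Take a burst of raw 0/1 samples of the unit's busy bit... */
		for (i = 0; i < SAMPLES_PER_BURST; i++)
			if (readl(s->mmio + HYPOTHETICAL_UNIT_STAT) & HYPOTHETICAL_UNIT_BUSY)
				busy++;

		/* ...and collapse them to one %busy number per time step,
		 * trading per-sample detail for less IO to userspace. */
		s->busy_pct = busy * 100 / SAMPLES_PER_BURST;

		hrtimer_forward_now(timer, s->period);
		return HRTIMER_RESTART;
	}

	static void unit_sampler_start(struct unit_sampler *s)
	{
		hrtimer_init(&s->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
		s->timer.function = unit_sample_burst;
		s->period = ktime_set(0, 10 * NSEC_PER_MSEC);	/* arbitrary 10ms step */
		hrtimer_start(&s->timer, s->period, HRTIMER_MODE_REL);
	}

Whether the burst loop can run fast enough in the timer callback without hurting the system is exactly the part I'm unsure about, which is why perf integration sounds appealing.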
I'd love to see a real time (not post-processed) graph of frame timing for the GPU, but it amused me to realise the app would have to be pretty self-aware of its own rendering usage to avoid profiling its own graph drawing routines.

--
Peter Clifton

Electrical Engineering Division,
Engineering Department,
University of Cambridge,
9, JJ Thomson Avenue,
Cambridge
CB3 0FA

Tel: +44 (0)7729 980173 - (No signal in the lab!)
Tel: +44 (0)1223 748328 - (Shared lab phone, ask for me)