On Sat, 30 Oct 2010 14:04:11 +0100 Peter Clifton <pc...@cam.ac.uk> wrote:
> I think I'll need some help with this. I'm by no means a kernel
> programmer, so I'm feeling my way in the dark with this.
>
> I want to design an interface so I can synchronise my GPU idle-flag
> polling with batchbuffer execution. At a high level, I imagine doing
> something like this in my application (or Mesa). (Hand-wavy
> pseudocode.)
>
> expose_event_handler ()
> {
>     static bool one_shot_trace = true;
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (true);
>
>     /* RENDERING COMMANDS IN HERE */
>     SwapBuffers ();
>
>     if (one_shot_trace)
>         mesa_debug_i915_trace_idle (false);
>
>     one_shot_trace = false;
> }
>
> I was imagining adding a flag to the EXECBUFFER2 ioctl, or perhaps
> adding a new EXECBUFFER3 ioctl (which I'm playing with locally now).
> Basically I just want to flag the execbuffers I'm interested in
> seeing profiling data for.
>
> In order to get really high-resolution profiling, it would be
> advantageous to confine logging to the time period of interest;
> otherwise the data rate is too high. I guesstimated about 10MB/s for
> a binary representation of the data I'm currently polling in
> user-space. More spatial resolution would be nice too, so this could
> increase.

Would be very cool to be able to correlate the data...

> I think I have a vague idea how to do the GPU and logging parts, even
> if I end up having to start the polling before the batchbuffer starts
> executing.
>
> What I've got little to no clue about is how to manage allocation of
> memory to store the results in.
>
> Should userspace (Mesa?) pass buffers for the kernel to return
> profiling data in, then retrieve them somehow when it "knows" the
> batchbuffer is finished? That will probably require over-allocation,
> with a guesstimate of the memory needed to log the given batchbuffer.
>
> What about exporting via debugfs? Assuming the above code fragment,
> we could leave the last "frame" of polled data available, with the
> data being overwritten when the next request to start logging comes
> in. (That would perhaps require some kind of sequence number being
> assigned if we have multiple batches which come under the same
> request... or a separate ioctl to turn logging on and off.)

There's also relayfs, which is made for high-bandwidth kernel->user
communication. I'm not sure whether it will make this any easier, but
there's some documentation about it in the kernel tree.

A ring buffer with the last N timestamps might also be a good way of
exposing things. Having more than one entry available means that if
userspace didn't get scheduled at the right time, it would still have
a good chance of getting all the data it missed since the last read.

> Also, I'm not sure how the locking would work if userspace is reading
> out the debugfs file whilst another frame is being executed. (We'd
> probably need a secondary logging buffer allocated in that case.)

The kernel implementation of the read() side of the file could do some
locking to prevent new data from corrupting a read in progress.
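To make some of the above concrete, a few rough, untested sketches
follow. First, flagging batches through the existing EXECBUFFER2 ioctl
could be as simple as one new flag bit; the name and bit position here
are invented and would need to avoid the I915_EXEC_* bits already
defined in i915_drm.h:

/* Hypothetical flag: "log idle data while this batch runs". */
#define I915_EXEC_TRACE_IDLE	(1 << 13)

struct drm_i915_gem_execbuffer2 execbuf;

memset(&execbuf, 0, sizeof(execbuf));
/* ... buffers_ptr, buffer_count, batch_len etc. as usual ... */
execbuf.flags |= I915_EXEC_TRACE_IDLE;
drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);

That would avoid a whole new EXECBUFFER3 ioctl, as long as a spare
flag bit is available.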
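For the high-bandwidth path, the kernel-side relay setup (see
Documentation/filesystems/relay.txt) might look roughly like the
fragment below. Everything named idle_* is made up for illustration;
relay_open(), relay_write() and the callback boilerplate are the real
API:

#include <linux/relay.h>
#include <linux/debugfs.h>

/* One sample per poll; the record layout is invented here. */
struct idle_sample {
	u64 timestamp;		/* ktime, in ns */
	u32 idle_flags;		/* the polled GPU idle bits */
	u32 seqno;		/* which request/batch this belongs to */
};

static struct rchan *idle_chan;

/* Standard relay boilerplate: expose the per-cpu buffers in debugfs. */
static struct dentry *idle_create_buf_file(const char *filename,
					   struct dentry *parent, int mode,
					   struct rchan_buf *buf,
					   int *is_global)
{
	return debugfs_create_file(filename, mode, parent, buf,
				   &relay_file_operations);
}

static int idle_remove_buf_file(struct dentry *dentry)
{
	debugfs_remove(dentry);
	return 0;
}

static struct rchan_callbacks idle_relay_callbacks = {
	.create_buf_file	= idle_create_buf_file,
	.remove_buf_file	= idle_remove_buf_file,
};

static int idle_trace_init(struct dentry *i915_debugfs_root)
{
	/* 8 sub-buffers of 256KiB each; sized with the ~10MB/s
	 * estimate in mind. */
	idle_chan = relay_open("i915_idle", i915_debugfs_root,
			       256 * 1024, 8, &idle_relay_callbacks, NULL);
	return idle_chan ? 0 : -ENOMEM;
}

/* Called from wherever the polling happens. */
static void idle_trace_sample(u32 flags, u32 seqno)
{
	struct idle_sample s = {
		.timestamp	= ktime_to_ns(ktime_get()),
		.idle_flags	= flags,
		.seqno		= seqno,
	};

	relay_write(idle_chan, &s, sizeof(s));
}

Userspace then just read()s the per-cpu files relay creates, and the
flagged batches turn sampling on and off around their execution.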
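The last-N ring could be as dumb as the following (single writer
assumed, linux/spinlock.h included, reusing the idle_sample struct
from above). A reader remembers the head value from its previous read;
anything newer than head - IDLE_RING_SIZE is still valid, so missing a
wakeup only loses samples the ring has genuinely wrapped past:

#define IDLE_RING_SIZE 1024	/* power of two */

struct idle_ring {
	struct idle_sample slots[IDLE_RING_SIZE];
	u64 head;		/* total samples ever written */
	spinlock_t lock;
};

static void idle_ring_push(struct idle_ring *ring,
			   const struct idle_sample *s)
{
	spin_lock(&ring->lock);
	ring->slots[ring->head & (IDLE_RING_SIZE - 1)] = *s;
	ring->head++;
	spin_unlock(&ring->lock);
}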
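And for the plain debugfs-file variant, the read() side could hold a
mutex across the copy-out, with the logging code taking the same mutex
before recycling the buffer for a new frame (usual linux/fs.h and
linux/mutex.h includes assumed; idle_log_buf/_len are placeholders for
however the last frame ends up stored):

static DEFINE_MUTEX(idle_log_mutex);
static void *idle_log_buf;	/* last completed frame of samples */
static size_t idle_log_len;

static ssize_t idle_log_read(struct file *file, char __user *ubuf,
			     size_t count, loff_t *ppos)
{
	ssize_t ret;

	mutex_lock(&idle_log_mutex);
	ret = simple_read_from_buffer(ubuf, count, ppos,
				      idle_log_buf, idle_log_len);
	mutex_unlock(&idle_log_mutex);
	return ret;
}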