Re: [Mesa-dev] Perfetto CPU/GPU tracing

Rob Clark Fri, 12 Feb 2021 17:57:37 -0800

On Fri, Feb 12, 2021 at 5:40 PM John Bates <jba...@chromium.org> wrote:
>
>
>
> On Fri, Feb 12, 2021 at 4:34 PM Rob Clark <robdcl...@gmail.com> wrote:
>>
>> On Thu, Feb 11, 2021 at 5:40 PM John Bates <jba...@chromium.org> wrote:
>> >
>>
>> <snip>
>>
>> > Runtime Characteristics
>> >
>> > ~500KB additional binary size. Even with using only the basic features of 
>> > perfetto, it will increase the binary size of mesa by about 500KB.
>>
>> IMHO, that size is negligible.. looking at freedreno, a mesa build
>> *only* enabling freedreno is already ~6MB.. distros typically use
>> "megadriver" (ie. all the drivers linked into a single .so with hard
>> links for the different  ${driver}_dri.so), which on my fedora laptop
>> is ~21M.  Maybe if anything is relevant it is how much of that
>> actually gets paged into RAM from disk, but I think 500K isn't a thing
>> to worry about too much.
>>
>> > Background thread. Perfetto uses a background thread for communication 
>> > with the system tracing daemon (traced) to advertise trace data and get 
>> > notification of trace start/stop.
>>
>> Mesa already tends to have plenty of threads.. some of that depends on
>> the driver, I think currently radeonsi is the threading king, but
>> there are several other drivers working on threaded_context and async
>> compile thread pool.
>>
>> It is worth mentioning that, AFAIU, perfetto can operate in
>> self-server mode, which seems like it would be useful for distros
>> which do not have the system daemon.  I'm not sure if we lose that
>> with percetto?
>
>
> Easy to add, but want to avoid a runtime arg because it would add ~300KB to 
> binary size. Okay if we have an alternate init function though.


I think I could imagine wanting mesa build params to control whether
we want self-server or system-server mode.. ie. if some distros add
system-server support they wouldn't need self-server mode and visa
versa

>
>>
>>
>> > Runtime overhead when disabled is designed to be optimal with one 
>> > predicted branch, typically a few CPU cycles per event. While enabled, the 
>> > overhead can be around 1 us per event.
>> >
>> > Integration Challenges
>> >
>> > The perfetto SDK is C++ and designed around macros, lambdas, inline 
>> > templates, etc. There are ongoing discussions on providing an official 
>> > perfetto C API, but it is not yet clear when this will land on the 
>> > perfetto roadmap.
>> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines 
>> > of code.
>> > Anything that includes perfetto.h takes a long time to compile.
>> > The current Perfetto SDK design is incompatible with being a shared 
>> > library behind a C API.
>>
>> So, C++ on it's own isn't a showstopper, mesa has plenty of C++ code.
>> But maybe we should verify that MSVC is happy with it, otherwise we
>> need to take a bit more care in some parts of the codebase.
>>
>> As far as compile time, I wonder if we can regenerate the .cc/.h with
>> only the gpu trace parts?  But I wouldn't expect the .h to be
>> something widely included.  For example, for gpu timeline traces in
>> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
>> extern "C" {} around the callbacks that would hook into the
>> u_tracepoint tracepoints.  That one file would pull in the perfetto
>> .h, and we'd just not build that file if perfetto was disabled.
>
>
> That works for GPU, but I'd like to see some slow CPU functions in traces as 
> well to help reason about performance problems. This ends up peppering the 
> trace header in lots of places.

My point was that we could strip out a whole lot of stuff that is
completely unrelated to mesa.. not sure if it is worth bothering with,
I doubt we'd #include perfetto.h very widely

>> Overall having to add our own extern C wrappers in some places doesn't
>> seem like the *end* of the world.. a bit annoying, but we might end up
>> doing that regardless if other folks want the ability to hook in
>> something other than perfetto?
>
>
> It's more than extern C wrappers if we want to minimize overhead while 
> tracing enabled at compile time. Have a look at percetto.h/cc.

I'm not sure how many distros are not using LTO these days.. I assume
once you have LTO it doesn't really matter anymore?

>>
>>
>> <snip>
>>
>> > Mesa Integration Alternatives
>>
>> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
>> that is mostly because I expect to initially just add some basic gpu
>> timeline tracepoints, but over time iterate on adding more.. it would
>> be nice to not have to depend on a newer version of an external
>> library at each step.  That is ofc only my $0.02..
>
>
> It's a small initial setup tax, true, but I still think it depends on what 
> perfetto features we plan to use -- for only a couple files doing GPU tracing 
> I agree percetto is unnecessary, but for CPU tracing it gets more complicated.

Definitely the first thing I plan to use is getting render stages onto
a timeline, so I can better see where the GPU time is going.. second
step is probably adding more gpu perfcntr.. and I guess the third
thing is more CPU oriented things like seeing where shader compiles
are happening.  Although threaded_context might also be a thing where
having some more CPU tracing could be useful?

BR,
-R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] Perfetto CPU/GPU tracing

Reply via email to