On Fri, Feb 12, 2021 at 5:40 PM John Bates <jba...@chromium.org> wrote:
>
>
>
> On Fri, Feb 12, 2021 at 4:34 PM Rob Clark <robdcl...@gmail.com> wrote:
>>
>> On Thu, Feb 11, 2021 at 5:40 PM John Bates <jba...@chromium.org> wrote:
>> >
>>
>> <snip>
>>
>> > Runtime Characteristics
>> >
>> > ~500KB additional binary size. Even with using only the basic features of
>> > perfetto, it will increase the binary size of mesa by about 500KB.
>>
>> IMHO, that size is negligible.. looking at freedreno, a mesa build
>> *only* enabling freedreno is already ~6MB.. distros typically use
>> "megadriver" (ie. all the drivers linked into a single .so with hard
>> links for the different ${driver}_dri.so), which on my fedora laptop
>> is ~21M. Maybe if anything is relevant it is how much of that
>> actually gets paged into RAM from disk, but I think 500K isn't a thing
>> to worry about too much.
>>
>> > Background thread. Perfetto uses a background thread for communication
>> > with the system tracing daemon (traced) to advertise trace data and get
>> > notification of trace start/stop.
>>
>> Mesa already tends to have plenty of threads.. some of that depends on
>> the driver, I think currently radeonsi is the threading king, but
>> there are several other drivers working on threaded_context and async
>> compile thread pool.
>>
>> It is worth mentioning that, AFAIU, perfetto can operate in
>> self-server mode, which seems like it would be useful for distros
>> which do not have the system daemon. I'm not sure if we lose that
>> with percetto?
>
>
> Easy to add, but want to avoid a runtime arg because it would add ~300KB to
> binary size. Okay if we have an alternate init function though.

I think I could imagine wanting mesa build params to control whether we
want self-server or system-server mode.. ie. if some distros add
system-server support they wouldn't need self-server mode and vice
versa

>
>>
>>
>> > Runtime overhead when disabled is designed to be optimal with one
>> > predicted branch, typically a few CPU cycles per event. While enabled, the
>> > overhead can be around 1 us per event.
>> >
>> > Integration Challenges
>> >
>> > The perfetto SDK is C++ and designed around macros, lambdas, inline
>> > templates, etc. There are ongoing discussions on providing an official
>> > perfetto C API, but it is not yet clear when this will land on the
>> > perfetto roadmap.
>> > The perfetto SDK is an amalgamated .h and .cc that adds up to 100K lines
>> > of code.
>> > Anything that includes perfetto.h takes a long time to compile.
>> > The current Perfetto SDK design is incompatible with being a shared
>> > library behind a C API.
>>
>> So, C++ on its own isn't a showstopper, mesa has plenty of C++ code.
>> But maybe we should verify that MSVC is happy with it, otherwise we
>> need to take a bit more care in some parts of the codebase.
>>
>> As far as compile time, I wonder if we can regenerate the .cc/.h with
>> only the gpu trace parts? But I wouldn't expect the .h to be
>> something widely included. For example, for gpu timeline traces in
>> freedreno, I'm expecting it to look like a freedreno_perfetto.cc with
>> extern "C" {} around the callbacks that would hook into the
>> u_tracepoint tracepoints. That one file would pull in the perfetto
>> .h, and we'd just not build that file if perfetto was disabled.
>
>
> That works for GPU, but I'd like to see some slow CPU functions in traces as
> well to help reason about performance problems. This ends up peppering the
> trace header in lots of places.

My point was that we could strip out a whole lot of stuff that is
completely unrelated to mesa..
not sure if it is worth bothering with, I doubt we'd #include perfetto.h
very widely

>> Overall having to add our own extern C wrappers in some places doesn't
>> seem like the *end* of the world.. a bit annoying, but we might end up
>> doing that regardless if other folks want the ability to hook in
>> something other than perfetto?
>
>
> It's more than extern C wrappers if we want to minimize overhead while
> tracing enabled at compile time. Have a look at percetto.h/cc.

I'm not sure how many distros are not using LTO these days.. I assume
once you have LTO it doesn't really matter anymore?

>>
>>
>> <snip>
>>
>> > Mesa Integration Alternatives
>>
>> I'm kind of leaning towards the "just slurp in the .cc/.h" approach..
>> that is mostly because I expect to initially just add some basic gpu
>> timeline tracepoints, but over time iterate on adding more.. it would
>> be nice to not have to depend on a newer version of an external
>> library at each step. That is ofc only my $0.02..
>
>
> It's a small initial setup tax, true, but I still think it depends on what
> perfetto features we plan to use -- for only a couple files doing GPU tracing
> I agree percetto is unnecessary, but for CPU tracing it gets more complicated.

Definitely the first thing I plan to use is getting render stages onto
a timeline, so I can better see where the GPU time is going.. second
step is probably adding more gpu perfcntr.. and I guess the third
thing is more CPU oriented things like seeing where shader compiles
are happening. Although threaded_context might also be a thing where
having some more CPU tracing could be useful?

BR,
-R
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
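
For illustration only, here is a rough sketch of the pattern the thread keeps
circling around: an inexpensive "is tracing enabled?" check at each call site
(the "one predicted branch" case), with the actual event emission kept
out-of-line behind extern "C" entry points so that only one file would need
the tracing SDK header. Every name here (example_trace_emit,
EXAMPLE_TRACE_EVENT, and so on) is hypothetical and is not taken from Mesa,
perfetto, or percetto. The "backend" is just a vector standing in for a real
tracing SDK, and everything sits in a single file for brevity.

```cpp
// Hypothetical sketch: none of these names exist in Mesa, perfetto, or percetto.
#include <atomic>
#include <string>
#include <vector>

// Stand-in for a tracing backend. In a real build, only the file that
// defines example_trace_emit() would include the tracing SDK header.
static std::vector<std::string> g_events;

// Set when a trace session starts, cleared when it stops.
static std::atomic<bool> g_trace_enabled{false};

extern "C" {

// Out-of-line slow path: only reached while tracing is enabled.
void example_trace_emit(const char *name)
{
   g_events.emplace_back(name);
}

// Toggle that a session start/stop notification would call.
void example_trace_set_enabled(bool on)
{
   g_trace_enabled.store(on, std::memory_order_relaxed);
}

// Fast-path check: a single relaxed atomic load.
bool example_trace_is_enabled(void)
{
   return g_trace_enabled.load(std::memory_order_relaxed);
}

} // extern "C"

// Call-site macro: when tracing is off, the cost is one load and one
// branch that is not taken.
#define EXAMPLE_TRACE_EVENT(name)          \
   do {                                    \
      if (example_trace_is_enabled())      \
         example_trace_emit(name);         \
   } while (0)

// Example of an instrumented CPU-side function.
int example_draw(int n)
{
   EXAMPLE_TRACE_EVENT("example_draw");
   return n * 2;
}
```

In this sketch example_trace_is_enabled() is an ordinary function, so without
LTO the disabled path still pays for a call. Moving the check inline into a
header that C code can include is the kind of thing that goes beyond plain
extern "C" wrappers.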