On Fri, Feb 12, 2021 at 5:56 PM Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:
>
> On 13/02/2021 03:38, Rob Clark wrote:
> > On Fri, Feb 12, 2021 at 5:08 PM Lionel Landwerlin
> > <lionel.g.landwer...@intel.com> wrote:
> >> We're kind of in the same boat for Intel.
> >>
> >> Access to GPU perf counters is exclusive to a single process if you want
> >> to build a timeline of the work (because preemption etc...).
> > ugg, does that mean extensions like AMD_performance_monitor don't
> > actually work on intel?
>
>
> It works, but only a single app can use it at a time.
>

I see.. on the freedreno side we haven't really gone down the
preemption route yet, but we have a way to hook in some save/restore
cmdstream

>
> >
> >> The best information we could add from mesa would be a timestamp of when a
> >> particular drawcall started.
> >> But that's pretty much what timestamp queries are.
> >>
> >> Were you thinking of particular GPU generated data you don't get from
> >> gfx-pps?
> > From the looks of it, currently I don't get *any* GPU generated data
> > from gfx-pps ;-)
>
>
> Maybe file a bug? :
> https://gitlab.freedesktop.org/Fahien/gfx-pps/-/blob/master/src/gpu/intel/intel_driver.cc
>
>
> >
> > We can ofc sample counters from a separate process as well... I have a
> > curses tool (fdperf) which does this.. but running outside of gpu
> > cmdstream plus counters losing context across suspend/resume makes it
> > less than perfect.
>
>
> Our counters are global so to give per application values, we need to
> post process a stream of HW counter snapshots.
>
>
> > And something that works the same way as
> > AMD_performance_monitor under the hood gives a more precise look at
> > which shaders (for ex) are consuming the most cycles.
>
>
> In our implementation that precision (in particular when a drawcall
> ends) comes at a stalling cost unfortunately.

yeah, stalling on our end too for per-draw counter snapshots.. but if
you are looking for which shaders to optimize that doesn't matter
*that* much.. there'll be some overhead, but it's not really going to
change which draws/shaders are expensive.. it just means that you lose
out on pipelining of the state changes

BR,
-R

>
> > For cases where
> > we can profile a trace, frameretrace and related tools is pretty
> > great..
> > but it would be nice to have similar visibility for actual
> > games (which for me, mostly means android games, since so far no
> > aarch64 steam store), but also give game developers good tools (or at
> > least the same tools that they get with other closed src drivers on
> > android).
>
>
> Sure, but frame analysis is different than live monitoring of the system.
>
> On Intel's HW you don't get the same level of details in both cases, and
> apart from a few timestamps, I think gfx-pps is as good as you gonna get
> for live stuff.
>
>
> -Lionel
>
>
> >
> > BR,
> > -R
> >
> >> Thanks,
> >>
> >> -Lionel
> >>
> >>
> >> On 13/02/2021 00:12, Alyssa Rosenzweig wrote:
> >>> My 2c for Mali/Panfrost --
> >>>
> >>> For us, capturing GPU perf counters is orthogonal to rendering. It's
> >>> expected (e.g. with Arm's tools) to do this from a separate process.
> >>> Neither Mesa nor the DDK should require custom instrumentation for the
> >>> low-level data. Fahien's gfx-pps handles this correctly for Panfrost +
> >>> Perfetto as it is. So for us I don't see the value in modifying Mesa for
> >>> tracing.
> >>>
> >>> On Fri, Feb 12, 2021 at 01:34:51PM -0800, John Bates wrote:
> >>>> (responding from correct address this time)
> >>>>
> >>>> On Fri, Feb 12, 2021 at 12:03 PM Mark Janes <mark.a.ja...@intel.com>
> >>>> wrote:
> >>>>
> >>>>> I've recently been using GPUVis to look at trace events. On Intel
> >>>>> platforms, GPUVis incorporates ftrace events from the i915 driver,
> >>>>> performance metrics from igt-gpu-tools, and userspace ftrace markers
> >>>>> that I locally hack up in Mesa.
> >>>>>
> >>>> GPUVis is great. I would love to see that data combined with
> >>>> userspace events without any need for local hacks. Perfetto provides
> >>>> on-demand trace events with lower overhead compared to ftrace, so for
> >>>> example it is acceptable to have production trace instrumentation that
> >>>> can be captured without dev builds.
> >>>> To do that with ftrace it may require a way
> >>>> to enable and disable the ftrace file writes to avoid the overhead when
> >>>> tracing is not in use. This is what Android does with systrace/atrace,
> >>>> for example, it uses Binder to notify processes about trace sessions.
> >>>> Perfetto does that in a more portable way.
> >>>>
> >>>>
> >>>>> It is very easy to compile the GPUVis UI. Userspace instrumentation
> >>>>> requires a single C/C++ header. You don't have to access an external
> >>>>> web service to analyze trace data (a big no-no for devs working on
> >>>>> preproduction hardware).
> >>>>>
> >>>>> Is it possible to build and run the Perfetto UI locally?
> >>>> Yes, local UI builds are possible
> >>>> <https://github.com/google/perfetto/blob/5ff758df67da94d17734c2e70eb6738c4902953e/ui/README.md>.
> >>>> Also confirmed with the perfetto team <https://discord.gg/35ShE3A> that
> >>>> trace data is not uploaded unless you use the 'share' feature.
> >>>>
> >>>>
> >>>>> Can it display
> >>>>> arbitrary trace events that are written to
> >>>>> /sys/kernel/tracing/trace_marker ?
> >>>> Yes, I believe it does support that via linux.ftrace data source
> >>>> <https://perfetto.dev/docs/quickstart/linux-tracing>. We use that for
> >>>> example to overlay CPU sched data to show what process is on each core
> >>>> throughout the timeline. There are many ftrace event types
> >>>> <https://github.com/google/perfetto/tree/5ff758df67da94d17734c2e70eb6738c4902953e/protos/perfetto/trace/ftrace>
> >>>> in the perfetto protos.
> >>>>
> >>>>
> >>>>> Can it be extended to show i915 and
> >>>>> i915-perf-recorder events?
> >>>>>
> >>>> It can be extended to consume custom data sources. One way this is done
> >>>> is via a bridge daemon, such as traced_probes which is responsible for
> >>>> capturing data from ftrace and /proc during a trace session and sending
> >>>> it to traced.
> >>>> traced is the main perfetto tracing daemon that notifies all
> >>>> trace data sources to start/stop tracing and communicates with user
> >>>> tracing requests via the 'perfetto' command.
> >>>>
> >>>>
> >>>>
> >>>>> John Bates <jba...@chromium.org> writes:
> >>>>>
> >>>>>> I recently opened issue 4262
> >>>>>> <https://gitlab.freedesktop.org/mesa/mesa/-/issues/4262> to begin the
> >>>>>> discussion on integrating perfetto into mesa.
> >>>>>>
> >>>>>> *Background*
> >>>>>>
> >>>>>> System-wide tracing is an invaluable tool for developers to find and
> >>>>>> fix performance problems. The perfetto project enables a combined view of
> >>>>>> trace data from kernel ftrace, GPU driver and various manually-instrumented
> >>>>>> tracepoints throughout the application and system. This helps
> >>>>>> developers quickly answer questions like:
> >>>>>>
> >>>>>> - How long are frames taking?
> >>>>>> - What caused a particular frame drop?
> >>>>>> - Is it CPU bound or GPU bound?
> >>>>>> - Did a CPU core frequency drop cause something to go slower than
> >>>>>>   usual?
> >>>>>> - Is something else running that is stealing CPU or GPU time?
> >>>>>>   Could I fix that with better thread/context priorities?
> >>>>>> - Are all CPU cores being used effectively? Do I need
> >>>>>>   sched_setaffinity to keep my thread on a big or little core?
> >>>>>> - What’s the latency between CPU frame submit and GPU start?
> >>>>>>
> >>>>>> *What Does Mesa + Perfetto Provide?*
> >>>>>>
> >>>>>> Mesa is in a unique position to produce GPU trace data for several GPU
> >>>>>> vendors without requiring the developer to build and install additional
> >>>>>> tools like gfx-pps <https://gitlab.freedesktop.org/Fahien/gfx-pps>.
> >>>>>>
> >>>>>> The key is making it easy for developers to use.
> >>>>>> Ideally, perfetto is
> >>>>>> eventually available by default in mesa so that if your system has
> >>>>>> perfetto traced running, you just need to run perfetto (perhaps along with
> >>>>>> setting an environment variable) with the mesa categories to see:
> >>>>>>
> >>>>>> - GPU processing timeline events.
> >>>>>> - GPU counters.
> >>>>>> - CPU events for potentially slow functions in mesa like shader
> >>>>>>   compiles.
> >>>>>> Example of what this data might look like (with fake GPU events):
> >>>>>> [image: percetto-gpu-example.png]
> >>>>>>
> >>>>>> *Runtime Characteristics*
> >>>>>>
> >>>>>> - ~500KB additional binary size. Even with using only the basic
> >>>>>>   features of perfetto, it will increase the binary size of mesa by about
> >>>>>>   500KB.
> >>>>>> - Background thread. Perfetto uses a background thread for
> >>>>>>   communication with the system tracing daemon (traced) to advertise trace
> >>>>>>   data and get notification of trace start/stop.
> >>>>>> - Runtime overhead when disabled is designed to be optimal with
> >>>>>>   one predicted branch, typically a few CPU cycles
> >>>>>>   <https://perfetto.dev/docs/instrumentation/track-events#performance>
> >>>>>>   per event. While enabled, the overhead can be around 1 us per event.
> >>>>>>
> >>>>>> *Integration Challenges*
> >>>>>>
> >>>>>> - The perfetto SDK is C++ and designed around macros, lambdas,
> >>>>>>   inline templates, etc. There are ongoing discussions on providing an
> >>>>>>   official perfetto C API, but it is not yet clear when this will land on
> >>>>>>   the perfetto roadmap.
> >>>>>> - The perfetto SDK is an amalgamated .h and .cc that adds up to
> >>>>>>   100K lines of code.
> >>>>>> - Anything that includes perfetto.h takes a long time to compile.
> >>>>>> - The current Perfetto SDK design is incompatible with being a
> >>>>>>   shared library behind a C API.
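
For reference, the "one predicted branch" disabled-path cost listed under Runtime Characteristics above can be sketched roughly as below. This is only an illustration of the general pattern, not perfetto's or percetto's actual implementation: `struct trace_category`, `trace_emit`, `TRACE_EVENT` and `events_emitted` are hypothetical names made up for this sketch, and `__builtin_expect` assumes GCC or Clang.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical category state; not a perfetto or percetto type. */
struct trace_category {
    const char *name;
    atomic_bool enabled; /* flipped by whatever listens for trace start/stop */
};

/* Counts slow-path calls so the behaviour can be observed. */
static unsigned long events_emitted;

/* Slow path, kept out of line so the disabled case stays one branch. */
static void trace_emit(const struct trace_category *cat, const char *event)
{
    events_emitted++;
    fprintf(stderr, "[%s] %s\n", cat->name, event);
}

/* Fast path: one relaxed load plus a branch hinted as not taken. */
#define TRACE_EVENT(cat, event)                                              \
    do {                                                                     \
        if (__builtin_expect(atomic_load_explicit(&(cat)->enabled,           \
                                                  memory_order_relaxed), 0)) \
            trace_emit((cat), (event));                                      \
    } while (0)
```

With the flag clear, `TRACE_EVENT(&cat, "shader_compile")` does nothing beyond the load and the branch; a background thread (or anything else) storing `true` into `enabled` turns the same call site into a real event.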
> >>>>>>
> >>>>>> *Percetto*
> >>>>>>
> >>>>>> The percetto library <https://github.com/olvaffe/percetto> was recently
> >>>>>> implemented to provide an interim C API for perfetto. It provides
> >>>>>> efficient support for scoped trace events, multiple categories, counters,
> >>>>>> custom timestamps, and debug data annotations. Percetto also provides some
> >>>>>> features that are important to mesa, but not available yet with
> >>>>>> perfetto SDK:
> >>>>>>
> >>>>>> - Trace events from multiple perfetto instances in separate shared
> >>>>>>   libraries (like mesa and virglrenderer) show correctly in a single
> >>>>>>   process and thread view.
> >>>>>> - Counter tracks and macro API.
> >>>>>>
> >>>>>> Percetto is missing API for perfetto's GPU DataSource and counter
> >>>>>> support, but that feature could be implemented next if it is important for
> >>>>>> mesa. With the existing percetto API mesa could present GPU trace data as
> >>>>>> named 'slice' events and int64_t counters with custom timestamps as shown
> >>>>>> in the image above (based on this sample
> >>>>>> <https://github.com/olvaffe/percetto/blob/main/examples/timestamps.c>).
> >>>>>>
> >>>>>> *Mesa Integration Alternatives*
> >>>>>>
> >>>>>> Note: we have some pressing needs for performance analysis in Chrome
> >>>>>> OS, so I'm intentionally leaving out the alternative of waiting for an
> >>>>>> official perfetto C API. Of course, once that C API is available it would
> >>>>>> become an option to migrate to it from any of the alternatives below.
> >>>>>>
> >>>>>> Ordered by difficulty with easiest first:
> >>>>>>
> >>>>>> 1. Statically link with percetto as an optional external
> >>>>>>    dependency (virglrenderer now has this approach
> >>>>>>    <https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/480>).
> >>>>>>    - Pros: API already supports most common tracing needs.
> >>>>>>      Tested and used by an increasing number of CrOS components.
> >>>>>>    - Cons: External dependency for optional mesa build option.
> >>>>>> 2. Embed Perfetto SDK + a Percetto fork/copy.
> >>>>>>    - Pros: API already supports most common tracing needs. No
> >>>>>>      added external dependency for mesa.
> >>>>>>    - Cons: Percetto code divergence, bug fixes need to land in two
> >>>>>>      trees.
> >>>>>> 3. Embed Perfetto SDK + custom C wrapper.
> >>>>>>    - Pros: Tailored API for mesa's needs.
> >>>>>>    - Cons: Nontrivial development efforts and maintenance.
> >>>>>> 4. Generate C stubs for the Perfetto protobuf and reimplement the
> >>>>>>    Perfetto SDK in C.
> >>>>>>    - Pros: Tailored API for mesa's needs. Possible smaller binary
> >>>>>>      impact from simpler implementation.
> >>>>>>    - Cons: Significant development efforts and maintenance.
> >>>>>>
> >>>>>> Regardless of the integration direction, I expect we would disable
> >>>>>> perfetto in the default build for now to minimize disruption.
> >>>>>>
> >>>>>> I like #1, because there are some nontrivial subtleties to the C
> >>>>>> wrapper that provide both API conveniences and runtime performance that
> >>>>>> would need to be reimplemented or maintained with the other options. I
> >>>>>> will also volunteer to do #1 or #2, but I'm not sure I have time for #3
> >>>>>> or #4 :D.
> >>>>>>
> >>>>>> Any other thoughts on how best to integrate perfetto into mesa?
> >>>>>>
> >>>>>> -jb
> >>>>>> _______________________________________________
> >>>>>> mesa-dev mailing list
> >>>>>> mesa-dev@lists.freedesktop.org
> >>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
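
On the point earlier in this thread that the Intel counters are global, so per-application values need post-processing of a stream of HW counter snapshots: a rough sketch of that kind of attribution could look like the following. The record layout is purely an assumption for illustration (`struct hw_snapshot`, `ctx_id` and `counter_total_for_ctx` are hypothetical and do not reflect the actual i915 perf record format); it assumes each snapshot is tagged with the context that was running since the previous snapshot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot record: a global counter value plus the id of the
 * context that was running on the GPU since the previous snapshot. */
struct hw_snapshot {
    uint32_t ctx_id;  /* context active since the previous snapshot */
    uint64_t counter; /* free-running global counter value */
};

/* Sum the counter deltas attributed to one context. The first snapshot only
 * provides a baseline. Unsigned subtraction keeps a delta correct across a
 * single 64-bit wraparound of the counter. */
static uint64_t
counter_total_for_ctx(const struct hw_snapshot *snaps, size_t count,
                      uint32_t ctx_id)
{
    uint64_t total = 0;

    for (size_t i = 1; i < count; i++) {
        if (snaps[i].ctx_id == ctx_id)
            total += snaps[i].counter - snaps[i - 1].counter;
    }
    return total;
}
```

Snapshots would need to be taken at least at every context switch for the attribution to be exact; with coarser sampling the deltas get smeared across whichever contexts ran in between.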