Hi,

On 18.10.2016 20:12, Jan Ziak wrote:
[...]
Never profile with -O0 or disabled function inlining.

Seriously?

Nobody's going to take seriously optimization results taken from non-optimized builds.


Mesa uses -g -O2 with --enable-debug, so that's what you should use
too. Don't use any other -O* variants.

What if I find a case where -O2 prevents me from easily seeing
information necessary to optimize the source code?

I've never had that problem since ARM call unwinding was solved in GCC and the profiling tools a decade ago (Valgrind even has some patches for that from the work we were doing at my previous employer).

You need to know what compiler optimizations do in order to understand the data shown, but nowadays tools can, for example, show inlined code correctly. In general, if you have problems with optimized builds, either your tools or your builds are broken.

(C++ does make things a bit more difficult because there's *much* more inlining happening with compiler optimizations on C++ code.)


(The rest of this mail is general comments on profiling, not so much aimed at you or Marek; I assume you both already know that stuff.)

The only profiling tools reporting correct results are perf and
sysprof.

Perf uses sampling and reports averages. While perf varies the sampling rate, sampling can still misrepresent some things (small, frequently called functions), and averages aren't good for everything.

That's why one should *also* use something like Valgrind, which doesn't miss things (although it cannot accurately estimate how much time they take), so that you can see all call chains & call counts.
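
As a toy illustration of that difference (a made-up model, not output from perf or Valgrind; the function names, the durations and the fixed sampling period are all assumptions), here is a small Python sketch. A fixed-period sampler that happens to run in lockstep with the workload never sees the short function, while exact call counting sees every call but tells you nothing about time:

```python
from collections import Counter


def build_timeline(iterations=1000):
    """Hypothetical workload: one (function name, duration in us) entry per call."""
    timeline = []
    for _ in range(iterations):
        timeline.append(("big_draw", 990))    # long-running call
        timeline.append(("tiny_helper", 10))  # short, frequently called
    return timeline


def sample_profile(timeline, period_us):
    """Sampling view: note which function is running every period_us microseconds."""
    hits = Counter()
    now = 0
    next_sample = 0
    for name, duration in timeline:
        end = now + duration
        # Take every sample that falls inside this call's [now, end) interval.
        while next_sample < end:
            hits[name] += 1
            next_sample += period_us
        now = end
    return hits


def count_calls(timeline):
    """Instrumented view: exact call counts, but no timing information."""
    return Counter(name for name, _ in timeline)


if __name__ == "__main__":
    timeline = build_timeline()
    # Assumed fixed 1000 us period, which matches the workload's loop length.
    print("samples:", dict(sample_profile(timeline, 1000)))
    print("calls:  ", dict(count_calls(timeline)))
```

Real perf varies its sampling rate, as said above, so the effect is usually less extreme than in this sketch, but the call counts are still something only the instrumented view gives you.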

This isn't about latency, but for that a good Intel PT based tool would be the most accurate. Like the data provided by the ARM ETM interface, it's very awkward to use, though (GBs of data to process, tools not open source, etc.).


I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
used callgrind to see some more code.

(both use the same mechanism) If you don't enable dwarf in
perf (also sysprof can't use dwarf), you have to build Mesa with
-fno-omit-frame-pointer to see call trees. The only reason you would
want to enable dwarf-based call trees is when you want to see libc
calls. Otherwise, they won't be displayed or counted as part of call
trees. For Mesa developers who do profiling often,
-fno-omit-frame-pointer should be your default.

Callgrind counts calls (that one you can trust), but the reported time
is incorrect,

Callgrind reports the number of instructions executed, not time.

Cachegrind can provide estimates of how much time is taken, but as you mentioned, it's not very reliable (while one can specify cache sizes similar to those of the target machine, the cache model is inaccurate, and I don't think it counts SIMD code correctly).

Both report this data only for the user-space process, not for the work the process requests from the kernel.

[...]

        - Eero

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
