Hi,
On 18.10.2016 20:12, Jan Ziak wrote:
[...]
Never profile with -O0 or disabled function inlining.
Seriously?
Nobody's going to take seriously optimization results taken from
non-optimized builds.
Mesa uses -g -O2
with --enable-debug, so that's what you should use too. Don't use any
other -O* variants.
What if I find a case where -O2 prevents me from easily seeing
information necessary to optimize the source code?
I've never had that problem since ARM call unwinding was solved in GCC
and the profiling tools a decade ago (Valgrind even has some patches
for that from the work we were doing at my previous employer).
You need to know what compiler optimizations do, to better understand
the shown data, but tools can nowadays e.g. show inlined code correctly.
In general, if you have problems with optimized builds, either your
tools or your builds are broken.
(C++ does make things a bit more difficult because there's *much* more
inlining happening with compiler optimizations on C++ code.)
(The rest of this mail is general comments on profiling, not so much aimed
at you or Marek; I assume you both already know that stuff.)
The only profiling tools reporting correct results are perf and
sysprof.
Perf uses sampling and reports averages. While perf varies the sampling
rate, sampling can still misrepresent some things (small, frequently
called functions), and averages aren't good for everything.
That's why one should *also* use something like Valgrind which doesn't
miss things (although it cannot accurately estimate how much time they
take), so that you can see all call chains & call counts.
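
As a rough illustration of the sampling caveat above, here is a toy model in Python. It is only a sketch under stated assumptions: the workload, the function names (small_fn, big_fn) and the fixed-period sampler are all hypothetical, and this is not how perf itself samples (perf varies its rate precisely to avoid this kind of lockstep). It just shows how a sampler can under-report a small, frequently called function that exact call counting, as an instrumenting tool does, would never miss.

```python
from collections import Counter


def build_timeline(iterations, big_ticks=9):
    """Toy workload: each iteration runs small_fn for 1 tick,
    then big_fn for big_ticks ticks. Returns the per-tick timeline
    and the exact call counts (what instrumentation would see)."""
    timeline = []
    calls = Counter()
    for _ in range(iterations):
        timeline.append("small_fn")
        calls["small_fn"] += 1
        timeline.extend(["big_fn"] * big_ticks)
        calls["big_fn"] += 1
    return timeline, calls


def sample(timeline, period, phase=0):
    """Hypothetical fixed-period sampler: note which function is
    running at every period-th tick, starting at tick phase."""
    return Counter(timeline[phase::period])


timeline, calls = build_timeline(iterations=1000)

# Sampling period equal to the loop period: always lands in big_fn.
lockstep = sample(timeline, period=10, phase=5)
# Sampling period not aligned with the loop: close to the true 10% share.
unaligned = sample(timeline, period=7)

print("exact call counts:  ", dict(calls))
print("period 10 samples:  ", dict(lockstep))
print("period 7 samples:   ", dict(unaligned))
```

In this model small_fn takes 10% of the run time and is called as often as big_fn, yet the lockstep sampler never sees it, while the call counts stay exact regardless of the sampling setup.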
This isn't about latency, but for that a good Intel PT based tool would be
most correct. Like the data provided by the ARM ETM interface, it's very
awkward to use though (GBs of data to process, tools not open source, etc.).
I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
used callgrind to see some more code.
(both use the same mechanism) If you don't enable dwarf in
perf (also sysprof can't use dwarf), you have to build Mesa with
-fno-omit-frame-pointer to see call trees. The only reason you would
want to enable dwarf-based call trees is when you want to see libc
calls. Otherwise, they won't be displayed or counted as part of call
trees. For Mesa developers who do profiling often,
-fno-omit-frame-pointer should be your default.
Callgrind counts calls (that one you can trust), but the reported time
is incorrect,
Callgrind reports the number of instructions executed, not time.
Cachegrind can provide estimates for how much time is taken, but as you
mentioned, it's not very reliable (while one can specify cache sizes
similar to the target machine's, the cache model is inaccurate, and I
don't think it counts SIMD code correctly).
Both report this data only for the user-space process, not for the work
the process requests from the kernel.
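
To make the "instructions, not time" distinction concrete, here is a small Python sketch of a cycle estimate built from instruction and cache-miss counts. Everything in it is an assumption for illustration: the function names, the event counts and the per-miss penalties (10 and 100 cycles) are made up, and this is not a reproduction of any tool's actual cost model.

```python
def estimated_cycles(instructions, l1_misses, ll_misses,
                     l1_penalty=10, ll_penalty=100):
    """Toy cost model: 1 cycle per instruction plus a fixed,
    hypothetical penalty per first-level and last-level cache miss."""
    return instructions + l1_penalty * l1_misses + ll_penalty * ll_misses


# Hypothetical per-function event counts, not measured data.
profile = {
    "sequential_walk": {"instructions": 1_000_000,
                        "l1_misses": 1_000, "ll_misses": 100},
    "pointer_chase": {"instructions": 1_000_000,
                      "l1_misses": 200_000, "ll_misses": 50_000},
}

estimates = {name: estimated_cycles(**events)
             for name, events in profile.items()}

for name, events in profile.items():
    print(f"{name}: {events['instructions']} instructions, "
          f"~{estimates[name]} estimated cycles")
```

Both functions look identical by instruction count alone, but the estimate differs by almost 8x once misses are weighted in, and the result is only as good as the cache model and penalties behind it.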
[...]
- Eero
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev