Re: [Mesa-dev] [PATCH] glsl: optimize list handling in opt_dead_code

Eero Tamminen Wed, 19 Oct 2016 07:01:22 -0700

Hi,

On 18.10.2016 20:12, Jan Ziak wrote:
[...]

Never profile with -O0 or disabled function inlining.


Seriously?

Nobody's going to take seriously optimization results taken from
non-optimized builds.

Mesa uses -g -O2
with --enable-debug, so that's what you should use too. Don't use any
other -O* variants.


What if I find a case where -O2 prevents me from easily seeing
information necessary to optimize the source code?

I've never had that as a problem since ARM call unwinding was solved
in GCC and the profiling tools a decade ago (Valgrind even has some
patches for that from the work we were doing at my previous employer).

You need to know what compiler optimizations do to better understand
the shown data, but tools can nowadays e.g. show inlined code correctly.
In general, if you have problems with optimized builds, either your
tools or your builds are broken.

(C++ does make things a bit more difficult because there's *much* more
inlining happening with compiler optimizations on C++ code.)

(The rest of this mail is general comments on profiling, not so much
aimed at you or Marek; I assume you both already know that stuff.)

The only profiling tools reporting correct results are perf and
sysprof.

Perf uses sampling and reports averages. While perf varies the sampling
rate, sampling can still misrepresent some things (small, frequently
called things), and averages aren't good for everything.

That's why one should *also* use something like Valgrind, which doesn't
miss things (although it cannot accurately estimate how much time they
take), so that you can see all call chains & call counts.
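
To make the sampling point concrete, here is a toy sketch (plain Python,
not a model of how perf or Valgrind actually work; the function names
"big" and "small" and the tick counts are made up for illustration). A
short function that runs in lockstep with a fixed sampling period is
never seen by the sampler, a different period gets close to its real
10% share, and exact counting sees every call:

```python
from collections import Counter

def build_timeline(repeats=100, big_ticks=9, small_ticks=1):
    """Return one function name per tick of simulated execution."""
    timeline = []
    for _ in range(repeats):
        timeline.extend(["big"] * big_ticks)      # long-running function
        timeline.extend(["small"] * small_ticks)  # short, frequently called one
    return timeline

def sample(timeline, period, offset=0):
    """Sampling-style view: note which function runs every `period` ticks."""
    return Counter(timeline[offset::period])

def count_calls(timeline):
    """Instrumentation-style view: count every entry into a function."""
    calls = Counter()
    previous = None
    for name in timeline:
        if name != previous:
            calls[name] += 1
        previous = name
    return calls

timeline = build_timeline()
in_sync = sample(timeline, period=10)  # period matches the call pattern
varied = sample(timeline, period=7)    # period does not match the pattern
exact = count_calls(timeline)

print("period 10:", dict(in_sync))
print("period 7: ", dict(varied))
print("calls:    ", dict(exact))
```

With the period of 10 the sampler reports no "small" at all, although it
takes a tenth of the ticks and is called as often as "big". That is the
kind of aliasing that varying the sampling rate reduces, and why call
counts from an instrumenting tool are a useful cross-check.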

This isn't about latency, but for that a good Intel PT based tool would
be the most accurate. Like the data provided by the ARM ETM interface,
it's very awkward to use though (GBs of data to process, tools not open
source etc).

I used perf on Metro 2033 Redux and saw do_dead_code() there. Then I
used callgrind to see some more code.

(both use the same mechanism) If you don't enable dwarf in
perf (also sysprof can't use dwarf), you have to build Mesa with
-fno-omit-frame-pointer to see call trees. The only reason you would
want to enable dwarf-based call trees is when you want to see libc
calls. Otherwise, they won't be displayed or counted as part of call
trees. For Mesa developers who do profiling often,
-fno-omit-frame-pointer should be your default.

Callgrind counts calls (that one you can trust), but the reported time
is incorrect,


Callgrind reports number of instructions, not time.

Cachegrind can provide estimates for how much time is taken, but as you
mentioned, it's not very reliable (while one can specify cache sizes
similar to those of the target machine, the cache model is inaccurate,
and I don't think it counts SIMD code correctly).

Both report this data only for the user-space process, not for the work
the process requests from the kernel.
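
As a rough illustration of that user/kernel split, the sketch below
(Python on a Unix-like system; `cpu_split` and `syscall_heavy` are
hypothetical helper names, and the workload is only an example) uses
getrusage() to show how much CPU time a piece of work spent in user
space versus in the kernel on its behalf. The second number is the part
that a user-space-only tool does not attribute:

```python
import os
import resource

def cpu_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    work()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)

def syscall_heavy(n=100_000):
    """Example workload: many tiny write() system calls to the null device."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        for _ in range(n):
            os.write(fd, b"x")  # each call enters the kernel
    finally:
        os.close(fd)

user, system = cpu_split(syscall_heavy)
print(f"user: {user:.3f}s  kernel: {system:.3f}s")
```

The actual numbers depend on the machine and kernel, so none are shown
here; the point is only that the kernel share can be significant for
syscall-heavy code and needs a tool that can see it.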


[...]

        - Eero

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
