On 04/21/2013 04:04 AM, Marek Olšák wrote:
Ah, I didn't know you had any other env vars. It's preferable to have
as many boolean flags as possible handled by a single env var, because
it's easier to use (R600_DUMP_SHADERS counts as a pretty ugly list of
boolean flags hidden behind a magic number). Feel free to have
separate env vars for more complex parameters.
I skimmed through some of your code and the coding style looks good.
I'm also okay with C++, it really seems like the right choice here.
However I agree with the argument that one header file per cpp might
not always be a good idea, especially if the header file is pretty
small.
Thanks for reviewing. I pushed a branch with the following changes to my repo:
- changes to existing r600g code split out from the main big patch
- small header files merged into sb_pass.h, sb_ir.h, sb_bc.h
- added new R600_DEBUG flags to replace multiple env vars:
sb - Enable optimization of graphics shaders
sbcl - Enable optimization of compute shaders
sbdry - Dry run, optimize but don't use new bytecode
sbstat - Print optimization statistics (currently the time only)
sbdump - Print IR after some passes.
- added debug_id (shader index) to struct r600_bytecode; IDs are assigned
to each shader in r600_bytecode_init and printed in the shader dump
header. This avoids reinventing shader numbering in different places for
dumps and debugging.
- some minor cleanups
Updated branch can be found here:
http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb-2
Vadim
Marek
On Sat, Apr 20, 2013 at 11:02 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:
On 04/20/2013 03:11 AM, Marek Olšák wrote:
Please don't add any new environment variables and use R600_DEBUG
instead. The other environment variables are deprecated.
I agree, those vars probably need some cleanup, they were added before
R600_DEBUG appeared.
Though I'm afraid some of my options won't fit well into the R600_DEBUG
flags, unless we add support for name/value pairs with optional custom
parsers.
E.g. I have a group of env vars that define the range of included/excluded
shaders for optimization and the mode (include/exclude/off). I thought
about doing this with a single var and a custom parser that accepts the
range as e.g. "10-20", but after all it's just a debug feature, not
intended for everyday use, and so far I've failed to convince myself that
it's worth the effort.
I can implement support for custom parsers in R600_DEBUG, but do we really
need it? Maybe it would be enough to add e.g. an "sb" flag to the
R600_DEBUG flags instead of the R600_SB var for enabling it (probably
together with other boolean options such as R600_SB_USE_NEW_BYTECODE), but
leave the more complicated internal debug options as they are?
Vadim
There is a table for R600_DEBUG in r600_pipe.c and it even comes with
a help feature: R600_DEBUG=help
Marek
On Fri, Apr 19, 2013 at 4:48 PM, Vadim Girlin <vadimgir...@gmail.com>
wrote:
Hi,
In the previous status update I said that the r600-sb branch is not ready
to be merged yet, but recently I've done some cleanups and reworks, and
though I haven't finished everything that I planned initially, I think
it's now in a better state and may be considered for merging.
I'd like to know whether people think that merging the r600-sb branch
makes sense at all. I'll try to explain here why it makes sense to me.
Although I understand that the development of the LLVM backend is a
primary goal for the r600g developers, it's a complicated process and may
require quite some time to achieve good results regarding shader/compiler
performance. At the same time, this branch already works and provides good
results in many cases. That's why I think it makes sense to merge this
branch as a non-default backend, at least as a temporary solution for
shader performance problems. We can always get rid of it if it becomes too
much of a maintenance burden, or when the LLVM backend catches up in terms
of shader performance and compilation speed/overhead.
Regarding support and maintenance of this code, I'll try to do my best to
fix possible issues, and so far there are no known unfixed issues. I
tested it with many apps on evergreen and fixed all the issues on other
chips that were reported to me on the list or privately after the last
status announcement. There are no piglit regressions on evergreen when
this branch is used with either the default or the LLVM backend.
This code was intentionally separated as much as possible from the other
parts of the driver: basically there are just two functions called from
r600g, and the shader code is passed to/from r600-sb as hardware bytecode,
which is not going to change. I think it won't require any modifications
at all to stay in sync with most changes in r600g.
Some work might be required though if we want to add support for new hw
features that are currently unused, e.g. geometry shaders or new
instruction types for compute shaders, but I think I'll be able to catch
up once they are implemented in the driver and in the default or LLVM
backend. E.g. this branch already works for me on evergreen with some
simple OpenCL kernels, including bfgminer, where it increases the
performance of the kernel compiled with the LLVM backend by more than 20%
for me.
Besides the performance benefits, I think an alternative backend might
also help with debugging the default or LLVM backend; in some cases it has
helped me by exposing bugs that are not very obvious otherwise. E.g. it
may be hard to compare dumps from the default and LLVM backends to spot a
regression because they are too different, but after processing both
shaders with r600-sb the code is usually transformed into a more common
form, which often makes it easier to compare and find the differences in
shader logic.
One additional feature that might help with LLVM backend debugging is the
disassembler that works on the hardware bytecode instead of the internal
r600g bytecode structs. This results in more readable shader dumps for
instructions passed in native hw encoding from the LLVM backend. I think
this can also help catch more potential bugs related to bytecode building
in r600g/llvm. Currently r600-sb uses its bytecode disassembler for all
shader dumps, including fetch shaders, even when optimization is not
enabled. Basically it can replace r600_bytecode_disasm and related code
completely.
Below are some quick benchmarks of shader performance and compilation
time, to demonstrate that r600-sb can currently provide better performance
for users, at least in some cases.
As an example of shaders with good optimization opportunities I used an
application that computes and renders atmospheric scattering effects; it
was mentioned in the previous thread:
http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html
Here are the current results for that app (Main.noprecompute, frames per
second) with the default backend, default backend + r600-sb, and the LLVM
backend:
def   def+sb   llvm
240   590      248
Another quick benchmark is OpenCL kernel performance with bfgminer
(megahash/s):
llvm   llvm+sb
68     87
One more benchmark covers compilation speed/overhead. I used two piglit
tests: the first compiles a lot of shaders (IIRC more than a thousand),
the second compiles a few huge shaders. The result is the test run time in
seconds; this includes more than just the compilation time, but it still
shows the difference:
                   def    def+sb   llvm
tfb max-varyings   10     14       53
fp-long-alu        0.17   0.38     0.68
This is especially important for GL apps, because longer compilation times
result in more noticeable freezes in games etc. As for the quality of the
compiled code in this test, the LLVM backend is of course already able to
produce better code in some cases, but e.g. for the longest shader from
the fp-long-alu test both backends optimize it down to two ALU
instructions.
Of course this branch won't magically make all applications faster; many
older apps are not really limited by shader performance at all, but I
think it might improve performance for many relatively modern
applications/engines, e.g. applications based on the Unigine and Source
engines.
The branch itself can be found here:
http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
You might prefer to browse new files in a tree instead of reading a huge
patch:
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb
If you'd like to test it: currently the optimization of GL shaders is
enabled by default and can be disabled with R600_SB=0. Optimization of
compute shaders is not enabled by default because it's still very limited
and experimental; it can be enabled with R600_SB_CL=1. The disassembly of
the optimized shaders is printed with R600_DUMP_SHADERS=2.
If you think that merging the branch makes sense, any
comments/suggestions about what is required to prepare it for merging are
welcome.
Vadim
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev