Hi Vadim,
from your description it seems to be a post-processing stage that works
on the shader bytecode and, in addition, is quite separate from the
rest of the driver.
If that's the case then I don't really see a reason why we shouldn't
merge it, but at least at the beginning it should probably be disabled
by default.
On the other hand, we should ask whether any of the optimizations in
there could be done at an earlier stage, for example at the GLSL
level.
Cheers,
Christian.
On 19.04.2013 16:48, Vadim Girlin wrote:
Hi,
In the previous status update I said that the r600-sb branch is not
ready to be merged yet, but recently I've done some cleanups and
reworks, and though I haven't finished everything that I planned
initially, I think now it's in a better state and may be considered
for merging.
I'd like to know whether people think that merging the r600-sb branch
makes sense at all. I'll try to explain here why it makes sense to me.
Although I understand that the development of the llvm backend is a
primary goal for the r600g developers, it's a complicated process and
may require quite some time to achieve good results regarding
shader/compiler performance. At the same time, this branch already
works and provides good results in many cases. That's why I think it
makes sense to merge this branch as a non-default backend, at least as
a temporary solution for shader performance problems. We can always
get rid of it if it becomes too much of a maintenance burden or when
the llvm backend catches up in terms of shader performance and
compilation speed/overhead.
Regarding the support and maintenance of this code, I'll try to do my
best to fix possible issues, and so far there are no known unfixed
issues. I tested it with many apps on evergreen and fixed all issues
with other chips that were reported to me on the list or privately
after the last status announcement. There are no piglit regressions on
evergreen when this branch is used with either the default or the llvm
backend.
This code was intentionally separated as much as possible from the
other parts of the driver. Basically there are just two functions used
from r600g, and the shader code is passed to/from r600-sb as hardware
bytecode, which is not going to change. I think it won't require any
modifications at all to keep it in sync with most changes in r600g.
Some work might be required though if we want to add support for new
hw features that are currently unused, e.g. geometry shaders, new
instruction types for compute shaders, etc., but I think I'll be able
to catch up when they are implemented in the driver and in the default
or llvm backend. E.g. this branch already works for me on evergreen
with some simple OpenCL kernels, including bfgminer, where it increases
the performance of the kernel compiled with the llvm backend by more
than 20% for me.
Besides the performance benefits, I think that an alternative backend
might also help with debugging the default or llvm backend. In some
cases it helped me by exposing bugs that are not very obvious
otherwise. E.g. it may be hard to compare the dumps from the default
and llvm backends to spot a regression because they are too different,
but after processing both shaders with r600-sb the code is usually
transformed to a more common form, and often this makes it easier to
compare them and find the differences in shader logic.
One additional feature that might help with llvm backend debugging is
the disassembler, which works on the hardware bytecode instead of the
internal r600g bytecode structs. This results in more readable shader
dumps for instructions passed in native hw encoding from the llvm
backend. I think this can also help to catch more potential bugs
related to bytecode building in r600g/llvm. Currently r600-sb uses its
bytecode disassembler for all shader dumps, including the fetch
shaders, even when optimization is not enabled. Basically it can
replace r600_bytecode_disasm and related code completely.
Below are some quick benchmarks for shader performance and compilation
time, to demonstrate that currently r600-sb might provide better
performance for users, at least in some cases.
As an example of shaders with good optimization opportunities I used
the application that computes and renders atmospheric scattering
effects. It was mentioned in the previous thread:
http://lists.freedesktop.org/archives/mesa-dev/2013-February/034682.html
Here are the current results for that app (Main.noprecompute, frames
per second) with the default backend, default backend + r600-sb, and
llvm backend:

    def    def+sb    llvm
    240    590       248

Another quick benchmark is OpenCL kernel performance with bfgminer
(megahash/s):

    llvm    llvm+sb
    68      87

One more benchmark is for compilation speed/overhead. I used two
piglit tests: the first compiles a lot of shaders (IIRC more than a
thousand), the second compiles a few huge shaders. The result is the
test run time in seconds; it includes more than just the compilation
time, but it still shows the difference:

                        def     def+sb    llvm
    tfb max-varyings    10      14        53
    fp-long-alu         0.17    0.38      0.68

This is especially important for GL apps, because longer compilation
times result in more noticeable freezes in games etc. As for the
quality of the compiled code in this test, the llvm backend is of
course already able to produce better code in some cases, but e.g. for
the longest shader from the fp-long-alu test both backends optimize it
down to two ALU instructions.
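
To put the three tables above on a common footing, here is a small
Python sketch that turns the raw numbers into ratios against the
relevant baseline. The figures are copied from this mail; the helper
name speedup() and the dictionary layout are only illustrative and are
not part of r600-sb or Mesa.

```python
# Figures copied from the benchmark tables in this mail.
fps = {"def": 240, "def+sb": 590, "llvm": 248}           # frames per second
mhash = {"llvm": 68, "llvm+sb": 87}                      # megahash/s
compile_s = {                                            # test run time, seconds
    "tfb max-varyings": {"def": 10, "def+sb": 14, "llvm": 53},
    "fp-long-alu": {"def": 0.17, "def+sb": 0.38, "llvm": 0.68},
}


def speedup(value, baseline):
    """Ratio of a measurement to its baseline (illustrative helper)."""
    return value / baseline


def report():
    """Print every measurement relative to its baseline backend."""
    for name, value in fps.items():
        print(f"fps {name}: {speedup(value, fps['def']):.2f}x of def")
    for name, value in mhash.items():
        print(f"bfgminer {name}: {speedup(value, mhash['llvm']):.2f}x of llvm")
    for test, times in compile_s.items():
        for name, value in times.items():
            # For run times a ratio above 1 means slower than the default backend.
            print(f"{test} {name}: {speedup(value, times['def']):.2f}x run time of def")


if __name__ == "__main__":
    report()
```

With these inputs the scattering demo runs at roughly 2.46x the
default backend's frame rate with r600-sb, and the bfgminer figure is
about 1.28x, which matches the "more than 20%" mentioned earlier.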
Of course this branch won't magically make all applications faster.
Many older apps are not really limited by shader performance at all,
but I think it might improve performance for many relatively modern
applications/engines, e.g. for applications based on the Unigine and
Source engines.
The branch itself can be found here:
http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
You might prefer to browse new files in a tree instead of reading a
huge patch:
http://cgit.freedesktop.org/~vadimg/mesa/tree/src/gallium/drivers/r600/sb?h=r600-sb
If you'd like to test it: currently the optimization for GL shaders is
enabled by default and can be disabled with R600_SB=0. Optimization
for compute shaders is not enabled by default because it's still very
limited and experimental; it can be enabled with R600_SB_CL=1.
The disassembly of the optimized shaders is printed with
R600_DUMP_SHADERS=2.
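
For anyone scripting test runs, the switches above could be wrapped
roughly as follows. This is only a sketch: the three variable names
and values are the ones given in this mail, while the function name
r600_sb_env() and its parameters are made up for illustration.

```python
import os


def r600_sb_env(gl_opt=True, cl_opt=False, dump_shaders=False, base=None):
    """Return a copy of the environment with the r600-sb switches set.

    Hypothetical helper; only the variable names come from the mail.
    """
    env = dict(os.environ if base is None else base)
    # GL shader optimization is on by default on the branch; R600_SB=0 disables it.
    if not gl_opt:
        env["R600_SB"] = "0"
    # Compute shader optimization is experimental and off unless R600_SB_CL=1.
    if cl_opt:
        env["R600_SB_CL"] = "1"
    # R600_DUMP_SHADERS=2 prints the disassembly of the optimized shaders.
    if dump_shaders:
        env["R600_DUMP_SHADERS"] = "2"
    return env


# Example use, with a placeholder application name:
#   import subprocess
#   subprocess.run(["some-gl-app"], env=r600_sb_env(gl_opt=False))
```

Leaving a variable unset rather than writing a default value keeps the
branch's own defaults in effect.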
If you think that merging of the branch makes sense, any
comments/suggestions about what is required to prepare the branch for
merging are welcome.
Vadim
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev