Greetings,
I hope that even if your work turns out to be short-lived, e.g. until
the LLVM bytecode compiler takes off, the know-how will still be very
useful.
On Thu, Feb 14, 2013 at 4:04 AM, Vadim Girlin <vadimgir...@gmail.com> wrote:
Hi,
Last month I finally found the time to work on the rewrite of my
previous shader optimization branch. It is now mostly done in terms
of the correctness of the produced code and feature support (at least
on evergreen), though it is still a work in progress in terms of
the efficiency of both the generated shader code and the backend
itself.
I spent some time last year studying the LLVM infrastructure and the
R600 LLVM backend and trying to improve it, but in the end I came
to the conclusion that for me it might be easier to implement
everything I wanted in a custom backend. This allows for a simpler
and more efficient implementation - e.g. I don't have to deal with
CFGs because we in fact have structured code, so it's possible to use
simpler and more efficient algorithms.
Currently the branch has no regressions with piglit's
quick-driver.tests on evergreen (it doesn't rely on the fallback
to unoptimized code for the shaders with relative addressing and
other cases unlike the previous branch), and so far I don't see
any rendering issues with the apps that I used for testing -
Lightsmark 2008, Unigine Heaven 3.0 and some others. There are
also some performance improvements with the gpu-bound apps.
I tried to keep in mind the differences between chip classes, so I
hope it should only require minor fixes to make it work on
non-evergreen chips, but I doubt that it will work out of the box
- support for some non-evergreen hw-specific features is still
missing, e.g. I'm sure that indirect addressing currently won't
work on R6xx, though basic tests might work in theory. Fixing this
shouldn't require a lot of work though.
The branch can be found in my freedesktop repo:
http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb
Regarding the differences from the previous branch - there are
some additional optimizations, e.g. global value numbering with
some basic support for constant folding (not all instructions are
currently handled, but it's easy to extend), global code motion
that can hoist invariant code out of the loops etc. Some
optimizations that were implemented in the previous branch are not
implemented in the new branch (yet), e.g. propagation of modifiers
(I'm not even sure if it has any noticeable effect on performance).
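As a rough illustration of value numbering with constant folding (this
is not the code from the branch), here is a minimal sketch for
straight-line code. The tuple-based mini-IR, the op names ADD and MUL,
and the function name value_number are all hypothetical, and the sketch
assumes every register is written only once.

```python
# Hypothetical mini-IR: each instruction is (dest, op, src1, src2).
# Sources are register names (str) or numeric constants.
# Assumption: every register is assigned exactly once (SSA-like).
FOLDABLE = {"ADD": lambda a, b: a + b, "MUL": lambda a, b: a * b}
COMMUTATIVE = {"ADD", "MUL"}


def is_const(v):
    return isinstance(v, (int, float))


def value_number(instrs):
    """Return (remaining instructions, register -> canonical value map)."""
    value_of = {}  # register -> constant or earlier register with same value
    seen = {}      # (op, src1, src2) -> register already holding that result
    out = []
    for dest, op, a, b in instrs:
        # Rewrite sources to their canonical values.
        a = value_of.get(a, a)
        b = value_of.get(b, b)
        # Constant folding: both operands are known at compile time.
        if op in FOLDABLE and is_const(a) and is_const(b):
            value_of[dest] = FOLDABLE[op](a, b)
            continue
        # Normalize operand order so that x*5 and 5*x get the same key.
        if op in COMMUTATIVE:
            a, b = sorted((a, b), key=repr)
        key = (op, a, b)
        if key in seen:
            # Redundant computation: reuse the earlier result.
            value_of[dest] = seen[key]
            continue
        seen[key] = dest
        out.append((dest, op, a, b))
    return out, value_of
```

For example, with t0 = 2 + 3, t1 = x * t0 and t2 = t0 * x, the sketch
folds t0 to 5 and maps t2 onto t1, so only one MUL remains. A real pass
would also have to handle control flow and instructions with side
effects, which this sketch ignores.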
Unlike the previous branch, there is support for indirect
addressing on registers. Currently it uses my previously posted
patch (that was not very welcome) for obtaining the information
about addressable register ranges, but that patch is not required
and can be dropped; I just used it for testing. Without that
information the opportunities for optimization are limited though,
and perhaps it makes sense not to try to optimize shaders with
indirect gpr addressing at all and to rely on the old backend until
we have a proper solution for passing that information to the
drivers.
There is also initial support for ALU predication, but it's not
complete and currently unused. I'm not sure that predication support
would have a significant enough effect on performance to justify more
complex and expensive algorithms for the register allocator and
scheduler. I'll probably look into it later, but I consider it a
low priority. In the case of predicated source code (from the LLVM
backend) the predication is eliminated using speculative execution
and conditional moves, the same as with the simple if-conversion pass
that is also implemented.
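To make the idea of replacing a branch with speculative execution plus
conditional moves concrete, here is a minimal sketch (again, not the
code from the branch). The tuple-based mini-IR, the CNDMOV op name and
the ".then"/".else" temporaries are hypothetical, and the sketch assumes
that all instructions in both branches are free of side effects.

```python
def if_convert(cond, then_instrs, else_instrs):
    """Flatten `if cond: then_instrs else: else_instrs` into branch-free code.

    Hypothetical mini-IR: each instruction is (dest, op, src1, src2).
    Assumption: all ops are side-effect free, so both branches can be
    executed speculatively.
    """
    out = []
    renamed = {}
    for tag, branch in (("then", then_instrs), ("else", else_instrs)):
        local = {}  # original dest -> temporary used inside this branch
        for dest, op, a, b in branch:
            tmp = f"{dest}.{tag}"
            # Sources defined earlier in the same branch use the temporary.
            out.append((tmp, op, local.get(a, a), local.get(b, b)))
            local[dest] = tmp
        renamed[tag] = local
    # Select the final value of every register written in either branch.
    # A register written in only one branch keeps its old value otherwise.
    for d in sorted(set(renamed["then"]) | set(renamed["else"])):
        out.append((d, "CNDMOV", cond,
                    renamed["then"].get(d, d),
                    renamed["else"].get(d, d)))
    return out
```

Each resulting CNDMOV tuple reads as (dest, "CNDMOV", cond, value if
cond is true, value if cond is false). Whether such a conversion pays
off depends on the size of the branches, which this sketch does not try
to estimate.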
The branch currently uses as its source the bytecode built by the old
backend (which may also come from the LLVM backend) plus some
additional information (about inputs etc); the final bytecode is
built by the new builder in the branch. Building two versions of the
bytecode doesn't look very efficient, but currently it simplifies
debugging. I'm planning to implement translation from TGSI
directly to my representation, which should simplify the translator
and make it possible to get rid of unnecessary intermediate passes.
Some old and new environment variables can be used to control the
behavior of this backend:
R600_SB - 0 - disable new backend completely, 1 - enable (default)
R600_SB_USE_NEW_BYTECODE - 0 - disable use of the produced
bytecode (useful if you only want to look at the dump of the
optimized shader without passing it to hw), 1 - enable (default)
R600_DUMP_SHADERS - will also dump the disassembly of the
optimized shader after the original bytecode (if the backend is not
disabled with R600_SB=0).
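The interaction of these variables can be modelled roughly as follows.
This is only a sketch of the semantics described above, not the driver
code: the function name sb_settings and the result keys are
hypothetical, and treating any value other than "0" as enabled, as well
as R600_DUMP_SHADERS being off when unset, are assumptions.

```python
import os


def sb_settings(env=None):
    """Model how the three variables combine (hypothetical helper)."""
    env = os.environ if env is None else env

    def flag(name, default):
        # Assumption: any value other than "0" counts as enabled.
        return env.get(name, default) != "0"

    use_sb = flag("R600_SB", "1")                      # enabled by default
    use_new = flag("R600_SB_USE_NEW_BYTECODE", "1")    # enabled by default
    dump = flag("R600_DUMP_SHADERS", "0")              # assumed off if unset
    return {
        "use_sb": use_sb,
        # Optimized bytecode reaches the hw only if both switches are on.
        "submit_optimized": use_sb and use_new,
        "dump_original": dump,
        # The optimized dump is produced only if the backend is enabled.
        "dump_optimized": dump and use_sb,
    }
```

For instance, an environment with R600_SB_USE_NEW_BYTECODE=0 and
R600_DUMP_SHADERS=1 yields a dump of both shaders while the hw still
receives the original bytecode.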
The produced shader code is not ideal - e.g. you may notice
unnecessary MOVs inserted before DOT4 instructions. It's a known
issue and I'm going to look into it, though this may require a rework
of the regalloc/scheduler. I had to sacrifice some features to make
it work correctly with Heaven first, so that now I can try to
improve it while being able to test for regressions.
There are probably also some issues with the cleanliness of the code
- I had to rework some parts a few times while fixing all the
problems, so there may be unused code and other remnants of
the previous versions. Anyway, I still consider it a work in
progress and some things are going to be reworked.
I'm not sure what the fate of this branch will be, taking into
account that we also have an actively developed LLVM backend that is
required for OpenCL anyway. Your opinions are welcome.
Vadim
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev