Re: [Mesa-dev] r600g: status of my work on the shader optimization

Vadim Girlin Fri, 15 Feb 2013 18:31:43 -0800

On 02/15/2013 03:22 PM, Christian König wrote:

On 15.02.2013 12:00, Vadim Girlin wrote:

On 02/14/2013 02:42 PM, Christian König wrote:

Hi Vadim,


nice work, I think you've made quite some progress here, but on the other
hand it should be clear that the LLVM backend is the future and we
should concentrate on that.


"LLVM backend is the future" is a pretty abstract argument. I prefer
to operate with real facts. After a year of LLVM backend development
what are the real benefits for the users? What are the real use cases
where the users might prefer LLVM backend? To me this situation looks
like the use of LLVM requires a lot more time and development efforts
than the custom solution, despite the initial expectations. Maybe you
are right and the LLVM backend will become the best alternative for
users sometime in the future, but I only have today's results:

Heaven 3.0, all settings high/enabled, 1280x720, HD5750:
  default backend : 20.0 fps
  llvm backend    : 18.8 fps
  r600-sb         : 38.0 fps


Quite impressive. What is it actually doing better than the LLVM backend?

I've tried to disable some passes/features to see how much it affects
the performance with Heaven.


With everything enabled, the result is 38.0-38.3.

Without fetch instructions grouping - 32.9
Without if-conversion - 34.4
Without GVN - 37.5

Use of temporary registers seems to have no noticeable effect.

Without fetch grouping, if-conversion, GVN, temp GPRs - 29.3
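
To make the figures above easier to compare, here is a minimal sketch that turns them into relative slowdowns. It assumes 38.0 fps (the lower end of the reported 38.0-38.3 range) as the baseline; the names `BASELINE_FPS`, `RESULTS` and `percent_drop` are illustrative only. Since each case was measured once, the percentages are rough indications and not precise measurements.

```python
# Relative cost of disabling each pass, from the Heaven numbers quoted above.
# Assumption: 38.0 fps (lower end of the 38.0-38.3 range) is the baseline.
BASELINE_FPS = 38.0

RESULTS = {
    "fetch grouping": 32.9,
    "if-conversion": 34.4,
    "GVN": 37.5,
    "all four disabled": 29.3,  # fetch grouping, if-conversion, GVN, temp GPRs
}


def percent_drop(fps, baseline=BASELINE_FPS):
    """Return how many percent slower `fps` is than `baseline`."""
    return (baseline - fps) / baseline * 100.0


drops = {name: round(percent_drop(fps), 1) for name, fps in RESULTS.items()}

for name, drop in drops.items():
    print(f"without {name}: -{drop}%")
```
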

The remaining passes are required and can't be disabled - basically it's
dead code elimination, global scheduler (GCM), regalloc, alu scheduler.

I hope this information is useful. Due to the lack of
performance-related info about the hardware, the source of some
improvements is not obvious even to me, and some results from above are
not exactly what I expected.

Also I did only one run for each case above, so probably there are some
statistical errors, and it's better to check everything more thoroughly
before relying on these results.


Vadim



When I look at these results, the benefits of the LLVM-based solution
are not very clear to me.

I'm not trying to persuade anyone, just wanted to explain why I
decided to switch back to work on the non-LLVM solution.

Anyway, it's absolutely not a problem for me if this branch never
makes it into mesa, I was ready for this before I started. One of the
goals of this branch was just to show that the use of LLVM is possibly
not the best way to compile GL shaders for r600g. And another goal, of
course, is to get better performance with r600g *today*, not in the
future.


Yeah, that's why I wrote I'm not sure what to do with it. On one hand
it's quite a nice improvement that's already working and somewhat
stable, on the other hand if we merge it we also need to support it. I
suggest that you try to stabilize it a bit more first and then we see.

Christian.


Vadim


To sum it up I'm not sure what we should do with this branch :)

As Dragomir already wrote, even if the code won't be used much, the
know-how you gained while coding it will stay; believe me, this is
of far more value than the code itself.

Christian.

On 14.02.2013 11:10, Dragomir Ivanov wrote:

Greetings,
I hope that, even if your work is short-lived, e.g. until the LLVM
bytecode compiler takes off, the know-how will still be very useful.


On Thu, Feb 14, 2013 at 4:04 AM, Vadim Girlin <[email protected]> wrote:

    Hi,

    Last month I finally found the time to work on the rewrite of my
    previous shader optimization branch, now it's mostly done in terms
    of the correctness of produced code and feature support (at least
    on evergreen), though it's still a work in progress in terms of
    the efficiency of generated shader code and the efficiency of the
    backend itself.

    I spent some time last year studying the LLVM infrastructure and
    R600 LLVM backend and trying to improve it, but in the end I came
    to the conclusion that for me it might be easier to implement all
    that I wanted in a custom backend. This allows for a simpler and
    more efficient implementation - e.g. I don't have to deal with
    CFGs because in fact we have structured code, so it's possible to
    use simpler and more efficient algorithms.

    Currently the branch has no regressions with piglit's
    quick-driver.tests on evergreen (it doesn't rely on the fallback
    to unoptimized code for the shaders with relative addressing and
    other cases unlike the previous branch), and so far I don't see
    any rendering issues with the apps that I used for testing -
    Lightsmark 2008, Unigine Heaven 3.0 and some others. There are
    also some performance improvements with the gpu-bound apps.

    I tried to keep in mind the differences between chip classes, so I
    hope it should only require minor fixes to make it work on
    non-evergreen chips, but I doubt that it will work out of the box
    - support for some non-evergreen hw-specific features is still
    missing, e.g. I'm sure that indirect addressing currently won't
    work on R6xx, though basic tests might work in theory. Fixing this
    shouldn't require a lot of work though.

    The branch can be found in my freedesktop repo:

    http://cgit.freedesktop.org/~vadimg/mesa/log/?h=r600-sb

    Regarding the differences from the previous branch - there are
    some additional optimizations, e.g. global value numbering with
    some basic support for constant folding (not all instructions are
    currently handled, but it's easy to extend), global code motion
    that can hoist invariant code out of the loops etc. Some
    optimizations that were implemented in the previous branch are not
    implemented in the new branch (yet), e.g. propagation of modifiers
    (I'm not even sure if it has any noticeable effect on performance).
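
As an illustration of value numbering combined with constant folding, here is a minimal sketch. It is not the code from the r600-sb branch: it is a simplified variant that works on a straight-line list of instructions instead of whole structured shaders, and the tuple format, the `add`/`mul` op names and the function `value_number` are all assumptions made for this example.

```python
# Simplified value numbering with constant folding on straight-line code.
# Assumptions (not from the branch): instructions are (dest, op, a, b) tuples,
# operands are either names (str) or integer constants, every dest is
# assigned exactly once, and only "add" and "mul" exist.
FOLD = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
COMMUTATIVE = {"add", "mul"}


def value_number(instrs):
    env = {}    # name -> constant or name of an equivalent earlier value
    table = {}  # (op, a, b) -> name that already holds this value
    out = []    # instructions that still have to be emitted
    for dest, op, a, b in instrs:
        # Replace operands by what is already known about them.
        a, b = env.get(a, a), env.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            env[dest] = FOLD[op](a, b)  # constant folding
            continue
        # Order operands of commutative ops so that x*5 and 5*x match.
        ops = sorted((a, b), key=repr) if op in COMMUTATIVE else [a, b]
        key = (op, *ops)
        if key in table:
            env[dest] = table[key]      # redundant: reuse earlier value
            continue
        table[key] = dest
        out.append((dest, op, a, b))
    return out, env


program = [
    ("t0", "add", 2, 3),
    ("t1", "mul", "x", "t0"),
    ("t2", "mul", 5, "x"),
    ("t3", "add", "t1", "t2"),
]
optimized, known = value_number(program)
```

In this example `t0` is folded to the constant 5 and `t2` is recognized as the same value as `t1`, so only two instructions remain to be emitted.
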

    Unlike the previous branch, there is support for indirect
    addressing on registers - currently it uses my previously posted
    patch (that was not very welcome) for obtaining the information
    about addressable register ranges, but it's not required and can
    be dropped, I just used that patch for testing. Without that
    information opportunities for optimization are limited though, and
    perhaps it makes sense to not try to optimize the shaders with
    indirect gpr addressing at all and rely on the old backend until
    we'll have the proper solution to pass that information to the
    drivers.

    There is also initial support for ALU predication, but it's not
    complete and currently unused. I'm not sure whether predication
    support will have a significant enough effect on performance to
    justify more complex and expensive algorithms for the register
    allocator and scheduler. I'll probably look into it later; I
    consider it a low priority. In the case of predicated source code
    (from the LLVM backend) the predication is eliminated using
    speculative execution and conditional moves, same as with the
    simple if-conversion pass that is also implemented.
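
The idea of replacing a branch with speculative execution and conditional moves can be sketched as follows. This is only an illustration under stated assumptions, not the pass from the branch: the tuple-based instruction format, the function `if_convert` and the op name `cndmov` are hypothetical, and the sketch assumes that all instructions in both branches are free of side effects, so executing them unconditionally is safe.

```python
# Minimal if-conversion sketch: both branches are executed speculatively
# into temporaries, then a conditional move selects the result.
# Assumptions: instructions are (dest, op, a, b) tuples without side
# effects; "cndmov" is a hypothetical op meaning dest = cond ? t : e.
def if_convert(cond, then_ops, else_ops):
    out = []
    renamed = {}  # branch tag -> {original dest: temporary name}
    for tag, ops in (("then", then_ops), ("else", else_ops)):
        names = {}
        for dest, op, a, b in ops:
            # Reads inside the branch must see the branch-local temporaries.
            a, b = names.get(a, a), names.get(b, b)
            tmp = f"{dest}_{tag}"
            names[dest] = tmp
            out.append((tmp, op, a, b))
        renamed[tag] = names
    # One conditional move per register written in either branch. If only
    # one branch writes it, the other side keeps the incoming value.
    for dest in sorted(set(renamed["then"]) | set(renamed["else"])):
        t = renamed["then"].get(dest, dest)
        e = renamed["else"].get(dest, dest)
        out.append((dest, "cndmov", cond, t, e))
    return out


flat = if_convert("c", [("r", "add", "a", "b")], [("r", "mul", "a", "b")])
```

The trade-off is that both branches always execute, which only pays off when they are short; a real pass would need a cost heuristic to decide when to apply the transformation.
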

    The branch currently uses as source the bytecode built by the old
    backend (that may also come from LLVM backend) and some additional
    information (about inputs etc), final bytecode is built by the new
    builder in the branch. Building two versions of the bytecode
    doesn't look very efficient, but currently it simplifies
    debugging. I'm planning to implement translation from TGSI
    directly to my representation, which should simplify the translator
    and make it possible to get rid of unnecessary intermediate passes.

    Some old and new environment variables can be used to control the
    behavior of this backend:

    R600_SB - 0 - disable new backend completely, 1 - enable (default)

    R600_SB_USE_NEW_BYTECODE - 0 - disable use of the produced
    bytecode (useful if you only want to look at the dump of the
    optimized shader without passing it to hw), 1 - enable (default)

    R600_DUMP_SHADERS - will also dump the disassembly of the
    optimized shader after the original bytecode (if the backend is
    not disabled with R600_SB=0).
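
As a usage sketch, the variables above could be combined to inspect the optimized shaders without handing the new bytecode to the hardware. The helper `sb_dump_only_env` is hypothetical, and the value `1` for `R600_DUMP_SHADERS` is an assumption, since the message does not say which value enables the dump.

```python
import os


def sb_dump_only_env(base=None):
    """Build an environment that runs the new backend in dump-only mode."""
    env = dict(os.environ if base is None else base)
    env["R600_SB"] = "1"                   # keep the new backend enabled
    env["R600_SB_USE_NEW_BYTECODE"] = "0"  # do not pass its bytecode to hw
    env["R600_DUMP_SHADERS"] = "1"         # assumption: "1" enables the dump
    return env


# Example (not executed here): launch any GL application with these settings.
#   import subprocess
#   subprocess.run(["some-gl-app"], env=sb_dump_only_env())
```
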

    Produced shader code is not ideal - e.g. you may notice
    unnecessary MOVs inserted before DOT4 instructions, it's a known
    issue and I'm going to look into it - this may require rework of
    the regalloc/scheduler. I had to sacrifice some features to make
    it work correctly with Heaven first, so that now I can try to
    improve it while being able to test for regressions.

    Also probably there are some issues with the cleanness of the code
    - I had to rework some parts a few times while fixing all
    problems, so there is possibly unused code and other remnants of
    the previous versions. Anyway, I still consider it as a work in
    progress and some things are going to be reworked.

    I'm not sure what will be the destiny of this branch, taking into
    account that we also have an actively developed LLVM backend that
    is required for OpenCL anyway. Your opinions are welcome.

    Vadim
    _______________________________________________
    mesa-dev mailing list
    [email protected]
    http://lists.freedesktop.org/mailman/listinfo/mesa-dev



