Re: [Mesa-dev] [RFC PATCH 00/16] A new IR for Mesa

Jose Fonseca Tue, 19 Aug 2014 07:24:28 -0700

On 18/08/14 17:25, Connor Abbott wrote:

On Mon, Aug 18, 2014 at 11:47 AM, Jose Fonseca <jfons...@vmware.com> wrote:

On 18/08/14 14:21, Marek Olšák wrote:


On Mon, Aug 18, 2014 at 2:44 PM, Roland Scheidegger <srol...@vmware.com>
wrote:


Am 16.08.2014 02:12, schrieb Connor Abbott:

I know what you might be thinking right now. "Wait, *another* IR? Don't
we already have like 5 of those, not counting all the driver-specific
ones? Isn't this stuff complicated enough already?" Well, there are some
pretty good reasons to start afresh (again...). In the years we've been
using GLSL IR, we've come to realize that, in fact, it's not what we
want *at all* to do optimizations on. Ian has done a talk at FOSDEM that
highlights some of the problems they've run into:

https://video.fosdem.org/2014/H1301_Cornil/Saturday/Three_Years_Experience_with_a_Treelike_Shader_IR.webm

But here's the summary:

* GLSL IR is way too much of a memory hog, since it has to make a new
variable for each temporary the compiler creates and then each time you
want to dereference that temporary you need to create an
ir_dereference_variable that points to it which is also very
cache-unfriendly ("downright cache-mean!").

* The expression trees were originally added so that we could do
pattern matching to automatically optimize things, but this turned out
to be both very difficult to do and not very helpful. Instead, all it
does is add more complexity to the IR without much benefit - with SSA or
having proper use-def chains, we could get back what the trees give us
while also being able to do lots more optimizations.

* We don't have the concept of basic blocks in GLSL IR, which makes a
lot of optimizations harder because they were originally designed with
basic blocks in mind - take, for example, my SSA series. I had to map a
whole lot of concepts that were based on the control flow graph to this
tree of statements that GLSL IR uses, and the end result wound up
looking nothing at all like the original paper. This problem gets even
worse for things like e.g. Global Code Motion that depend upon having
the dominance tree.

I originally wanted to modify GLSL IR to fix these problems by adding
new instruction types that would address these issues and then
converting back and forth between the old and the new form, but I
realized that fixing all the problems would basically mean a complete
rewrite - and if that's the case, then why don't we start from scratch?
So I took Ken's suggestions and started designing, and then at Intel
over the summer started implementing, a completely new IR which I call
NIR that's at a lower level than GLSL IR, but still high-level enough to
be mostly device-independent (different drivers may have different
passes and different ways of lowering e.g. matrix multiplies) so that
we can do generic optimizations on it. Having support for SSA from the
beginning was also a must, because lots of optimisations that we really
want for cleaning up DX9-translated games are either a lot easier in or
made possible by SSA. I also made the decision for it to be typeless,
because that's what the cool kids are all doing :) and for a
lower-level, flat IR it seemed like the thing to do (it could have gone
either way, though). So the key design points of NIR (pronounced either
like "near" as in "NIR is near!" or to rhyme with "burr") are:

* It's flat (no expression trees)

* It's typeless

* Modifiers (abs, negate, saturate), swizzles, and write masks are part
of ALU instructions

* It includes enough GLSL-like things (variables that you can load from
or store to, function calls) to be hardware-agnostic (although we don't
have a way to represent matrix multiplies right now, but that could
easily be added) to be able to do optimizations at a high level, while
having lowering passes that convert variables to registers and
input/output/uniform loads/stores that will open up more opportunities
for optimization and save memory while being more hardware-specific.

* Control flow consists of a tree of if statements and loops, like in
GLSL IR, except the leaves of the tree are now basic blocks instead of
instructions. Also, each basic block keeps track of its successors and
predecessors, so the control flow graph is explicit in the IR.

* SSA is natively supported, and SSA uses point directly to the SSA
definition, which means that the use-def chains are always there, and
def-use chains are kept by tracking the set of all uses for each
definition.

* It's written in C.

(see the README in patch 3 and nir.h in patch 4 for more details)
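
As a rough illustration of the last few design points (flat instructions, explicit successor/predecessor links, and SSA sources that point directly at their definition), here is a minimal C sketch. It is not the definition from nir.h: every struct, field, function name, and fixed array bound below is a hypothetical stand-in chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PREDS 8   /* arbitrary bounds, for the sketch only */
#define MAX_USES  16

struct instr;

/* An SSA value: defined exactly once, and it records every
 * instruction that reads it (the def-use chain). */
struct ssa_def {
    struct instr *parent;            /* the defining instruction */
    struct instr *uses[MAX_USES];
    size_t num_uses;
};

/* A flat instruction: each source points straight at an SSA
 * definition (the use-def chain), never at a nested expression. */
struct instr {
    struct ssa_def def;
    struct ssa_def *src[2];
};

/* A basic block with the control flow graph stored explicitly. */
struct block {
    struct block *succ[2];           /* fallthrough and branch target */
    struct block *preds[MAX_PREDS];
    size_t num_preds;
};

/* Set an empty source slot and keep the def-use chain in sync. */
static void instr_set_src(struct instr *user, int slot, struct ssa_def *def)
{
    assert(slot >= 0 && slot < 2);
    assert(user->src[slot] == NULL && def->num_uses < MAX_USES);
    user->src[slot] = def;
    def->uses[def->num_uses++] = user;
}

/* Add a CFG edge, updating both the successor and predecessor side. */
static void block_link(struct block *pred, struct block *succ)
{
    int slot = pred->succ[0] == NULL ? 0 : 1;
    assert(pred->succ[slot] == NULL && succ->num_preds < MAX_PREDS);
    pred->succ[slot] = succ;
    succ->preds[succ->num_preds++] = pred;
}
```

Because both directions are updated in one place, a pass can walk from a use to its definition, or from a definition to all of its uses, without searching the whole shader.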

Some things that are missing or could be improved:

* There's currently no alias tracking for inputs, outputs, and uniforms.
This is especially important for uniforms because we don't pack them
like we pack inputs and outputs.

* We need a way to represent matrix multiplies so that we can do
matrix-flipping optimizations in NIR (currently GLSL IR does this for
us).

* I'm not entirely happy about how we represent loads and stores in the
IR. Right now, they're intrinsics, but that means we need a different
intrinsic for each size and combination of arguments (indirect vs. not
indirect, etc.) and we might run into a combinatorial explosion problem
in the future, so we might need to make separate load/store instructions
like what I did for textures.

* Right now, we only have a pass that lowers variables for scalar
backends. We need to write a similar pass for vector backends that uses
std140 packing or something similar, as well as porting
lower_ubo_reference to NIR and changing it to output offsets in the
hardware-native units instead of bytes.

* We'll need to write a pass that splits up vector expressions for
scalar backends.

[...]

However, let's face it, gallium is stuck with TGSI forever. Switching to
another IR in Gallium is insane (unless you can rewrite all drivers and
state trackers for it - let's be realistic, it just won't happen). The
next GL NG IR, whatever it is going to be, will be just as important as
the IR of ARB_vertex_program. TGSI will continue to be the major IR
whether we like it or not.





No, switching to another IR in Gallium is not insane if approached the right
way.  We already allow multiple IRs in gallium, so all it takes to move to
another IR is to have helper modules do the translation:

- a pipe driver helper module that would translate new IR into TGSI, for the
sake of old pipe drivers


- a state tracker helper module that would translate TGSI into the new IR,
for the sake of old state trackers.


Once these are in place, all development effort can go into
improving/leveraging the new IR.  We could deprecate TGSI once it has
few users.


Also, switching to LLVM, NIR, or some other IR that uses SSA (or at
least modifying TGSI to support it) seems like something that's really
necessary for the Gallium folks. Soon, considering most backends
already use SSA in one form or another, the situation will look like:

GLSL IR -> NIR -> NIR with SSA -> optimizations -> NIR without SSA ->
TGSI -> backend without SSA -> backend with SSA

So backends would have to duplicate the into-SSA logic and every
shader would have to pay the penalty of being converted out of and
then back into SSA thanks to TGSI not supporting it.





I also want to highlight there are two kinds of "IR".


a) one thing is a shader IR that communicates a shader between an interface
(be it application interface

        High-level lang.             IR               GPU code
   App -----------------> front-end ----> back-end ---------->  GPU

b) another is a shader IR that is meant to facilitate code transformations
(ie optimizations):

       opt. pass     opt. pass
    IR ---------> IR ---------> IR --> ....


Gallium needs a), but not necessarily b).  An optimizing compiler needs b)
internally but not necessarily a).

An IR that achieves both a) and b) is not impossible, but it is a more
difficult trade-off.


Indeed. NIR is definitely in b), but personally I think it might be a
good idea for Gallium to start accepting NIR as well as TGSI so that
drivers can do their own optimizations/lowering on it and avoid having
to do it in their own IR. But that's another discussion, and we're
thinking about step 15 here when we're only at step 2.

Yep, I don't have an opinion on the suitability of NIR for gallium yet,
but I agree with you here: if the state trackers speak SSA, and the pipe
drivers speak SSA, then we will want Gallium to speak SSA too.

Modifying TGSI to represent SSA unmolested is relatively
straightforward: just add a new register kind for the SSA values (where
each index can only be written once, and indirect addressing is
forbidden), plus add a new PHI opcode.

For drivers which can't cope with SSA TGSI we'd add a simple TGSI
transformation pass that would detect the scope of the SSA variables and
assign temporaries to each.
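
A minimal sketch of what such a fallback pass could look like, under loud assumptions: the instruction layout below is hypothetical (it is not real TGSI token encoding), the `FILE_SSA` register file does not exist in TGSI today, and the sketch only handles straight-line code with no PHI opcodes or control flow. It first checks the write-once rule, then gives each SSA value a temporary for the span between its definition and its last use, reusing temporaries once a value is dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register files; FILE_SSA is the proposed new kind. */
enum reg_file { FILE_NONE, FILE_TEMP, FILE_SSA };

struct reg { enum reg_file file; int index; };

/* Hypothetical instruction: dst = op(src[0], src[1]). */
struct inst { struct reg dst; struct reg src[2]; };

#define MAX_SSA 64   /* arbitrary bound, for the sketch only */

/* True if every SSA index is written at most once and every SSA
 * read refers to an in-range index that was defined earlier. */
static bool ssa_is_valid(const struct inst *code, size_t n)
{
    bool written[MAX_SSA] = { false };

    for (size_t i = 0; i < n; i++) {
        for (int s = 0; s < 2; s++) {
            const struct reg *r = &code[i].src[s];
            if (r->file != FILE_SSA)
                continue;
            if (r->index < 0 || r->index >= MAX_SSA || !written[r->index])
                return false;
        }
        if (code[i].dst.file == FILE_SSA) {
            int v = code[i].dst.index;
            if (v < 0 || v >= MAX_SSA || written[v])
                return false;
            written[v] = true;
        }
    }
    return true;
}

/* Rewrite SSA registers into temporaries numbered from first_temp.
 * Returns the number of temporaries used, or -1 on invalid input. */
static int ssa_to_temps(struct inst *code, size_t n, int first_temp)
{
    int last_use[MAX_SSA], slot_of[MAX_SSA];
    bool busy[MAX_SSA] = { false };
    int num_slots = 0;

    if (!ssa_is_valid(code, n))
        return -1;
    for (int v = 0; v < MAX_SSA; v++)
        last_use[v] = slot_of[v] = -1;

    /* Pass 1: find the last instruction that reads each SSA value. */
    for (size_t i = 0; i < n; i++)
        for (int s = 0; s < 2; s++)
            if (code[i].src[s].file == FILE_SSA)
                last_use[code[i].src[s].index] = (int)i;

    /* Pass 2: rewrite reads, release dead slots, then place writes. */
    for (size_t i = 0; i < n; i++) {
        for (int s = 0; s < 2; s++) {
            struct reg *r = &code[i].src[s];
            if (r->file != FILE_SSA)
                continue;
            int v = r->index;
            assert(slot_of[v] >= 0);   /* guaranteed by ssa_is_valid */
            r->file = FILE_TEMP;
            r->index = first_temp + slot_of[v];
            if (last_use[v] == (int)i)
                busy[slot_of[v]] = false;
        }
        if (code[i].dst.file == FILE_SSA) {
            int v = code[i].dst.index, t = 0;
            while (busy[t])            /* a free slot always exists */
                t++;
            slot_of[v] = t;
            busy[t] = last_use[v] >= 0; /* never-read values free at once */
            if (t + 1 > num_slots)
                num_slots = t + 1;
            code[i].dst.file = FILE_TEMP;
            code[i].dst.index = first_temp + t;
        }
    }
    return num_slots;
}
```

A real pass would also need to handle PHI opcodes (for example by inserting copies in predecessor blocks) and compute scopes across branches and loops, which this sketch deliberately leaves out.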

Nowadays we get away with current TGSI, but in the future when we want a
single IR for both graphics and compute, I suspect we'll end up
gravitating towards LLVM IR due to the large ecosystem behind it, and
despite some difficulties there is enough evidence (e.g., GlassyMesa and
Radeon drivers) that it can be made to work for graphics.

But even if the future is LLVM, by moving away from a tree IR to an
SSA-form IR, NIR seems a step forward, so I personally don't have
objections.



Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
