st: glsl_to_tgsi: improved temp-reg lifetime estimation

Nicolai Hähnle Mon, 12 Jun 2017 11:53:09 -0700

On 12.06.2017 14:34, Gert Wollny wrote:

Am Montag, den 12.06.2017, 12:28 +0200 schrieb Nicolai Hähnle:

On 10.06.2017 01:15, Gert Wollny wrote:

Plenty of style issues aside, can you explain where and why you get
tighter lifetimes?


In the original code if a temporary is used within a loop it gets the
whole life time of the loop assigned.

With this patch I track in more detail where a temporary is accesses
and base the lifetime on this: For instance, if a variable is first
unconditionally written and later read for the last time in the same
scope (loop, if, or switch branch), then the lifetime can be restricted
to that first-written - last-read range.

The code gets more complex because it tries to resolve this also for
nested scopes, and one also has to take care about whether a variable
is written only conditionally within a loop, or conditionally read
before it is written (in the source code sense, but not in the program
flow sense).

Shaders that profit from this better lifetime estimation are the ones
that have many short living values within long loops, think

for () {
   float4 t[1] = f(in2);
   float4 t[2] = g(in1);
   float4 t[3] = op(t[1], t[2]);
    ...
   sum += t[200];
}

Here the old code would keep all of the t[i] alive for the whole loop,

and in fact with the GpuTest Piano benchmark I have seen a shader with

2000 temporaries where many were used in a loop but only required for

two or three lines so that my code could merge them to less then 100
temporaries, and this made it possible for the tgsi to bytecode layer
in r600g to actually translate the shader.

Okay. I think you should seriously re-think your algorithm in a way thatmakes it a more natural evolution from the algorithm that's already there.

Basically, the current algorithm tracks (first_write, last_read), sothink about what you need to track in order to obtain a single-passalgorithm that computes lifetime (first, last) for every temporary. Ithink the following should do it for a first cut:


   struct st_calculate_lifetime_state {
      /* First and last instruction that has this temporary */
      unsigned first;
      unsigned last;

      /* First instruction of the outer-most in-active loop that
       * contains this temporary. (A loop is active if we're
       * currently processing instructions in it.)
      unsigned loop_first;

      /* Position of a read without preceding dominating write. */
      unsigned undef_read[4];

      /* First write in the program that is dominating our
       * current position, per channel.
       */
      unsigned first_dominating_write[4];
   };

In addition, you need to keep a stack of active scopes (loops and ifs),but you really only need to remember the start of the scope (and forloops, probably the position of the first BREAK).

Here's a sketch of the "state machine" that you need to run whiletraversing the program, assuming no BRK and CONT:


Init: last = 0, everything else = ~0

These are updates on individual variables on use:

On any use (source or dest):
- if first > cur_pos, set first = loop_first = cur_pos
- if loop_first < first, set first = loop_first
- update last

On use as source:

- if first_dominating_write > cur_pos and undef_read > cur_pos, setundef_read = cur_pos


On use as dest:
- if first_dominating_write > cur_pos, set first_dominating_write = cur_pos

These are updates of all temporaries on scope change:

On ENDLOOP, for all temps:
- if loop_first > start of loop, set loop_first = start of loop
- first < start of loop, update last to end of loop

- if undef_read between start and end of loop: set first = MIN(first,start of loop) and last = end of loop

- if first_dominating_write < end of loop, set undef_read = ~0

On ELSE and ENDIF, for all temps:

- if first_dominating_write > start of scope, set first_dominating_write= ~0

I'm not sure right now whether BREAK / CONT need special treatment atall. I think what you need is:


On ENDLOOP:

- if first_dominating_write is between the first BREAK in the loop andthe end of the loop, set first_dominating_write = ~0

And for CONT, you probably don't really need anything, because CONTscannot make you skip code forever.


What this state machine doesn't yet cover is

IF ..
  MOV TEMP[0], ...
ELSE
  MOV TEMP[0], ...
ENDIF

Still, I'd start with it and see whether you need to cover that case.

And even that case can probably be dealt with in a fairly efficient andpragmatic way. The idea is to keep track of the nesting level of IFs,plus an


   uint32 dominating_write_in_true_block[4];

per temp. Then:

On ELSE:

- if first_dominating_write < cur_pos, set the bit corresponding to thecurrent nesting level in dominating_write_in_true_block


On ENDIF:

- don't reset first_dominating_write if the bit corresponding to thecurrent nesting level in dominating_write_in_true_block is set

- unconditionally clear that bit

It won't cover cases with nesting level > 32, but do we really care?

I hope I didn't miss anything, because after all this is admittedlysubtle stuff. Still, I think this kind of state-machine approach shouldwork and allow you to avoid *lots* of allocations and pointer-chasing.


Cheers,
Nicolai


Best,
Gert

PS: Regarding style, I am fully aware that I have to iterate over this
code a few more times. I tried to adhere to the way the existing code
represents itself to me, but I'm happy to listen to any advise I can
get.

In any case, I though it might be best to send this patch out now for
discussion. Now, with the unit tests in place I will rework it and
focus also more on style questions. One thing that comes up immediately
up is that I will try to reduce the use of dynamically allocated
memory, since 60% of the run time of my code is with memory allocation
and de-allocation.



--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 0/3] [RFC] mesa/st: glsl_to_tgsi: improved temp-reg lifetime estimation

Reply via email to