On 12/07/2012 02:58 PM, Eric Anholt wrote:
This came from an idea by Ben Segovia.  16-wide pixel shaders are very
important for latency hiding on i965, so we want to try really hard to
get them.  If scheduling an instruction makes some set of instructions
available, those are probably the ones that make the instruction's
result dead.  By choosing those first, we'll have a tendency to reduce
the amount of live data as opposed to creating more.

Previously, we were sometimes getting this behavior out of the
scheduler, which was what produced the scheduler's original performance
wins on lightsmark.  Unfortunately, that was mostly an accident of the
lame instruction latency information that I had, which made it
impossible to fix the actual scheduling for performance.  Now that we've
fixed the scheduling for setup for register allocation, we can safely
update the latency parameters for the final schedule.

In shader-db, we lose 37 16-wide shaders, but gain 90 new ones.  4
shaders that were spilling change how many registers spill, for a
reduction of 70/3899 instructions.
---
  .../dri/i965/brw_fs_schedule_instructions.cpp      |   49 +++++++++++++++++---
  1 file changed, 43 insertions(+), 6 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
index d48ad1e..3941eac 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp
@@ -496,13 +496,50 @@ instruction_scheduler::schedule_instructions(fs_inst 
*next_block_header)
        schedule_node *chosen = NULL;
        int chosen_time = 0;

-      foreach_list(node, &instructions) {
-        schedule_node *n = (schedule_node *)node;
+      if (post_reg_alloc) {
+         /* Of the instructions closest ready to execute or the closest to
+          * being ready, choose the oldest one.
+          */
+         foreach_list(node, &instructions) {
+            schedule_node *n = (schedule_node *)node;
+
+            if (!chosen || n->unblocked_time < chosen_time) {
+               chosen = n;
+               chosen_time = n->unblocked_time;
+            }
+         }
+      } else {
+         /* Before register allocation, we don't care about the latencies of
+          * instructions.  All we care about is reducing live intervals of
+          * variables so that we can avoid register spilling, or get 16-wide
+          * shaders which naturally do a better job of hiding instruction
+          * latency.
+          *
+          * To do so, schedule our instructions in a roughly LIFO/depth-first
+          * order: when new instructions become available as a result of
+          * scheduling something, choose those first so that our result
+          * hopefully is consumed quickly.
+          *
+          * The exception is messages that generate more than one result
+          * register (AKA texturing).  In those cases, the LIFO search would
+          * normally tend to choose them quickly (because scheduling the
+          * previous message not only unblocked the children using its result,
+          * but also the MRF setup for the next sampler message, which in turn
+          * unblocks the next sampler message).
+          */
+         for (schedule_node *node = (schedule_node *)instructions.get_tail();
+              node != instructions.get_head()->prev;
+              node = (schedule_node *)node->prev) {
+            schedule_node *n = (schedule_node *)node;
+
+            if (!chosen || chosen->inst->regs_written() > 1) {
+               chosen = n;
+               if (chosen->inst->regs_written() <= 1)
+                  break;
+            }

I don't think the if condition is necessary here.  Just doing

for (...) {
        chosen = (schedule_node *) node;
        if (chosen->inst->regs_written() <= 1)
                break;
}

should accomplish the same thing.

+         }

-        if (!chosen || n->unblocked_time < chosen_time) {
-           chosen = n;
-           chosen_time = n->unblocked_time;
-        }
+         chosen_time = chosen->unblocked_time;

It seems plausible that there could be no nodes to schedule...which means chosen would be NULL here. Perhaps just move chosen_time = chosen->unblocked_time into the if...break above.

        }

        /* Schedule this instruction. */
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to