Kenneth Graunke <kenn...@whitecape.org> writes:

> On 10/15/2012 04:06 PM, Eric Anholt wrote:
>> Improves performance of the Lightsmark penumbra shadows scene by 15.7% +/-
>> 1.0% (n=15), by eliminating register spilling. (tested by smashing the
>> list of scenes to have all other scenes have 0 duration -- includes
>> additional rendering of scene description text that normally doesn't
>> appear in that scene)
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.h                |   2 +
>>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 164 ++++++++++++++++++---
>>  2 files changed, 147 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
>> index a71783c..ad717c9 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.h
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
>> @@ -235,6 +235,8 @@ public:
>>     void assign_urb_setup();
>>     bool assign_regs();
>>     void assign_regs_trivial();
>> +   void setup_payload_interference(struct ra_graph *g, int payload_reg_count,
>> +                                   int first_payload_node);
>>     int choose_spill_reg(struct ra_graph *g);
>>     void spill_reg(int spill_reg);
>>     void split_virtual_grfs();
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> index 7b778d6..bd9789f 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
>> @@ -163,27 +163,154 @@ brw_alloc_reg_set(struct brw_context *brw, int reg_width, int base_reg_count)
>>  /**
>>   * Sets up interference between thread payload registers and the virtual GRFs
>>   * to be allocated for program temporaries.
>> + *
>> + * We want to be able to reallocate the payload for our virtual GRFs, notably
>> + * because the setup coefficients for a full set of 16 FS inputs takes up 8 of
>> + * our 128 registers.
>> + *
>> + * The layout of the payload registers is:
>> + *
>> + * 0..nr_payload_regs-1: fixed function setup (including bary coordinates).
>> + * nr_payload_regs..nr_payload_regs+curb_read_lengh-1: uniform data
>> + * nr_payload_regs+curb_read_lengh..first_non_payload_grf-1: setup coefficients.
>> + *
>> + * And we have payload_node_count nodes covering these registers in order
>> + * (note that in 16-wide, a node is two registers).
>>   */
>> -static void
>> -brw_setup_payload_interference(struct ra_graph *g,
>> -                               int payload_reg_count,
>> -                               int first_payload_node,
>> -                               int reg_node_count)
>> +void
>> +fs_visitor::setup_payload_interference(struct ra_graph *g,
>> +                                       int payload_node_count,
>> +                                       int first_payload_node)
>>  {
>> -   for (int i = 0; i < payload_reg_count; i++) {
>> -      /* Mark each payload reg node as being allocated to its physical register.
>> +   int reg_width = c->dispatch_width / 8;
>> +   int loop_depth = 0;
>> +   int loop_end_ip = 0;
>> +
>> +   int payload_last_use_ip[payload_node_count];
>> +   memset(payload_last_use_ip, 0, sizeof(payload_last_use_ip));
>> +   int ip = 0;
>> +   foreach_list(node, &this->instructions) {
>> +      fs_inst *inst = (fs_inst *)node;
>> +
>> +      switch (inst->opcode) {
>> +      case BRW_OPCODE_DO:
>> +         loop_depth++;
>> +
>> +         /* Since payload regs are deffed only at the start of the shader
>> +          * execution, any uses of the payload within a loop mean the live
>> +          * interval extends to the end of the outermost loop. Find the ip of
>> +          * the end now.
>> +          */
>> +         if (loop_depth == 1) {
>> +            int scan_depth = loop_depth;
>> +            int scan_ip = ip;
>> +            for (fs_inst *scan_inst = (fs_inst *)inst->next;
>> +                 scan_depth > 0;
>> +                 scan_inst = (fs_inst *)scan_inst->next) {
>> +               switch (scan_inst->opcode) {
>> +               case BRW_OPCODE_DO:
>> +                  scan_depth++;
>> +                  break;
>> +               case BRW_OPCODE_WHILE:
>> +                  scan_depth--;
>> +                  break;
>> +               default:
>> +                  break;
>> +               }
>> +               scan_ip++;
>> +            }
>> +            loop_end_ip = scan_ip;
>> +         }
>> +         break;
>> +      case BRW_OPCODE_WHILE:
>> +         loop_depth--;
>> +         break;
>> +      default:
>> +         break;
>> +      }
>
> Wow, it's unfortunate that you have to do this. Essentially, for each
> instruction in a loop, you walk through all the instructions, to the end
> of the loop. That's big O(fail). :(

Huh? This is "for the top-level loop instruction, count to the end of that loop". I mean, we could keep ip in the instructions, but then you get to update it all over when you inst->remove().
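
To make the cost argument concrete, here is a minimal standalone sketch of the same scan. It is not the driver code: the Op enum, ScanResult and find_outermost_loop_ends() are hypothetical stand-ins, and a std::vector replaces the instruction list. It assumes well-formed DO/WHILE nesting. The forward scan only starts at a DO seen while loop_depth is 1, so each instruction is visited by at most one scan and the total extra work stays linear in the instruction count.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the driver's opcodes (not the real BRW_OPCODE_* enum).
enum class Op { Do, While, Other };

struct ScanResult {
   // Per instruction: ip of the WHILE closing its outermost loop, or -1 if
   // the instruction is not inside any loop.
   std::vector<int> loop_end_ip;
   // Total number of instructions visited by the forward scans.
   int scan_steps = 0;
};

ScanResult find_outermost_loop_ends(const std::vector<Op> &insts)
{
   ScanResult r;
   r.loop_end_ip.assign(insts.size(), -1);
   int loop_depth = 0;
   int loop_end_ip = -1;

   for (std::size_t ip = 0; ip < insts.size(); ip++) {
      if (insts[ip] == Op::Do) {
         loop_depth++;
         // Only a top-level DO triggers the forward scan; nested DOs reuse
         // the end ip already found for the outermost loop.
         if (loop_depth == 1) {
            int scan_depth = 1;
            std::size_t scan_ip = ip;
            while (scan_depth > 0 && scan_ip + 1 < insts.size()) {
               scan_ip++;
               if (insts[scan_ip] == Op::Do)
                  scan_depth++;
               else if (insts[scan_ip] == Op::While)
                  scan_depth--;
               r.scan_steps++;
            }
            loop_end_ip = static_cast<int>(scan_ip);
         }
      }

      if (loop_depth > 0)
         r.loop_end_ip[ip] = loop_end_ip;

      // The WHILE itself still belongs to the loop, so decrement afterwards.
      if (insts[ip] == Op::While && loop_depth > 0)
         loop_depth--;
   }
   return r;
}
```

For a nine-instruction sequence with one loop nested inside another, the scan runs once, from the outer DO to its matching WHILE, and the inner DO costs nothing extra.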
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev