Chia-I Wu <olva...@gmail.com> writes:
> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <matts...@gmail.com> wrote:
>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <e...@anholt.net> wrote:
>>> Previously, the best thing we had was to schedule the things unblocked
>>> by the current instruction, on the hope that it would be consuming two
>>> values at the end of their live intervals while only producing one new
>>> value.  Sometimes that wasn't the case.
>>>
>>> Now, when an instruction is the first user of a GRF we schedule (i.e.
>>> it will probably be the virtual_grf_def[] instruction after computing
>>> live intervals again), penalize it by how many regs it would take up.
>>> When an instruction is the last user of a GRF we have to schedule (when
>>> it will probably be the virtual_grf_end[] instruction), give it a boost
>>> by how many regs it would free.
>>>
>>> The new functions are made virtual (only 1 of 2 really needs to be
>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>> heuristic over to the vec4 backend.
>>>
>>> shader-db:
>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>> instructions in affected programs:     10292 -> 9140 (-11.19%)
>>> GAINED: 121
>>> LOST:   38
>>>
>>> Improves tropics performance at my current settings by 4.50602% +/-
>>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>>> GLB2.7 (n=11).
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>> ---
>>
>> I think we're on the right track by considering register pressure when
>> scheduling, but one aspect we're not considering is simply how many
>> registers we think we're using.
>>
>> If I understand correctly, the pre-register-allocation scheduler wants
>> to shorten live intervals as much as possible, which reduces register
>> pressure but at the cost of larger stalls and less instruction-level
>> parallelism.
>> We end up scheduling things like
>>
>>    produce result 4
>>    produce result 3
>>    produce result 2
>>    produce result 1
>>    use result 1
>>    use result 2
>>    use result 3
>>    use result 4
>>
>> (this is why the MRF writes for the FB write are always done in the
>> reverse order)
> In this example, it will actually be
>
>    produce result 4
>    use result 4
>    produce result 3
>    use result 3
>    produce result 2
>    use result 2
>    produce result 1
>    use result 1
>
> and post-regalloc will schedule again to something like
>
>    produce result 4
>    produce result 3
>    produce result 2
>    produce result 1
>    use result 4
>    use result 3
>    use result 2
>    use result 1
>
> The pre-regalloc scheduling attempts to consume the results as soon as
> they are available.
>
> The FB write is done in reverse order because, when a result becomes
> available, its consumers are scheduled in reverse order.  The epilog of
> fragment shaders is usually like this:
>
>    placeholder_halt
>    mov m1, g1
>    mov m2, g2
>    mov m3, g3
>    mov m4, g4
>    send
>
> The MOVs depend on placeholder_halt, and send depends on the MOVs.  The
> scheduler will schedule it as follows:
>
>    placeholder_halt
>    mov m4, g4
>    mov m3, g3
>    mov m2, g2
>    mov m1, g1
>    send
>
> The order can be corrected with the change proposed here:
>
> http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>
> But there is no point in making that change if the current pre-regalloc
> heuristic is going to be reworked.
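The reverse-order behaviour described above can be reproduced with a tiny standalone model (a hypothetical C++ sketch, not Mesa's actual scheduler): if the scheduler effectively picks the *last* ready candidate each step, the four MOVs come out in reverse while the halt/send ordering is preserved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each instruction names the instructions it depends on, by index.
struct Inst {
    std::string name;
    std::vector<int> deps;
};

// Toy list scheduler: each step, scan all unscheduled instructions and
// pick the *last* one whose dependencies are satisfied.  This mimics a
// scheduler that walks its candidate list from the tail, which is the
// behaviour that reverses the FB-write MOVs.
std::vector<std::string> schedule(const std::vector<Inst> &insts)
{
    std::vector<bool> done(insts.size(), false);
    std::vector<std::string> order;
    for (size_t step = 0; step < insts.size(); step++) {
        int pick = -1;
        for (int i = 0; i < (int)insts.size(); i++) {
            if (done[i])
                continue;
            bool ready = true;
            for (int d : insts[i].deps)
                if (!done[d])
                    ready = false;
            if (ready)
                pick = i; // keep overwriting: the last ready one wins
        }
        done[pick] = true;
        order.push_back(insts[pick].name);
    }
    return order;
}
```

Feeding it the fragment-shader epilog from the mail (placeholder_halt, four MOVs depending on it, send depending on the MOVs) yields placeholder_halt, then the MOVs in m4..m1 order, then send.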
Flipping the order in which we prefer ties (on betterthanlifo-2):

commit 11a511576e465f02875f39c452561775a97416a1
Author: Eric Anholt <e...@anholt.net>
Date:   Mon Oct 21 11:45:53 2013 -0700

    otherway

diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
index 9a480b4..b123015 100644
--- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
+++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
@@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
        * it's the first use of a GRF, reduce its score since it means it
        * should be increasing register pressure.
        */
-      for (schedule_node *node = (schedule_node *)instructions.get_tail();
-           node != instructions.get_head()->prev;
-           node = (schedule_node *)node->prev) {
+      for (schedule_node *node = (schedule_node *)instructions.get_head();
+           node != instructions.get_head()->next;
+           node = (schedule_node *)node->next) {
          schedule_node *n = (schedule_node *)node;
          fs_inst *inst = (fs_inst *)n->inst;

gives:

total instructions in shared programs: 1544638 -> 1546794 (0.14%)
instructions in affected programs:     7163 -> 9319 (30.10%)
GAINED: 16
LOST:   289

with massive spilling on tropics, and a bit on lightsmark and csgo.
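For reference, the first-def/last-use score adjustment that the patch at the top of the thread describes can be sketched as follows. This is a hypothetical C++ illustration, not the actual brw_schedule_instructions.cpp code; the names pressure_score, remaining_reads, written, and grf_size are made up for the example.

```cpp
#include <cassert>
#include <vector>

// A candidate instruction: its latency-based priority plus the virtual
// GRFs it reads and writes.
struct Node {
    int base_delay;              // critical-path / latency priority
    std::vector<int> grf_reads;  // virtual GRFs this instruction reads
    std::vector<int> grf_writes; // virtual GRFs this instruction writes
};

// remaining_reads[grf]: unscheduled instructions still reading grf.
// written[grf]: whether grf was already written by a scheduled instruction.
// grf_size[grf]: how many hardware registers grf occupies.
int pressure_score(const Node &n,
                   const std::vector<int> &remaining_reads,
                   const std::vector<bool> &written,
                   const std::vector<int> &grf_size)
{
    int score = n.base_delay;

    // First def of a GRF: scheduling this probably starts a live
    // interval, so penalize by the registers it would occupy.
    for (int grf : n.grf_writes)
        if (!written[grf])
            score -= grf_size[grf];

    // Last read of a GRF: scheduling this probably ends a live
    // interval, so reward by the registers it would free.
    for (int grf : n.grf_reads)
        if (remaining_reads[grf] == 1)
            score += grf_size[grf];

    return score;
}
```

A candidate that opens a 2-register interval and closes a 1-register one would thus score one point below its plain latency priority, matching the "penalize first use, boost last use" description in the commit message.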
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev