On Tue, Oct 22, 2013 at 3:05 AM, Eric Anholt <e...@anholt.net> wrote:
> Chia-I Wu <olva...@gmail.com> writes:
>
>> On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <matts...@gmail.com> wrote:
>>> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <e...@anholt.net> wrote:
>>>> Previously, the best thing we had was to schedule the things unblocked
>>>> by the current instruction, in the hope that it would be consuming two
>>>> values at the end of their live intervals while only producing one new
>>>> value.  Sometimes that wasn't the case.
>>>>
>>>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>>>> will probably be the virtual_grf_def[] instruction after computing live
>>>> intervals again), penalize it by how many regs it would take up.  When
>>>> an instruction is the last user of a GRF we have to schedule (when it
>>>> will probably be the virtual_grf_end[] instruction), give it a boost by
>>>> how many regs it would free.
>>>>
>>>> The new functions are made virtual (only 1 of 2 really needs to be
>>>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>>>> heuristic over to the vec4 backend.
>>>>
>>>> shader-db:
>>>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>>>> instructions in affected programs:     10292 -> 9140 (-11.19%)
>>>> GAINED:                                121
>>>> LOST:                                  38
>>>>
>>>> Improves tropics performance at my current settings by 4.50602% +/-
>>>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>>>> GLB2.7 (n=11).
>>>>
>>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>>>> ---
>>>
>>> I think we're on the right track by considering register pressure when
>>> scheduling, but one aspect we're not considering is simply how many
>>> registers we think we're using.
>>>
>>> If I understand correctly, the pre-register-allocation scheduler wants
>>> to shorten live intervals as much as possible, which reduces register
>>> pressure but at the cost of larger stalls and less instruction-level
>>> parallelism.  We end up scheduling things like
>>>
>>> produce result 4
>>> produce result 3
>>> produce result 2
>>> produce result 1
>>> use result 1
>>> use result 2
>>> use result 3
>>> use result 4
>>>
>>> (this is why the MRF writes for the FB write are always done in the
>>> reverse order)
>>
>> In this example, it will actually be
>>
>> produce result 4
>> use result 4
>> produce result 3
>> use result 3
>> produce result 2
>> use result 2
>> produce result 1
>> use result 1
>>
>> and post-regalloc scheduling will reorder it again to something like
>>
>> produce result 4
>> produce result 3
>> produce result 2
>> produce result 1
>> use result 4
>> use result 3
>> use result 2
>> use result 1
>>
>> The pre-regalloc scheduling attempts to consume the results as soon as
>> they are available.
>>
>> The FB write is done in reverse order because, when a result becomes
>> available, its consumers are scheduled in reverse order.  The epilog of
>> a fragment shader usually looks like this:
>>
>> placeholder_halt
>> mov m1, g1
>> mov m2, g2
>> mov m3, g3
>> mov m4, g4
>> send
>>
>> The MOVs depend on placeholder_halt, and the send depends on the MOVs.
>> The scheduler will schedule it as follows:
>>
>> placeholder_halt
>> mov m4, g4
>> mov m3, g3
>> mov m2, g2
>> mov m1, g1
>> send
>>
>> The order can be corrected with the change proposed here:
>>
>> http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html
>>
>> But there is no point in making that change if the current pre-regalloc
>> heuristic is going to be reworked.
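(For context: the scoring idea in the quoted commit message amounts to
something like the sketch below.  The names vgrf, sched_inst, and
register_pressure_benefit() are illustrative assumptions, not the actual
code in brw_schedule_instructions.cpp.)

   /* Hypothetical per-virtual-GRF bookkeeping; the real scheduler derives
    * this from the instruction stream.
    */
   struct vgrf {
      int size;             /* registers this virtual GRF occupies */
      int unscheduled_uses; /* reads of it not yet scheduled */
      bool def_scheduled;   /* has a write to it been scheduled yet? */
   };

   struct sched_inst {
      vgrf *dst;
      vgrf *src[3];
   };

   /* Positive benefit: scheduling this instruction now should free
    * registers.  Negative: it should increase register pressure.
    */
   static int
   register_pressure_benefit(const sched_inst *inst)
   {
      int benefit = 0;

      /* First user of a GRF: penalize by the regs it would take up. */
      if (inst->dst && !inst->dst->def_scheduled)
         benefit -= inst->dst->size;

      /* Last user of a GRF: boost by the regs it would free. */
      for (int i = 0; i < 3; i++) {
         if (inst->src[i] && inst->src[i]->unscheduled_uses == 1)
            benefit += inst->src[i]->size;
      }

      return benefit;
   }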
>
> Flipping the order in which we prefer ties (on betterthanlifo-2):
>
> commit 11a511576e465f02875f39c452561775a97416a1
> Author: Eric Anholt <e...@anholt.net>
> Date:   Mon Oct 21 11:45:53 2013 -0700
>
>     otherway
>
> diff --git a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> index 9a480b4..b123015 100644
> --- a/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_schedule_instructions.cpp
> @@ -1049,9 +1049,9 @@ fs_instruction_scheduler::choose_instruction_to_schedule()
>      * it's the first use of a GRF, reduce its score since it means it
>      * should be increasing register pressure.
>      */
> -   for (schedule_node *node = (schedule_node *)instructions.get_tail();
> -        node != instructions.get_head()->prev;
> -        node = (schedule_node *)node->prev) {
> +   for (schedule_node *node = (schedule_node *)instructions.get_head();
> +        node != instructions.get_head()->next;
> +        node = (schedule_node *)node->next) {
>        schedule_node *n = (schedule_node *)node;
>        fs_inst *inst = (fs_inst *)n->inst;
>
> gives:
>
> total instructions in shared programs: 1544638 -> 1546794 (0.14%)
> instructions in affected programs:     7163 -> 9319 (30.10%)
> GAINED:                                16
> LOST:                                  289
>
> with massive spilling on tropics, and a bit on lightsmark and csgo.

Children of a schedule_node also need to be pushed to the head in reverse
order,
   for (int i = chosen->child_count - 1; i >= 0; i--) {
      ...;
      if (child->parent_count == 0)
         instructions.push_head(child);
   }

so that when you loop from the head, you still get LIFO order.
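(For illustration, a minimal self-contained sketch of that update step,
assuming a simplified schedule_node with std::deque standing in for the
scheduler's own list type; update_children is a hypothetical name, not
the actual Mesa code:)

   #include <deque>
   #include <vector>

   struct schedule_node {
      std::vector<schedule_node *> children;
      int parent_count; /* unscheduled parents remaining */
   };

   /* After scheduling `chosen`, release its children.  Walking the
    * children in reverse and pushing each newly ready one onto the head
    * means that iterating the ready list from the head still visits
    * candidates in LIFO order.
    */
   static void
   update_children(schedule_node *chosen, std::deque<schedule_node *> &ready)
   {
      for (int i = (int)chosen->children.size() - 1; i >= 0; i--) {
         schedule_node *child = chosen->children[i];
         if (--child->parent_count == 0)
            ready.push_front(child);
      }
   }

--
o...@lunarg.com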