On Thu, Oct 17, 2013 at 3:24 AM, Matt Turner <matts...@gmail.com> wrote:
> On Mon, Oct 14, 2013 at 4:14 PM, Eric Anholt <e...@anholt.net> wrote:
>> Previously, the best thing we had was to schedule the things unblocked by
>> the current instruction, in the hope that it would be consuming two values
>> at the end of their live intervals while only producing one new value.
>> Sometimes that wasn't the case.
>>
>> Now, when an instruction is the first user of a GRF we schedule (i.e. it
>> will probably be the virtual_grf_def[] instruction after computing live
>> intervals again), penalize it by how many regs it would take up.  When an
>> instruction is the last user of a GRF we have to schedule (when it will
>> probably be the virtual_grf_end[] instruction), give it a boost by how
>> many regs it would free.
>>
>> The new functions are made virtual (only 1 of 2 really needs to be
>> virtual) because I expect we'll soon lift the pre-regalloc scheduling
>> heuristic over to the vec4 backend.
>>
>> shader-db:
>> total instructions in shared programs: 1512756 -> 1511604 (-0.08%)
>> instructions in affected programs:     10292 -> 9140 (-11.19%)
>> GAINED:                                121
>> LOST:                                  38
>>
>> Improves Tropics performance at my current settings by 4.50602% +/-
>> 2.60694% (n=5).  No difference on Lightsmark (n=5).  No difference on
>> GLB2.7 (n=11).
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70445
>> ---
>
> I think we're on the right track by considering register pressure when
> scheduling, but one aspect we're not considering is simply how many
> registers we think we're using.
>
> If I understand correctly, the pre-register-allocation scheduler wants to
> shorten live intervals as much as possible, which reduces register
> pressure but at the cost of larger stalls and less instruction-level
> parallelism.  We end up scheduling things like
>
>   produce result 4
>   produce result 3
>   produce result 2
>   produce result 1
>   use result 1
>   use result 2
>   use result 3
>   use result 4
>
> (this is why the MRF writes for the FB write are always done in the
> reverse order)

In this example, it will actually be
  produce result 4
  use result 4
  produce result 3
  use result 3
  produce result 2
  use result 2
  produce result 1
  use result 1

and the post-regalloc scheduler will reschedule it to something like

  produce result 4
  produce result 3
  produce result 2
  produce result 1
  use result 4
  use result 3
  use result 2
  use result 1

The pre-regalloc scheduler attempts to consume results as soon as they
become available.  The FB write is done in reverse order because, when a
result becomes available, its consumers are scheduled in reverse order.
The epilog of a fragment shader usually looks like this:

  placeholder_halt
  mov m1, g1
  mov m2, g2
  mov m3, g3
  mov m4, g4
  send

The MOVs depend on placeholder_halt, and the send depends on the MOVs.
The scheduler will schedule it as follows:

  placeholder_halt
  mov m4, g4
  mov m3, g3
  mov m2, g2
  mov m1, g1
  send

The order can be corrected with the change proposed here

  http://lists.freedesktop.org/archives/mesa-dev/2013-October/046570.html

but there is no point in making that change if the current pre-regalloc
heuristic is to be reworked.

> Take the main shader from FillTestC24Z16 in GLB2.5 or 2.7 as an
> example.  Before texture-from-grf we serialized the eight texture sends.
> After that branch landed, we scheduled the code much better, leading
> to a performance improvement.
>
> This patch causes us to again serialize the 8 texture ops in
> GLB25_FillTestC24Z16, like we did before texture-from-grf.  It reduces
> performance from 7.0 billion texels/sec to ~6.5 on IVB.
>
> The shader in question is structured, prior to scheduling, as
>
>   16 PLNs to interpolate the texture coordinates
>     - 10 registers consumed, 16 results produced
>   8 TEX
>     - 16 registers consumed, 32 results produced
>   28 ADDs to sum the texture results into gl_FragColor
>     - 32 registers consumed, 4 results produced
>   FB write
>     - 4 registers consumed
>
> Even doubling these numbers for SIMD16, we don't spill.  There's no need
> to reduce live ranges and therefore ILP for this shader.
>
> Can we accurately track the number of registers in use and decide what
> to do based on that?

-- 
o...@lunarg.com
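For concreteness, here is a minimal standalone sketch of the heuristic the
patch describes.  Everything in it (reg_file, pressure_tracker, def_seen,
uses_left) is an illustrative stand-in for the scheduler's bookkeeping, not
the actual Mesa API: a candidate instruction is penalized by the size of any
GRF whose live interval it would start, and rewarded by the size of any GRF
whose live interval it would end.

   /* Sketch only -- hypothetical types and bookkeeping, not the real
    * brw scheduler code.
    */
   #include <vector>

   enum reg_file { BAD_FILE, GRF, MRF };

   struct fs_reg {
      reg_file file;
      int nr;                 /* virtual GRF number */
   };

   struct fs_inst {
      fs_reg dst;
      fs_reg src[3];
   };

   struct pressure_tracker {
      std::vector<int> virtual_grf_sizes;  /* regs per virtual GRF */
      std::vector<bool> def_seen;          /* scheduled a write yet? */
      std::vector<int> uses_left;          /* unscheduled reads remaining */

      /* Positive benefit: scheduling inst next frees registers.
       * Negative: it starts a new live interval and costs registers.
       */
      int benefit(const fs_inst &inst) const
      {
         int b = 0;

         /* The first write we schedule to a GRF will probably become
          * its virtual_grf_def[] instruction: penalize by the GRF's
          * size.
          */
         if (inst.dst.file == GRF && !def_seen[inst.dst.nr])
            b -= virtual_grf_sizes[inst.dst.nr];

         /* The last remaining read of a GRF will probably become its
          * virtual_grf_end[] instruction: reward by the regs it frees.
          */
         for (const fs_reg &s : inst.src)
            if (s.file == GRF && uses_left[s.nr] == 1)
               b += virtual_grf_sizes[s.nr];

         return b;
      }
   };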
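As for accurately tracking the number of registers in use: one possible
shape for that, again with purely hypothetical names, is a running estimate
of live GRFs that is updated as each instruction is scheduled, so the
scheduler only falls back to the pressure-reducing heuristic once spilling
becomes a risk.  For a shader like the FillTest one, the estimate would stay
well under the limit and the scheduler could keep optimizing for ILP.

   /* Sketch (hypothetical names): running estimate of live GRFs, used
    * to choose between the latency/ILP heuristic and the
    * pressure-reducing one.
    */
   struct live_estimate {
      int grfs_live;   /* current estimate of live registers */
      int grf_limit;   /* register file size, e.g. 128 GRFs on IVB */

      /* Call after scheduling each instruction. */
      void update(int regs_defined, int regs_freed)
      {
         grfs_live += regs_defined - regs_freed;
      }

      /* Only shorten live intervals when we risk spilling; otherwise
       * schedule for latency and instruction-level parallelism.
       */
      bool should_reduce_pressure() const
      {
         return grfs_live > grf_limit * 3 / 4;
      }
   };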