One more quick note. If you find it nicer, the whole series can also be browsed here:
http://cgit.freedesktop.org/~jekstrand/mesa/tree/?h=kill-mrf-v1

On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote:

> This series does a bunch of refactoring of the i965 fs backend IR to add
> concepts of register width and instruction execution size. There's more to
> be done yet, but this gets us most of the way there. It also removes the
> assumption that scalar values are always 1 register in SIMD8 and 2
> registers in SIMD16. In particular, we get the following:
>
> 1) No more assumption about everything being 1 register. This allows us
>    to allocate odd numbers of registers in SIMD16 which is needed for some
>    payloads. Also, it should make implementing fp64 much easier because
>    we can now sanely handle registers of size 2 in SIMD8 and size 4 in
>    SIMD16. There's a little more work to be done there, but this should
>    take care of a lot of it.
>
> 2) We can now do other instruction widths with relative ease. The
>    compiler now detects, based on register widths, the execution size of
>    the instruction and passes it down to the generator. One example of
>    this is the patches in this series for UNTYPED_ATOMIC and
>    UNTYPED_SURFACE_READ where part of setting up the payload is to do an
>    8-wide move to fill a register with 0 and then a 1-wide move to set one
>    particular component. We can now simply do this at the fs level and it
>    will get translated down to the correct assembly and properly
>    handled by the compiler optimizations. There is more work to be done
>    here at the generator level, but this series is already long enough.
>
> 3) Thanks to the above mentioned things, we can easily do send from GRF
>    for FB writes. One of the major blockers here before was that the
>    beginning of the FB write message was anywhere between 0 and 4
>    registers regardless of whether you are in SIMD8 or SIMD16. Due to the
>    implicit register doubling in SIMD16, it would have been a real pain to
>    implement this properly. Now, it's trivial.
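To make points 1 and 2 above a bit more concrete, here is a rough standalone sketch of the bookkeeping involved. It is not code from the series: SketchReg, regs_for and infer_exec_size are hypothetical names, and the "widest operand wins" rule for the execution size is only an assumption about what detecting it from register widths could look like. The one hardware fact relied on is that a GRF is 32 bytes.

```cpp
#include <algorithm>
#include <initializer_list>

// Hypothetical stand-in for a register that carries an explicit width.
// This is NOT the real fs_reg; it only models the two fields needed here.
struct SketchReg {
    unsigned width;      // number of channels: 1, 8 or 16
    unsigned type_size;  // bytes per channel: 4 for float/int, 8 for fp64
};

// One GRF holds 32 bytes (8 channels of 32 bits).
constexpr unsigned REG_SIZE = 32;

// Hardware registers needed to hold one value, rounded up to whole GRFs.
// Nothing here assumes "1 register in SIMD8, 2 in SIMD16".
inline unsigned regs_for(const SketchReg &r)
{
    return (r.width * r.type_size + REG_SIZE - 1) / REG_SIZE;
}

// Assumed rule: the execution size is the widest of the destination and
// all sources. The real compiler may apply further checks.
inline unsigned infer_exec_size(const SketchReg &dst,
                                std::initializer_list<SketchReg> srcs)
{
    unsigned size = dst.width;
    for (const SketchReg &s : srcs)
        size = std::max(size, s.width);
    return size;
}
```

Under these assumptions a float takes 1 register in SIMD8 and 2 in SIMD16, an fp64 value takes 2 and 4, and a payload made of one 8-wide header register plus one SIMD16 float comes to 3 registers, the kind of odd count mentioned in point 1. The 8-wide zero fill followed by a 1-wide component write from point 2 would get execution sizes 8 and 1 respectively.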
>
> I could go on about other changes, but those are the major ones.
>
> The requisite Shader DB results:
>
> total instructions in shared programs: 4999994 -> 4971746 (-0.56%)
> instructions in affected programs:     959392 -> 931144 (-2.94%)
> GAINED:                                138
> LOST:                                  71
>
> There are some shaders that are hurt by 1 or 2 instructions. It could
> simply be send-from-grf, but prior to this last rebase, I don't remember
> there being any hurt programs. I'm going to look into it.
>
> Regarding Piglit:
>
>  * On HSW, every commit except those immediately followed by something
>    labeled SQUASH passes (except for glsl-routing and timestamp-get, which
>    are flaky).
>
>  * On SNB and Gen4, the end of the series along with important
>    intermediate points, such as changing GEN5 texturing or varying pull
>    constant loads, pass.
>
> I did have a hang on ILK, but I'm pretty sure that was due to bad COMPR4
> code which I have since removed. I'll try to get that working and added
> back in later. That said, that's an optimization and not required, so we
> can leave it for now.
>
> Happy Reviewing!
> --Jason Ekstrand
>
> Jason Ekstrand (41):
>   i965/brw_reg: Add a firsthalf function and use it in the generator
>   i965/fs: A little harmless refactoring of register_coalesce
>   i965/fs: Add a concept of a width to fs_reg
>   i965/fs: Make half() divide the register width by 2 and use it more
>   i965/fs: Handle printing of registers better.
>   i965/fs_reg: Allocate double the number of vgrfs in SIMD16 mode
>   SQUASH: i965/fs: Use the register width when applying offsets
>   SQUASH: i965/fs: Change regs_read to be in hardware registers
>   SQUASH: i965/fs: Change regs_written to be actual hardware registers
>   SQUASH: i965/fs: Properly handle register widths in LOAD_PAYLOAD
>   SQUASH: i965/fs: Handle register widths in demote_pull_constants
>   SQUASH: i965/fs: Get rid of implicit register doubling in the
>     allocator
>   SQUASH: i965/fs: Reserve enough registers for PLN instructions
>   SQUASH: i965/fs: Make sources and destinations interfere in 16-wide
>   SQUASH: i965/fs: Properly handle register widths in CSE
>   SQUASH: i965/fs: Properly handle register widths in register_coalesce
>   SQUASH: i965/fs: Properly handle widths in copy propagation
>   SQUASH: i965/fs: Properly handle register widths in
>     VARYING_PULL_CONSTANT_LOAD
>   SQUASH: i965/fs: Properly handle register widths and odd register
>     sizes in spilling
>   SQUASH: i965/fs: Don't waste a register on texture lookups for gen >=
>     7
>   i965/fs: Rework GEN5 texturing code to use fs_reg and offset()
>   i965/fs: Fix a bug in register coalesce
>   i965/fs: Determine partial writes based on the destination width
>   i965/fs: Add an exec_size field to fs_inst
>   SQUASH: i965/fs: Explicitly set instruction execute size a couple of
>     places
>   SQUASH: i965/blorp: Explicitly set instruction execute sizes
>   i965/fs: Better guess the width of LOAD_PAYLOAD
>   i965/fs: Make fs_reg::effective_width take fs_inst* instead of
>     fs_visitor*
>   i965/fs: Derive force_uncompressed from instruction exec_size
>   i965/fs: Remove unneeded uses of force_uncompressed
>   i965/fs: Use instruction execution sizes to set compression state
>   i965/fs: Use instruction execution sizes instead of heuristics
>   i965/fs: Use exec_size instead of force_uncompressed in
>     dump_instruction
>   i965/fs: Use the instruction execution size directly for texture
>     generation
>   i965/fs: Add a function for getting a component of an 8 or 16-wide
>     register
>   i965/fs: Use the GRF for UNTYPED_ATOMIC instructions
>   i965/fs: Use the GRF for UNTYPED_SURFACE_READ instructions
>   i965/fs: Add an optional source to the FS_OPCODE_FB_WRITE
>     instruction
>   i965/fs: Add split_virtual_grfs and compute_to_mrf after
>     lower_load_payload
>   i965/fs: Use the GRF for FB writes on gen >= 7
>   SQUASH: i965/fs: Force a high register for the final FB write
>
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.cpp    |   2 +-
>  src/mesa/drivers/dri/i965/brw_blorp_blit_eu.h      |  36 +--
>  src/mesa/drivers/dri/i965/brw_eu.h                 |   6 +-
>  src/mesa/drivers/dri/i965/brw_eu_emit.c            |  16 +-
>  src/mesa/drivers/dri/i965/brw_fs.cpp               | 355 +++++++++++++++-----
>  src/mesa/drivers/dri/i965/brw_fs.h                 |  98 +++++-
>  .../drivers/dri/i965/brw_fs_copy_propagation.cpp   |  14 +-
>  src/mesa/drivers/dri/i965/brw_fs_cse.cpp           |  22 +-
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp     | 169 +++++-----
>  .../drivers/dri/i965/brw_fs_live_variables.cpp     |  10 +-
>  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 160 ++++++---
>  .../drivers/dri/i965/brw_fs_register_coalesce.cpp  |  50 ++-
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp       | 356 +++++++++++++--------
>  src/mesa/drivers/dri/i965/brw_reg.h                |   6 +
>  .../drivers/dri/i965/brw_schedule_instructions.cpp |  15 +-
>  src/mesa/drivers/dri/i965/brw_shader.cpp           |   1 +
>  src/mesa/drivers/dri/i965/intel_screen.h           |   5 +
>  17 files changed, 904 insertions(+), 417 deletions(-)
>
> --
> 2.1.0
>
>
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev