On Tue, Sep 23, 2014 at 11:43 AM, Matt Turner <matts...@gmail.com> wrote:
> On Sat, Sep 20, 2014 at 10:22 AM, Jason Ekstrand <ja...@jlekstrand.net> > wrote: > > This series does a bunch of refactoring of the i965 fs backend IR to add > > concepts of register width and instruction execution size. There's more > to > > be done yet, but this gets us most of the way there. It also removes the > > assumption that scalar values are always 1 register in SIMD8 and 2 > > registers in SIMD16. In particular, we get the following: > > > > 1) No more assumption about everything being 1 register. This allows us > > to allocate odd numbers of registers in SIMD16 which is needed for > some > > payloads. Also, it should make implementing fp64 much easier because > > we can now sanely registers of size 2 in SIMD8 and size 4 in SIMD16. > > There's a little more work to be don there, but this should take care > > of a lot of it. > > > > 2) We can now do other instruction widths with relative ease. The > > compiler now detects, based on register widths, the execution size of > > the instruction and passes it down to the generator. One example of > > this is the patches in this series for UNTYPED_ATOMIC and > > UNTYPED_SURFACE_READ where part of setting up the payload is to do an > > 8-wide move to fill a register with 0 and then a 1-wide move to set > one > > particular component. We can now simply do this at the fs level and > it > > will be get translated down to the correct assembly and properly > > handled by the compiler optimizations. There is more work to be done > > here at the generator level, but this series is already long enough > > > > 3) Thanks to the above mentioned things, we can easily do send from GRF > > for FB writes. One of the major blockers here before was that the > > beginning of the FB write message was anywhere between 0 and 4 > > registers regardless of whether you are in SIMD8 or SIMD16. Due to > the > > implicit register doubling in SIMD16, it would have been a real pain > to > > implement this properly. Now, it's trivial. > > > > I could go on about other changes, but those are the major ones. > > This all sounds great, Jason. I'm really happy with how you split the > patches out. It made reviewing this amazingly easy. > > I'm made a relatively quick review, and it looks good to me. (I'm > operating under the assumption that what you said below about piglit > passing is indicative of there not being bugs :) Heh... If only that were true... I seem to have killed ILK and I'm not sure why. I'm going to look into that today and see if I can get it patched up. > I've sent a bunch of > smallish comments. Nothing major. I guess patch 41 is in flux though > based on Connor's review. > I think Ken's comment demonstrates that it's not an issue. I'm convinced at any rate. Not sure If I can count on Connor getting back to me right away either. :-) > I suppose the next steps are to sort out the preliminary 12 patch > series (which I think you said you were going to revise?). I think the > ball's in your court there. > Yeah, I need to come up with something better for compact_virtual_grfs and I think there was another thing or two to clean up. Also, the split_virtual_grfs patch still needs a good review. You looked at it and gave me a non-comment. > > So, you can apply my R-b to the first 40 patches with comments > addressed. When this series goes in, I think we should preserve the > commit info of the squash patches (i.e., actually squash instead of > fixup), like in commit 79d77b38. > > > The requisite Shader DB results: > > > > total instructions in shared programs: 4999994 -> 4971746 (-0.56%) > > instructions in affected programs: 959392 -> 931144 (-2.94%) > > GAINED: 138 > > LOST: 71 > > Any ideas about the lost programs? > I think some (may be 12 or so) are because if an optimization pass that's failing (there were a few programs that grew by 1 or 2 instructions). The other 60 or so seem to be because freedom in register allocation isn't always a good thing. Prior to doing send-from-GRF for FB writes and this last rebase, I had only 4 programs in all of shader-db that had any difference in the number of instructions, so I think I was generating basically identical programs except for register allocation. However, I gained some programs and lost about 65. I think this is a case of those programs being right on the edge of what our allocator could do and tweaking things pushed them over the edge. :-( --Jason
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev