Patches 1 through 4 are Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
I'll try to make it through the rest tomorrow. I have skimmed them, and they look mostly okay. I think a good follow-up patch would be to pull a bunch of the new stuff out into a new file, link_varyings.cpp or something. linker.cpp is getting a bit out of control. ~2700 lines in one file...
On 12/11/2012 03:09 PM, Paul Berry wrote:
This patch series adds varying packing to Mesa, so that we can handle varyings composed of things other than vec4's without using up extra varying components.

For the initial implementation I've chosen a strategy that operates exclusively at the GLSL IR level, so that it doesn't require the cooperation of the driver back-ends. This means that varying packing should be immediately useful for all drivers. However, there are some types of varying packing that can't be done using GLSL IR alone (for example, packing a "noperspective" varying and a "smooth" varying together), but should be possible on some drivers with a small amount of back-end work. I'm deferring that work for a later patch series. Also, packing of floats and ints together into the same "flat varying" should be possible for drivers that implement ARB_shader_bit_encoding--I'm also deferring that for a later patch series.

The strategy is as follows:

- Before assigning locations to varyings, we sort them into "packing classes" based on base type and interpolation mode (this is to ensure that we don't try to pack floats with ints, or smooth with flat, for example).

- Within each packing class, we sort the varyings based on the number of vector elements. Vec4's (as well as matrices and arrays composed of vec4's) are packed first, then vec2's, then scalars, since this allows us to align them all to their natural alignment boundary, so we avoid the performance penalty of "double parking" a varying across two varying slots. Vec3's are packed last, double parking them if necessary.

- For any varying slot that doesn't contain exactly one vec4, we generate GLSL IR to manually pack/unpack the varying in the shader. For instance, the following fragment shader:

    varying vec2 a;
    varying vec2 b;
    varying vec3 c;
    varying vec3 d;

    main()
    {
      ...
    }

  would get rewritten as follows:

    varying vec4 packed0;
    varying vec4 packed1;
    varying vec4 packed2;
    vec2 a;
    vec2 b;
    vec3 c;
    vec3 d;

    main()
    {
      a = packed0.xy;
      b = packed0.zw;
      c = packed1.xyz;
      d.x = packed1.w; // d is "double parked" across slots 1 and 2
      d.yz = packed2.xy;
      ...
    }

  This GLSL IR is generated by a lowering pass, so that in the future we will have the option of disabling it for driver back-ends that are capable of natively understanding the packed varying format.

- Finally, the linker code to handle transform feedback is modified to account for varying packing (e.g. by feeding back just a subset of the components of a varying slot rather than the entire varying slot). Fortunately transform feedback already has the infrastructure necessary to do this, since it was needed in order to implement gl_ClipDistance.

I believe this is enough to be useful for the vast majority of programs, and to get us passing the GLES3 conformance tests. Additional improvements, which I'm planning to defer to later patch series, include:

- Allow uints and ints to be packed together in the same varying slot. This should be possible on all back-ends, since ints and uints may be interconverted without losing information.

- On back-ends that support ARB_shader_bit_encoding, allow floats and ints to be packed together in the same varying slot, since ARB_shader_bit_encoding allows floating-point values to be encoded into ints without losing information.

- On back-ends that can mix interpolation modes within a single varying slot, allow additional packing, with help from the driver back-end. For instance, i965 gen6 and above can in principle mix together all interpolation modes except for "flat" within a single varying slot, if we do a hopefully small amount of back-end work.

- Allow a driver back-end to advertise a larger number of varying components to the linker than it advertises to the client program--this will allow us to ensure that varying packing *never* fails. For example, on i965 gen6 and above, after the above improvements are made, we should be able to pack any possible combination of varyings with a maximum waste of 3 varying components. That means, for example, that if the i965 driver advertises 17 varying slots to the linker (== 68 varying components), but advertises only 64 varying components to the client program, then varying packing will always succeed.

Note: I also have a new piglit test that exercises this code; I'll be publishing that to the Piglit list ASAP.

[PATCH 01/10] glsl/lower_clip_distance: Update symbol table.
[PATCH 02/10] glsl/linker: Always invalidate shader ins/outs, even in corner cases.
[PATCH 03/10] glsl/linker: Make separate ir_variable field to mean "unmatched".
[PATCH 04/10] glsl: Create a field to store fractional varying locations.
[PATCH 05/10] glsl/linker: Defer recording transform feedback locations.
[PATCH 06/10] glsl/linker: Subdivide the first phase of varying assignment.
[PATCH 07/10] glsl/linker: Sort varyings by packing class, then vector size.
[PATCH 08/10] glsl: Add a lowering pass for packing varyings.
[PATCH 09/10] glsl/linker: Pack within compound varyings.
[PATCH 10/10] glsl/linker: Pack between varyings.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
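For readers who want to see the sort-then-assign strategy from the cover letter in concrete form, here is a minimal C++ sketch. It is not the code from the patches: the Varying struct, the enums, and the packing_class(), size_rank() and pack() helpers are all hypothetical names made up for illustration, and the choice to start each packing class on a fresh slot is an assumption. It only models single vectors of 1 to 4 components, not matrices or arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for linker data; none of these names are from the patches.
enum class BaseType { Float, Int, Uint };
enum class Interp { Smooth, Flat, Noperspective };

struct Varying {
    std::string name;
    BaseType base;
    Interp interp;
    unsigned components; // number of vector elements, 1..4
    unsigned location;   // first component index, filled in by pack()
};

// Varyings share a class only if base type and interpolation mode both match.
static int packing_class(const Varying &v) {
    return static_cast<int>(v.base) * 3 + static_cast<int>(v.interp);
}

// Order within a class: vec4 first, then vec2, then scalars, vec3 last.
static int size_rank(unsigned components) {
    switch (components) {
    case 4: return 0;
    case 2: return 1;
    case 1: return 2;
    default: return 3; // vec3
    }
}

// Sort by (packing class, size rank), then hand out component offsets.
// Assumption: each new packing class starts at a slot boundary.
// Returns the number of vec4 slots used.
unsigned pack(std::vector<Varying> &vars) {
    std::stable_sort(vars.begin(), vars.end(),
        [](const Varying &a, const Varying &b) {
            if (packing_class(a) != packing_class(b))
                return packing_class(a) < packing_class(b);
            return size_rank(a.components) < size_rank(b.components);
        });

    unsigned next = 0;
    int current_class = -1;
    for (Varying &v : vars) {
        assert(v.components >= 1 && v.components <= 4);
        if (packing_class(v) != current_class) {
            next = (next + 3) / 4 * 4; // round up to the next slot boundary
            current_class = packing_class(v);
        }
        v.location = next; // slot = location / 4, first component = location % 4
        next += v.components;
    }
    return (next + 3) / 4;
}
```

Fed the four varyings from the shader example (a and b as vec2, c and d as vec3, all in one class), this sketch assigns component offsets 0, 2, 4 and 7 and uses three slots, so d starts at the w component of slot 1 and spills into slot 2, matching the "double parked" layout in the rewritten shader.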