Patches 1 through 4 are

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>

I'll try to make it through the rest tomorrow. I have skimmed them, and they look mostly okay. I think a good follow-up patch would be to pull a bunch of the new stuff out into a new file, link_varyings.cpp or something. linker.cpp is getting a bit out of control: ~2700 lines in one file...

On 12/11/2012 03:09 PM, Paul Berry wrote:
This patch series adds varying packing to Mesa, so that we can handle
varyings composed of things other than vec4's without using up extra
varying components.

For the initial implementation I've chosen a strategy that operates
exclusively at the GLSL IR level, so that it doesn't require the
cooperation of the driver back-ends.  This means that varying packing
should be immediately useful for all drivers.  However, there are some
types of varying packing that can't be done using GLSL IR alone (for
example, packing a "noperspective" varying and a "smooth" varying
together), but should be possible on some drivers with a small amount
of back-end work.  I'm deferring that work for a later patch series.
Also, packing of floats and ints together into the same "flat varying"
should be possible for drivers that implement
ARB_shader_bit_encoding--I'm also deferring that for a later patch
series.
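
As a rough illustration of why the deferred float/int packing loses no
information: ARB_shader_bit_encoding exposes floatBitsToInt() and
intBitsToFloat() in GLSL, which reinterpret the 32 bits of a value
without converting it.  The C++ below is only a host-side analogue of
that idea; the function names are made up for this sketch and are not
part of Mesa.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Host-side analogue of GLSL floatBitsToInt(): copy the 32 bits of a
// float into an int without any numeric conversion.
static std::int32_t float_bits_to_int(float f)
{
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes 32-bit floats");
    std::int32_t i;
    std::memcpy(&i, &f, sizeof i);
    return i;
}

// Host-side analogue of GLSL intBitsToFloat().
static float int_bits_to_float(std::int32_t i)
{
    float f;
    std::memcpy(&f, &i, sizeof f);
    return f;
}

// Usage: a float stored in an int component comes back unchanged.
static void self_check()
{
    const float original = -2.5f;
    const std::int32_t stored = float_bits_to_int(original);
    assert(int_bits_to_float(stored) == original);
}
```

Because the bits survive the round trip, a float could in principle
share a flat int slot with real ints, which is the case being deferred.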

The strategy is as follows:

- Before assigning locations to varyings, we sort them into "packing
   classes" based on base type and interpolation mode (this is to
   ensure that we don't try to pack floats with ints, or smooth with
   flat, for example).

- Within each packing class, we sort the varyings based on the number
   of vector elements.  Vec4's (as well as matrices and arrays composed
   of vec4's) are packed first, then vec2's, then scalars, since this
   allows us to align them all to their natural alignment boundary, so
   we avoid the performance penalty of "double parking" a varying
   across two varying slots.  Vec3's are packed last, double parking
   them if necessary.

- For any varying slot that doesn't contain exactly one vec4, we
   generate GLSL IR to manually pack/unpack the varying in the shader.
   For instance, the following fragment shader:

   varying vec2 a;
   varying vec2 b;
   varying vec3 c;
   varying vec3 d;
   main()
   {
     ...
   }

   would get rewritten as follows:

   varying vec4 packed0;
   varying vec4 packed1;
   varying vec4 packed2;
   vec2 a;
   vec2 b;
   vec3 c;
   vec3 d;
   main()
   {
     a = packed0.xy;
     b = packed0.zw;
     c = packed1.xyz;
     d.x = packed1.w; // d is "double parked" across slots 1 and 2
     d.yz = packed2.xy;
     ...
   }

   This GLSL IR is generated by a lowering pass, so that in the future
   we will have the option of disabling it for driver back-ends that
   are capable of natively understanding the packed varying format.

- Finally, the linker code to handle transform feedback is modified to
   account for varying packing (e.g. by feeding back just a subset of
   the components of a varying slot rather than the entire varying
   slot).  Fortunately transform feedback already has the
   infrastructure necessary to do this, since it was needed in order to
   implement gl_ClipDistance.
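
The sorting and placement steps above can be sketched in C++ roughly as
follows.  This is a simplified model, not the code from the patches:
the Varying struct, the enums, and assign_locations() are hypothetical
names, locations are counted in components (slot = location / 4), and
starting each packing class on a fresh slot is an assumption made here
so that classes never share a slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical model types; none of these names come from Mesa.
enum class BaseType { Float, Int, Uint };
enum class Interp { Smooth, Flat, Noperspective };

struct Varying {
    std::string name;
    BaseType base;
    Interp interp;
    unsigned components; // 1 = scalar, 2 = vec2, 3 = vec3, 4 = vec4
    unsigned location;   // first component index, filled in below
};

// Order within a packing class: vec4, then vec2, then scalar, vec3 last.
static unsigned packing_order(unsigned components)
{
    switch (components) {
    case 4:  return 0;
    case 2:  return 1;
    case 1:  return 2;
    default: return 3; // vec3
    }
}

// Sort key: packing class (base type, interpolation), then vector size.
static std::tuple<int, int, unsigned> sort_key(const Varying &v)
{
    return std::make_tuple(static_cast<int>(v.base),
                           static_cast<int>(v.interp),
                           packing_order(v.components));
}

static void assign_locations(std::vector<Varying> &vars)
{
    std::stable_sort(vars.begin(), vars.end(),
                     [](const Varying &a, const Varying &b) {
                         return sort_key(a) < sort_key(b);
                     });

    unsigned next = 0;
    for (std::size_t i = 0; i < vars.size(); ++i) {
        assert(vars[i].components >= 1 && vars[i].components <= 4);
        // Assumption: a new packing class starts on a slot boundary.
        if (i > 0 && (vars[i].base != vars[i - 1].base ||
                      vars[i].interp != vars[i - 1].interp))
            next = (next + 3) / 4 * 4;
        // With this sort order, vec4s, vec2s and scalars land on their
        // natural alignment; only vec3s can straddle two slots.
        vars[i].location = next;
        next += vars[i].components;
    }
}
```

Fed the a, b, c, d varyings from the example shader, this model places
a at components 0-1, b at 2-3, c at 4-6 and d at 7-9, which is the
layout shown in the rewritten shader (d straddles slots 1 and 2).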


I believe this is enough to be useful for the vast majority of
programs, and to get us passing the GLES3 conformance tests.


Additional improvements, which I'm planning to defer to later patch
series, include:

- Allow uints and ints to be packed together in the same varying slot.
   This should be possible on all back-ends, since ints and uints may
   be interconverted without losing information.

- On back-ends that support ARB_shader_bit_encoding, allow floats and
   ints to be packed together in the same varying slot, since
   ARB_shader_bit_encoding allows floating-point values to be encoded
   into ints without losing information.

- On back-ends that can mix interpolation modes within a single
   varying slot, allow additional packing, with help from the driver
   back-end.  For instance, i965 gen6 and above can in principle mix
   together all interpolation modes except for "flat" within a single
   varying slot, if we do a hopefully small amount of back-end work.

- Allow a driver back-end to advertise a larger number of varying
   components to the linker than it advertises to the client
   program--this will allow us to ensure that varying packing *never*
   fails.  For example, on i965 gen6 and above, after the above
   improvements are made, we should be able to pack any possible
   combination of varyings with a maximum waste of 3 varying
   components.  That means, for example, that if the i965 driver
   advertises 17 varying slots to the linker (== 68 varying
   components), but advertises only 64 varying components to the
   client program, then varying packing will always succeed.
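
The arithmetic behind that last example can be checked with a few
lines of C++.  The helper below is purely illustrative (its name and
parameters are not from Mesa); it simply asks whether the components
advertised to the client, plus the worst-case waste, still fit in the
slots advertised to the linker.

```cpp
#include <cassert>

// Each varying slot holds one vec4, i.e. four components.
constexpr unsigned components_per_slot = 4;

// Hypothetical helper: packing cannot fail if the client's component
// budget plus the maximum waste fits within the linker's slot budget.
constexpr bool packing_always_succeeds(unsigned client_components,
                                       unsigned max_waste,
                                       unsigned linker_slots)
{
    return client_components + max_waste
           <= linker_slots * components_per_slot;
}

// Numbers from the i965 example: 17 slots == 68 components, and
// 64 client components + 3 wasted components == 67 <= 68.
static_assert(17 * components_per_slot == 68, "17 slots are 68 components");
static_assert(packing_always_succeeds(64, 3, 17), "17 slots are enough");

// Usage: with only 16 slots (64 components) there is no headroom.
static void self_check()
{
    assert(packing_always_succeeds(64, 3, 17));
    assert(!packing_always_succeeds(64, 3, 16));
}
```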

Note: I also have a new piglit test that exercises this code; I'll be
publishing that to the Piglit list ASAP.

[PATCH 01/10] glsl/lower_clip_distance: Update symbol table.
[PATCH 02/10] glsl/linker: Always invalidate shader ins/outs, even in corner cases.
[PATCH 03/10] glsl/linker: Make separate ir_variable field to mean "unmatched".
[PATCH 04/10] glsl: Create a field to store fractional varying locations.
[PATCH 05/10] glsl/linker: Defer recording transform feedback locations.
[PATCH 06/10] glsl/linker: Subdivide the first phase of varying assignment.
[PATCH 07/10] glsl/linker: Sort varyings by packing class, then vector size.
[PATCH 08/10] glsl: Add a lowering pass for packing varyings.
[PATCH 09/10] glsl/linker: Pack within compound varyings.
[PATCH 10/10] glsl/linker: Pack between varyings.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

