On 04/29/2016 09:32 AM, Juan A. Suarez Romero wrote:
> On Thu, 2016-04-28 at 15:29 +0200, Ian Romanick wrote:
>> On 04/28/2016 01:40 PM, Antia Puentes wrote:
>>>
>>> From: "Juan A. Suarez Romero" <jasua...@igalia.com>
>>>
>>> Even when the number of vertex attributes is under the limit, for
>>> shaders that use a high number of them, we can quickly exhaust the
>>> number of hardware registers.
>> Were you able to construct a case where this actually occurs? Limits
>> exposed by the driver and enforced by the GLSL linker should prevent
>> this.
>>
>
> Yes. See the attached shader1 test that exposes this problem.
>
> The driver supports up to 16 vertex attributes. ARB_vertex_attrib_64bit
> states that attribute variables of type dvec3, dvec4, dmat2x3, dmat2x4,
> dmat3, dmat3x4, dmat4x3, and dmat4 *may* count as consuming twice as
> many attributes as equivalent single-precision types.
>
> I highlight the may, because it is not mandatory. If we count those
> types as consuming the same as a single-precision type (which is what
> is happening in Mesa), we are consuming 15 attributes, so we are under
> the limit.

This is the thing we need to fix. Bailing from deep inside the driver
code generation (which may happen long, long after linking) is not
allowed. If a shader is not going to work, we are required to generate
the error in glLinkProgram.

> The issue is that in scalar mode (SIMD8), for each vec4 attribute we
> require 4 registers (or 8 per each dvec4 attribute), so it is easy to
> reach a huge number of registers. Which is the problem the test is
> exposing.
>
> If we were working on SIMD4x2, this wouldn't happen, as we would
> require only 1 register per vec4 attribute (or 2 per each dvec4).
>
> So the problem is a combination of using a high number of attributes
> and SIMD8 mode.
>
> One of the first approaches we took was precisely to consider the
> previous types to consume two attributes, instead of one. In this case,
> the shader1 test would be consuming 29 attributes, so the limit would
> be reached.
>
> But I see a couple of drawbacks with this approach:
>
> - There are tests that under the same conditions (less than the limit
> if you count those types as occupying the same as single-precision, but
> beyond the limit if those types are considered as consuming twice)
> still work. An example is the attached shader2 test: it requires 13
> attributes (or 19 counting as twice the mentioned types) and it works
> fine.

I don't see where you get 19. I get 3 array elements * 2 matrix columns
* 2 for value0, 2 array elements * 3 matrix columns * 2 for value1, and
1 for piglit_vertex. That's 25. This overcounts because naive doubling
counts each dmat2 column as 2 slots, but we only actually need 1. By
doubling only when it's necessary, that shader would need
(3 * 2) + (2 * 3 * 2) + 1 = 19.

> - This check affects all the backends. And there could be some
> backend that works perfectly fine with the current implementation,
> which is less conservative.
> In fact, we have an example: the same
> driver running in vec4 mode (SIMD4x2) works perfectly fine.

I think we can handle this by having a per-type (double, dvec2, dvec3,
and dvec4) flag to select the double or don't-double behavior.

> So all in all, the best way we found is to keep how we count vertex
> attributes, and just abort if we exhaust the available registers.
>
> Ideally, the best approach would be to switch to vec4 mode. But this
> would require supporting gen8+vec4 (we are right now working on support
> for gen7, which uses vec4), and also improving the switch from scalar
> mode to vec4 when compiling the shader.
>
> J.A.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
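
For reference, the slot and register arithmetic discussed in this thread
can be written down as a small Python sketch. This is not Mesa code.
The Attr class, the policy tables, and the exact types chosen for
shader2 (value0 as dmat2[3], value1 as dmat3[2]) are assumptions made
for illustration; the thread only gives array lengths and column counts.
Each policy table is one possible reading of the per-type
double/don't-double flag, keyed by components per column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical description of one vertex shader input."""
    name: str
    array_len: int   # 1 for non-arrays
    columns: int     # 1 for scalars and vectors
    rows: int        # components per column (1..4)
    is_double: bool


# Per-type flags: should a double-precision column with this many
# components count as two attribute slots?
NO_DOUBLING = {1: False, 2: False, 3: False, 4: False}
NAIVE_DOUBLING = {1: True, 2: True, 3: True, 4: True}
DOUBLE_WHEN_NEEDED = {1: False, 2: False, 3: True, 4: True}


def attr_slots(attr, policy):
    """Attribute slots consumed by one input under the given policy."""
    per_column = 2 if (attr.is_double and policy[attr.rows]) else 1
    return attr.array_len * attr.columns * per_column


def total_slots(attrs, policy):
    return sum(attr_slots(a, policy) for a in attrs)


# Assumed layout of the shader2 test (types are a guess, see above).
SHADER2 = [
    Attr("value0", array_len=3, columns=2, rows=2, is_double=True),
    Attr("value1", array_len=2, columns=3, rows=3, is_double=True),
    Attr("piglit_vertex", array_len=1, columns=1, rows=4, is_double=False),
]

# Registers per attribute as stated in the thread: SIMD8 needs 4 per
# vec4 and 8 per dvec4; SIMD4x2 needs 1 per vec4 and 2 per dvec4.
REGS_PER_ATTR = {
    "SIMD8": {"vec4": 4, "dvec4": 8},
    "SIMD4x2": {"vec4": 1, "dvec4": 2},
}


def registers(mode, n_vec4, n_dvec4):
    """Input registers needed for the given attribute counts."""
    cost = REGS_PER_ATTR[mode]
    return n_vec4 * cost["vec4"] + n_dvec4 * cost["dvec4"]


if __name__ == "__main__":
    print(total_slots(SHADER2, NO_DOUBLING))         # 13
    print(total_slots(SHADER2, NAIVE_DOUBLING))      # 25
    print(total_slots(SHADER2, DOUBLE_WHEN_NEEDED))  # 19
```

With these assumptions the three policies reproduce the 13, 25, and 19
figures quoted above. The registers() helper only restates the
per-attribute costs given for SIMD8 and SIMD4x2; it does not model any
actual hardware register limit.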