Previously, i965 geometry shaders always operated in DUAL_OBJECT mode, which is similar to vertex shader operation in that two independent sets of inputs get dispatched to a single SIMD4x2 geometry shader thread, which executes them both in parallel.

When register usage is tight, we need to switch to a mechanism that uses fewer registers. In an ideal world we'd fall back to SINGLE mode, in which a single set of inputs is dispatched to a SIMD4x1 geometry shader thread. Effectively this makes twice as many registers available, since it allows independent data to be interleaved into the lower and upper halves of each register. Unfortunately, we don't yet have the infrastructure in the vec4 back-end to support interleaving all the registers.

So we do the next best thing, which is to use DUAL_INSTANCED dispatch mode. In this mode, a single set of geometry shader inputs is delivered to the shader in interleaved fashion (as would happen in SINGLE mode), but the shader operates as a SIMD4x2 shader (so all other registers are non-interleaved). If the geometry shader is instanced, then up to two instances may be dispatched to the geometry shader at once; otherwise, each geometry shader invocation runs in its own thread, with the execution mask set appropriately. Since we don't support instanced geometry shaders yet, DUAL_INSTANCED and SINGLE modes are for all intents and purposes equivalent, except that we don't have to do as much back-end register interleaving work.

The compilation strategy for choosing between DUAL_INSTANCED and DUAL_OBJECT modes is similar to what we do for 8-wide vs. 16-wide fragment shaders. First we try compiling the shader in DUAL_OBJECT mode with register spilling disabled. If that fails, we fall back to DUAL_INSTANCED mode and compile with register spilling enabled.

Unfortunately, even when using DUAL_INSTANCED mode we still can't support 128 geometry shader input components, due to other limitations in our vec4 back-end code. So the final patch of the series reduces gl_MaxGeometryInputComponents to 64, the minimum required by the spec.

This series needs to be applied atop "vbo: Make vbo_sw_primitive_restart optionally count primitives." and "i965/gs: Fix gl_PrimitiveIDIn when using SW primitive restart.", which are on the mailing list but haven't been reviewed yet. To see the series in context, please check out branch "gs-phase-6" from https://github.com/stereotype441/mesa.git.

[PATCH 1/8] i965/vec4: Add the ability for attributes to be interleaved.
[PATCH 2/8] i965/vec4: if register allocation fails, don't try to schedule.
[PATCH 3/8] i965/vec4: Add the ability to suppress register spilling.
[PATCH 4/8] i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader.
[PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.
[PATCH 6/8] i965/gs: fix up primitive ID workaround for DUAL_INSTANCE dispatch.
[PATCH 7/8] i965/gs: If a DUAL_OBJECT gs would spill, fall back to DUAL_INSTANCED.
[PATCH 8/8] i965: Reduce gl_MaxGeometryInputComponents to 64.
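
For readers who want the fallback logic at a glance, the compilation strategy described above (try DUAL_OBJECT with spilling disabled, then fall back to DUAL_INSTANCED with spilling enabled) can be sketched roughly as follows. This is only an illustration and not the code in the series: every name here (toy_shader, toy_compile, toy_compile_gs, TOY_NUM_GRFS) is hypothetical, and the register cost model (each input vec4 takes a whole register in DUAL_OBJECT mode, while two input vec4s share a register when inputs are interleaved) is a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a toy model, not Mesa code. */
enum gs_dispatch_mode { GS_DUAL_OBJECT, GS_DUAL_INSTANCED };

#define TOY_NUM_GRFS 128 /* assumed register budget per thread */

struct toy_shader {
    int input_vec4s; /* vec4 inputs delivered to the thread */
    int temp_regs;   /* registers needed for everything else */
};

struct toy_result {
    bool ok;                    /* did compilation succeed? */
    enum gs_dispatch_mode mode; /* mode the shader was compiled for */
    int spilled_regs;           /* registers that had to be spilled */
};

/* Assumed cost model: interleaved inputs pack two vec4s per register;
 * all other registers cost the same in both modes. */
static int toy_regs_needed(const struct toy_shader *s,
                           enum gs_dispatch_mode mode)
{
    int input_regs = (mode == GS_DUAL_INSTANCED)
        ? (s->input_vec4s + 1) / 2
        : s->input_vec4s;
    return input_regs + s->temp_regs;
}

/* "Compile" for one mode; fails if the shader does not fit and
 * spilling is not allowed. */
static struct toy_result toy_compile(const struct toy_shader *s,
                                     enum gs_dispatch_mode mode,
                                     bool allow_spill)
{
    struct toy_result r = { false, mode, 0 };
    int excess = toy_regs_needed(s, mode) - TOY_NUM_GRFS;

    if (excess <= 0) {
        r.ok = true;
    } else if (allow_spill) {
        r.ok = true;
        r.spilled_regs = excess;
    }
    return r;
}

/* Two-step strategy: DUAL_OBJECT without spilling first, then
 * DUAL_INSTANCED with spilling as the fallback. */
static struct toy_result toy_compile_gs(const struct toy_shader *s)
{
    assert(s != NULL);

    struct toy_result r = toy_compile(s, GS_DUAL_OBJECT, false);
    if (r.ok)
        return r;
    return toy_compile(s, GS_DUAL_INSTANCED, true);
}
```

In this toy model, a shader with 96 input vec4s and 60 temporaries does not fit in DUAL_OBJECT mode (156 registers) but fits in DUAL_INSTANCED mode (108 registers) without spilling anything.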