This extremely slow compilation is not actually an infinite loop, but the
compile time does increase with every unrolled loop iteration in the shader.
The time to complete grows as 2^N, where N is the number of loop iterations.

The call to
 (*rvalue)->accept(this);
in ir_constant_folding_visitor::handle_rvalue is key to this.
Dropping that call when rvalue is not a constant makes compilation finish
very quickly, and for at least this shader it produces exactly the same
results.  Constant folding is done very effectively for the y and z
channels.
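
As a rough illustration of how a 2^N cost can arise, here is a toy C++ model,
not Mesa code. It assumes that each nested operand ends up being walked
twice, once by a handler-style callback and once by the regular descent.
Node, make_chain and visit are hypothetical names invented for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression node: a chain of single-operand "add" nodes, loosely
// modelled on the nested adds quoted in this thread. Not a Mesa type.
struct Node {
    std::unique_ptr<Node> operand;  // nullptr for the innermost leaf
};

// Build a leaf wrapped in `depth` nested nodes.
std::unique_ptr<Node> make_chain(int depth) {
    assert(depth >= 0);
    std::unique_ptr<Node> node = std::make_unique<Node>();
    for (int i = 0; i < depth; ++i) {
        std::unique_ptr<Node> parent = std::make_unique<Node>();
        parent->operand = std::move(node);
        node = std::move(parent);
    }
    return node;
}

// Count node visits. When `revisit_in_handler` is true, the operand is
// walked once by a (hypothetical) handler and once more by the normal
// descent, so every nesting level doubles the work below it.
std::uint64_t visit(const Node* node, bool revisit_in_handler) {
    std::uint64_t visits = 1;
    if (node->operand) {
        if (revisit_in_handler) {
            visits += visit(node->operand.get(), true);   // extra walk
        }
        visits += visit(node->operand.get(), revisit_in_handler);  // normal walk
    }
    return visits;
}
```

With a chain of depth 10, the doubled walk touches 2047 nodes against 11 for
the single walk, and each extra level roughly doubles the first number.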

But the x channel still produces a series of adds of constants instead of
a single add of their sum.
That is a separate issue that could still be investigated.

On Thu, Sep 11, 2014 at 1:53 PM, Mike Stroyan <m...@lunarg.com> wrote:

> I have looked at this problem quite a bit but never got to the bottom of
> it.
> This problem really started to show with commit 857f3a6 - "glsl: Ignore
> loop-too-large heuristic if there's bad variable indexing."
> That commit makes many more loops unroll.
> Here is another example piglit shader_runner test that shows the problem.
> Changing the value of LOOP_COUNT and running this with
> "time shader_runner -auto" shows that the compile time doubles each time
> the loop count is incremented by one.
> Large values may seem to take forever.  But they do eventually finish.
> Loop counts over 32 will still prevent unrolling and avoid the slow
> compile.
>
> A key part of the problem is the assignment to "col.rgb" in your shader or
> "tmpvar_3.xyz" in this shader.
> The operation on only some channels results in splitting the vec4 into one
> temporary per channel.
> This comment from src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp is
> telling.
>  * If a vector is only ever referenced by its components, then
>  * split those components out to individual variables so they can be
>  * handled normally by other optimization passes.
>
> brw_do_vector_splitting creates the flattening_tmp_y and flattening_tmp_z
> temporaries.
> Operations on one of the channels are optimized quickly.
> But the other two channels are handled badly.
> The operations on the first channel prevent the same simplification of the
> expressions for the other two channels.
>
> Changing ir_vector_splitting_visitor::visit_leave to use
> "writemask = 1 << i;" instead of "writemask = 1;" in the "if (lhs)" case
> makes the y and z channels get handled like the x channel.
> That results in something like
>       (assign  (y) (var_ref flattening_tmp_y)  (expression float * (swiz y
> (var_ref texture2D_retval) )(var_ref channel_expressions@8114) ) )
> It is very fast to compile, but produces bad code that hangs the GPU.
> It is putting the y channel float value into a non-existent "y" channel of
> a simple float temporary, then later reading the real x channel.
>
> [require]
> GLSL >= 1.10
>
> [vertex shader]
> #version 120
> attribute vec2 Tex0;
> attribute vec3 Position;
> void main ()
> {
>   vec4 inPos_1;
>   inPos_1.xy = Position.xy;
>   inPos_1.z = 1.00000;
>   inPos_1.w = 1.00000;
>   gl_Position = inPos_1;
>   vec4 tmpvar_2;
>   tmpvar_2.zw = vec2(0.00000, 0.00000);
>   tmpvar_2.xy = Tex0;
>   gl_TexCoord[0] = tmpvar_2;
> }
>
> [fragment shader]
> #version 120
> #define LOOP_COUNT 25
> uniform sampler2D u_sampler;
> void main ()
> {
>   vec2 tmpvar_1;
>   tmpvar_1 = gl_TexCoord[0].xy;
>   vec4 tmpvar_3;
>   tmpvar_3 = vec4(0.00000, 0.00000, 0.00000, 1.00000);
>   float weighting_5[LOOP_COUNT];
>   for (int i = 0; i < LOOP_COUNT; i++) {
>     float tmpvar_10;
>     tmpvar_10 = ((float(int(abs ((float(i) - 15.0))))) / 15.0000);
>     float tmpvar_11;
>     tmpvar_11 = exp ((-(tmpvar_10) * tmpvar_10));
>     weighting_5[i] = tmpvar_11;
>   };
>   for (int k = 0; k < LOOP_COUNT; k++) {
>     tmpvar_3.xyz += (texture2D (u_sampler, tmpvar_1).xyz * weighting_5[k]);
>   };
>   gl_FragData[0] = tmpvar_3;
> }
>
> [test]
> draw rect -1 -1 2 2
> probe rgb 1 1 0.0 0.0 0.0
>
>
> On Thu, Sep 11, 2014 at 2:02 AM, Iago Toral Quiroga <ito...@igalia.com>
> wrote:
>
>> Hi,
>>
>> I have been looking into this bug:
>>
>> Compiling of shader gets stuck in infinite loop
>> https://bugs.freedesktop.org/show_bug.cgi?id=78468
>>
>> Although this occurs at link time when the Intel driver has run some of
>> its specific lowering passes, it looks like the problem could hit other
>> drivers if the right conditions are met, as the actual problem happens
>> inside common optimization passes.
>>
>> I reproduced the problem with a very simple shader like this:
>>
>> uniform sampler2D tex;
>> out vec4 FragColor;
>> void main()
>> {
>>    vec4 col = texture(tex, vec2(0, 0));
>>    for (int i=0; i<30; i++)
>>       col += vec4(0.1, 0.1, 0.1, 0.1);
>>    col = vec4(col.rgb / 2.0, col.a);
>>    FragColor = col;
>> }
>>
>> and for this shader, I traced the problem down to the fact that
>> do_tree_grafting() is generating instructions like this:
>>
>> (assign  (x) (var_ref flattening_tmp_y@116)  (expression float * (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (var_ref col_y) (constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.500000)) ) )
>>
>> And when we feed these to do_constant_folding() it takes forever to
>> finish. For this shader in particular, removing the tree grafting pass
>> from do_common_optimization eliminates the problem.
>>
>> Notice that small, seemingly irrelevant changes to the shader code can
>> make it so that this never happens. For example, if we initialize 'col'
>> to something like vec4(0,0,0,0) instead of using the texture function,
>> or we remove the division by 2.0 in the last assignment to 'col', these
>> instructions are never produced and the shader compiles okay.
>>
>> The number of iterations in the loop is also important. If we have too
>> many, we do not unroll the loop and the problem never happens. If we have
>> too few, rather than generating a super large tree of expressions like
>> above, we generate something like this and the problem, again, does not
>> happen (notice how it adds 0.1 nine times to make 0.9 rather than
>> chaining 9 add expressions for 10 iterations of the loop):
>>
>> (assign  (x) (var_ref flattening_tmp_y)  (expression float * (expression
>> float + (constant float (0.900000)) (var_ref col_y) ) (constant float
>> (0.500000)) ) )
>>
>> So it seems that whether we generate a huge chunk of expressions or not
>> is subject to a number of factors, but when the right conditions are met
>> we can generate code that can stall compilation forever.
>>
>> Reading what tree grafting is supposed to do, this does not seem to be
>> an unexpected result though, so I wonder what would be the right way to
>> fix this. It would look like we would want to do whatever we are doing
>> when we only have a few iterations in the loop, but I don't know why we
>> generate different code in that case and I am not familiar enough with
>> all the optimization and lowering passes to assess what would make sense
>> to do here... so, any suggestions?
>>
>> Iago
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
>
> --
>
>  Mike Stroyan - Software Architect
>  LunarG, Inc.  - The Graphics Experts
>  Cell:  (970) 219-7905
>  Email: m...@lunarg.com
>  Website: http://www.lunarg.com
>



-- 

 Mike Stroyan - Software Architect
 LunarG, Inc.  - The Graphics Experts
 Cell:  (970) 219-7905
 Email: m...@lunarg.com
 Website: http://www.lunarg.com
