This extremely slow compilation is not actually an infinite loop. But the compile time does increase with every unrolled loop step in the shader. The time to complete grows as 2^N, where N is the number of loop iterations.
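As a rough illustration of how a per-iteration doubling can arise: if a visitor walks an operand once from an rvalue hook and the generic traversal then walks the same operand again, every extra level of nesting doubles the work. The sketch below is a toy model only. Node, make_chain and visit are hypothetical names, and this is not Mesa's IR or its visitor classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Toy expression node: a chain in which each node has at most one operand.
// Hypothetical type for illustration, not Mesa's ir_rvalue.
struct Node {
    std::unique_ptr<Node> child;
};

// Build a chain of `depth` nested nodes, like N chained adds.
std::unique_ptr<Node> make_chain(int depth) {
    assert(depth >= 0);
    std::unique_ptr<Node> root;
    for (int i = 0; i < depth; ++i) {
        auto n = std::make_unique<Node>();
        n->child = std::move(root);
        root = std::move(n);
    }
    return root;
}

// Count node visits. With `revisit` set, the operand is walked twice per
// node: once by an extra explicit walk and once by the regular traversal.
std::uint64_t visit(const Node *n, bool revisit) {
    if (!n) return 0;
    std::uint64_t count = 1;
    if (revisit)
        count += visit(n->child.get(), revisit);  // extra walk
    count += visit(n->child.get(), revisit);      // regular walk
    return count;
}
```

In this model a chain of depth d costs d visits with a single walk and 2^d - 1 visits with the double walk, so visit(make_chain(30).get(), true) already takes about a billion steps.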
The call to (*rvalue)->accept(this); in ir_constant_folding_visitor::handle_rvalue is key to this. Dropping that call for the case when rvalue is not a constant makes compilation finish very quickly. And for at least this shader it produces exactly the same results.

Constant folding is done very effectively for the y and z channels. But the x channel still produces a series of adds of constants instead of one add with the sum. That is a separate issue that could still be investigated.

On Thu, Sep 11, 2014 at 1:53 PM, Mike Stroyan <m...@lunarg.com> wrote:

> I have looked at this problem quite a bit but never got to the bottom of it.
> This problem really started to show with commit 857f3a6 - "glsl: Ignore loop-too-large heuristic if there's bad variable indexing."
> That commit makes many more loops unroll.
> Here is another example piglit shader_runner test that shows the problem.
> Changing the value of LOOP_COUNT and running this with "time shader_runner -auto"
> shows that the compile time doubles each time the loop count is incremented by one.
> Large values may seem to take forever. But they do eventually finish.
> Loop counts over 32 will still prevent unrolling and avoid the slow compile.
>
> A key part of the problem is the assignment to "col.rgb" in your shader or "tmpvar_3.xyz" in this shader.
> The operation on only some channels results in splitting the vec4 into one temporary per channel.
> This comment from src/mesa/drivers/dri/i965/brw_fs_vector_splitting.cpp is telling.
>  * If a vector is only ever referenced by its components, then
>  * split those components out to individual variables so they can be
>  * handled normally by other optimization passes.
>
> brw_do_vector_splitting creates the flattening_tmp_y and flattening_tmp_z temporaries.
> Operations on one of the channels are optimized quickly.
> But the other two channels are handled badly.
> The operations on the first channel prevent the same simplification of the expressions for the other two channels.
>
> Changing ir_vector_splitting_visitor::visit_leave to use "writemask = 1 << i;" instead of "writemask = 1;"
> in the "if (lhs)" case makes the y and z channels get handled like the x channel.
> That results in something like
> (assign (y) (var_ref flattening_tmp_y) (expression float * (swiz y (var_ref texture2D_retval) )(var_ref channel_expressions@8114) ) )
> It is very fast to compile, but produces bad code that hangs the GPU.
> It is putting the y channel float value into a non-existent "y" channel of a simple float temporary, then later reading the real x channel.
>
> [require]
> GLSL >= 1.10
>
> [vertex shader]
> #version 120
> attribute vec2 Tex0;
> attribute vec3 Position;
> void main ()
> {
>   vec4 inPos_1;
>   inPos_1.xy = Position.xy;
>   inPos_1.z = 1.00000;
>   inPos_1.w = 1.00000;
>   gl_Position = inPos_1;
>   vec4 tmpvar_2;
>   tmpvar_2.zw = vec2(0.00000, 0.00000);
>   tmpvar_2.xy = Tex0;
>   gl_TexCoord[0] = tmpvar_2;
> }
>
> [fragment shader]
> #version 120
> #define LOOP_COUNT 25
> uniform sampler2D u_sampler;
> void main ()
> {
>   vec2 tmpvar_1;
>   tmpvar_1 = gl_TexCoord[0].xy;
>   vec4 tmpvar_3;
>   tmpvar_3 = vec4(0.00000, 0.00000, 0.00000, 1.00000);
>   float weighting_5[LOOP_COUNT];
>   for (int i = 0; i < LOOP_COUNT; i++) {
>     float tmpvar_10;
>     tmpvar_10 = ((float(int(abs ((float(i) - 15.0))))) / 15.0000);
>     float tmpvar_11;
>     tmpvar_11 = exp ((-(tmpvar_10) * tmpvar_10));
>     weighting_5[i] = tmpvar_11;
>   };
>   for (int k = 0; k < LOOP_COUNT; k++) {
>     tmpvar_3.xyz += (texture2D (u_sampler, tmpvar_1).xyz * weighting_5[k]);
>   };
>   gl_FragData[0] = tmpvar_3;
> }
>
> [test]
> draw rect -1 -1 2 2
> probe rgb 1 1 0.0 0.0 0.0
>
>
> On Thu, Sep 11, 2014 at 2:02 AM, Iago Toral Quiroga <ito...@igalia.com> wrote:
>
>> Hi,
>>
>> I have been looking into this bug:
>>
>> Compiling of shader gets stuck in infinite loop
>> https://bugs.freedesktop.org/show_bug.cgi?id=78468
>>
>> Although this occurs at link time when the Intel driver has run some of
>> its specific lowering passes, it looks like the problem could hit other
>> drivers if the right conditions are met, as the actual problem happens
>> inside common optimization passes.
>>
>> I reproduced the problem with a very simple shader like this:
>>
>> uniform sampler2D tex;
>> out vec4 FragColor;
>> void main()
>> {
>>    vec4 col = texture(tex, vec2(0, 0));
>>    for (int i=0; i<30; i++)
>>       col += vec4(0.1, 0.1, 0.1, 0.1);
>>    col = vec4(col.rgb / 2.0, col.a);
>>    FragColor = col;
>> }
>>
>> and for this shader, I traced the problem down to the fact that
>> do_tree_grafting() is generating instructions like this:
>>
>> (assign (x) (var_ref flattening_tmp_y@116) (expression float * (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (swiz x (expression float + (swiz x
>> (expression float + (swiz x (expression float + (swiz x (expression
>> float + (swiz x (expression float + (var_ref col_y) (constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.100000)) ) )(constant float
>> (0.100000)) ) )(constant float (0.500000)) ) )
>>
>> And when we feed these to do_constant_folding() it takes forever to
>> finish. For this shader in particular, removing the tree grafting pass
>> from do_common_optimization eliminates the problem.
>>
>> Notice that small, seemingly irrelevant changes to the shader code, can
>> make it so that this never happens. For example, if we initialize 'col'
>> to something like vec4(0,0,0,0) instead of using the texture function,
>> or we remove the division by 2.0 in the last assignment to 'col', these
>> instructions are never produced and the shader compiles okay.
>>
>> The number of iterations in the loop is also important, if we have too
>> many we do not unroll the loop and the problem never happens, if we have
>> too few, rather than generating a super large tree of expressions like
>> above, we generate something like this and the problem, again, does not
>> happen: (notice how it adds 0.1 nine times to make 0.9 rather than
>> chaining 9 add expressions for 10 iterations of the loop):
>>
>> (assign (x) (var_ref flattening_tmp_y) (expression float * (expression
>> float + (constant float (0.900000)) (var_ref col_y) ) (constant float
>> (0.500000)) ) )
>>
>> So it seems that whether we generate a huge chunk of expressions or not
>> is subject to a number of factors, but when the right conditions are met
>> we can generate code that can stall compilation forever.
>>
>> Reading what tree grafting is supposed to do, this does not seem to be
>> an unexpected result though, so I wonder what would be the right way to
>> fix this. It would look like we would want to do whatever we are doing
>> when we only have a few iterations in the loop, but I don't know why we
>> generate different code in that case and I am not familiar enough with
>> all the optimization and lowering passes to assess what would make sense
>> to do here... so, any suggestions?
>>
>> Iago
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>
>
> --
>
> Mike Stroyan - Software Architect
> LunarG, Inc. - The Graphics Experts
> Cell: (970) 219-7905
> Email: m...@lunarg.com
> Website: http://www.lunarg.com
>

--
Mike Stroyan - Software Architect
LunarG, Inc. - The Graphics Experts
Cell: (970) 219-7905
Email: m...@lunarg.com
Website: http://www.lunarg.com