Hi everyone, For those interested I will be writing a blog at: https://hellthom.wordpress.com/ where I will be documenting my progress in GSoC (at least bi-weekly). It also has some info about me in the "about" section. If there is anything else you want to know, or think I should add, just holla.
Now for the actual reason for the mail. I'm seeking opinions and thoughts on the loop unrolling part of my GSoC project. First, to get started, are some examples of loops from my shader-db: for (i = -fsize.x; i <= fsize.x; i++) {{ color += texture2D(tex, tc + i * texscale.xy, 0.0); }} // EndBox1/2. for (i.x = -fsize.x; i.x <= fsize.x; i.x++) { for (i.y = -fsize.y; i.y <= fsize.y; i.y++) { color += texture2D(tex, tc + i * texscale.xy, 0.0); }} // EndBox1/2. for (int rep1 = 0; rep1 < ps_i0.x; rep1++) { t1_ps.zw = (t0_ps.yy * t1_ps.xy) + t0_ps.xz; t3_ps = texture2D(g_samScreenImage, t1_ps.zw); t2_ps.xyz = t2_ps.xyz + t3_ps.xyz; t0_ps.y = t0_ps.y + ps_c0.x; } for (int i = 0; i < LIGHT_COUNT; i++){ vec3 lightVec = invRadius[i] * (lightPos[i] - gl_Vertex.xyz); #ifdef SHADOWS shadowVec[i] = -lightVec; #endif lVec[i].x = dot(lightVec, tangent); lVec[i].y = dot(lightVec, binormal); lVec[i].z = dot(lightVec, normal); } for(i = 0.0, f = 1.0; i < ScaleSteps.w; ++i, f *= 0.5) RT += OffsetVector * (step(dp_textureGrad(Texture_Normal, RT.xy, dPdx, dPdy).a, RT.z) * f - 0.5 * f); for (int l_3 = 0; l_3 < 28; l_3++) { vec4 tmpvar_10; tmpvar_10.xy = vec2(1.2, 1.2); tmpvar_10.zw = DiscKernel[l_3].zz; vec4 tmpvar_11; tmpvar_11 = (tmpvar_1.xyxy + ((DiscKernel[l_3].xyxy * poissonScale_5.xyxy) / tmpvar_10)); vec4 tmpvar_12; tmpvar_12 = texture2D (_MainTex, tmpvar_11.xy); vec4 tmpvar_13; tmpvar_13 = texture2D (_MainTex, tmpvar_11.zw); if (((tmpvar_12.w + tmpvar_13.w) > 0.0)) { vec2 tmpvar_14; tmpvar_14.y = 1.0; tmpvar_14.x = (DiscKernel[l_3].z / 1.2); vec2 tmpvar_15; tmpvar_15.x = tmpvar_12.w; tmpvar_15.y = tmpvar_13.w; vec2 tmpvar_16; vec2 tmpvar_17; tmpvar_17 = clamp ((( (tmpvar_15 - (centerTap_7.ww * tmpvar_14)) - vec2(-0.265, -0.265)) / vec2(0.265, 0.265)), 0.0, 1.0); tmpvar_16 = (tmpvar_17 * (tmpvar_17 * (3.0 - (2.0 * tmpvar_17) ))); sum_6 = (sum_6 + ((tmpvar_12 * tmpvar_16.x) + (tmpvar_13 * tmpvar_16.y))); sampleCount_4 = (sampleCount_4 + dot (tmpvar_16, vec2(1.0, 1.0))); }; }; Most of the loops are simple; array defines or lookups with a for loop. No magic at all. Usually only one or two assignments in the loop body. Then there are the nested loops from dota 2, and there are also some cases of loops that have large bodies. I have however not managed to grep up any loops apart from for-loops. Seems for-loops is the goto loops, at least for my collection of shaders. Some of the ideas and plans I've made so far: Split the work into two parts: First I will write a loop analysis pass. I think it makes sense to separate this out into its own file, as the information it can gather can be usefull. Things that come to my mind are loop invariant code motion and strength reduction passes. Then i will write the actual unroll pass when the analysis pass is done. Add some new data to the metadata system. At the very least an is_invariant variable. This should track if the ssa-value is loop invariant. To avoid the issue with nested loops a ssa-variable marked as invariant is only invariant in the loop it is present. They are invariant if the operands are all in the preceding block, if the operands are invariant in the same loop as we are in, or a combination of the two. There should probably also be an is_induction_variable flag in the metadata, as this will be shared between loop unrolling and a possible strength reduction pass (and probably others too). I need to think a bit more about this though. There should probably be some kind of relation between basic induction variables (int i) and derived variables. On the subject of loop analysis, does anyone happen to know of a good paper on induction variable detection in ssa? Loop invariants are simple enough, but induction variables are a bit harder. Luckily there seems to be only linear induction variables as far as I can detect. The (int i = 0; i < max; i++) pattern is recurring a lot, with the random case of variables that all fit in the form i + i*a, where a is loop invariant. So I don't think we need a to complicated algorithm. For the case with nested loops I was thinking we might want to go for unroll-and-jam. I think I might have managed to somehow loose an important paper I had about nested loops, but there where some other possibilities for handling nested loops that might be worth taking a closer look at. I recall there being a better performing solution than unroll-and-squash. But this is a bit up and into the future still. One more thing; Is there a limit where the loop body gets so large that we want to decide that "gah, this sucks, no point in unrolling this"? I imagine as the loops get large there will be a case of diminishing returns from the unrolling? I will start sketching a solution now, so if there is any objections/suggestions on my planned approach it would be nice to get those out there before I get to much into it :) Regards, Thomas _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev