Hi Roland, Jose I wanted to bring a problem I found to your attention, and discuss about it and ways to solve it.
I'm working on regressions of piglit gpu.py between x86-64 and ppc64le, when running with llvmpipe. One of the regressions manifests itself in 2 tests, clip-distance-bulk-copy and clip-distance-itemized-copy What happens in ppc64le is that *some* of the interpolated per-vertex values (clip-distance values) that constitute the input into the fragment shader, are not calculated according to the required accuracy of the test (which is an error/distance of less than 1e-6) It took a while, but I believe I found the reason. In general, when running on ppc64le, the code path that is taken at lp_build_interp_soa_init() is doing bld->simple_interp = FALSE, which means using coeffs_init() + attribs_update(). That's in contrast with x86-64, where bld->simple_interp = TRUE, which means the code will use coeffs_init_simple() + attribs_update_simple() In specific, the problem lies in how the coeffs are calculated inside coeffs_init. To simply put it, they are only *actually* calculated correctly for the first quad. The coeffs are later *updated* for the other quads, using dadx/dady and dadq. ------ side-note: I believe you even documented it in coeffs_init(): /* * a = a0 + (x * dadx + y * dady) * a0aos is the attrib value at top left corner of stamp */ ------ The root cause for that, I believe, is because coeffs_init does *not* take into account the different x/y offsets for the other quads. In contrast, on x86-64, the code uses a function called calc_offset(), which does take the different x/y offsets into account. Please look at this definition (from lp_bld_interp.c) : static const unsigned char quad_offset_x[16] = {0, 1, 0, 1, 2, 3, 2, 3, 0, 1, 0, 1, 2, 3, 2, 3}; static const unsigned char quad_offset_y[16] = {0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3}; Now, look at how coeffs_init() uses them: for (i = 0; i < coeff_bld->type.length; i++) { LLVMValueRef nr = lp_build_const_int32(gallivm, i); LLVMValueRef pixxf = lp_build_const_float(gallivm, quad_offset_x[i]); LLVMValueRef pixyf = lp_build_const_float(gallivm, quad_offset_y[i]); pixoffx = LLVMBuildInsertElement(builder, pixoffx, pixxf, nr, ""); pixoffy = LLVMBuildInsertElement(builder, pixoffy, pixyf, nr, ""); } So, in ppc64le, where coeff_bld->type.length == 4, we *always* take the first four values from the above arrays. The code *never* calculates the coeffs based on the other offsets. Admittedly, that's *almost* always works and I doubt you will see a difference in display. However, in the tests I mentioned above, it creates, for some pixels, a bigger error than what is allowed. The first thing that came into my mind was to change, in lp_build_interp_soa_init(): "if (coeff_type.length > 4)" to this: "if (coeff_type.length >= 4)" When I did that, I saw it fixes the above tests and didn't cause any regressions on ppc64le and/or on x86-64. However, I wanted to ask you guys if you remember why you allowed only vectors with size above 4 to use the "simple" route ? Was there any performance hit ? I think that to fix coeffs_init()+attribs_update() is to basically re-design them. Otherwise, I would have tried to fix them first. In any case, I would love to hear your feedback and to discuss this further. Thanks, Oded _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev