Am 14.02.2014 19:59, schrieb Jeff Muizelaar: > > On Feb 14, 2014, at 1:00 PM, Roland Scheidegger <srol...@vmware.com > <mailto:srol...@vmware.com>> wrote: > >> Am 14.02.2014 18:07, schrieb Jeff Muizelaar: >> >> I'll need to take another look and run some tests, though I've got some >> quick comments: >> >> >> @@ -1031,16 +1082,28 @@ lp_build_sample_image_linear(struct >> lp_build_sample_context *bld, >> s = lp_build_mul_imm(&bld->coord_bld, s, 256); >> if (dims >= 2) >> t = lp_build_mul_imm(&bld->coord_bld, t, 256); >> if (dims >= 3) >> r = lp_build_mul_imm(&bld->coord_bld, r, 256); >> } >> >> /* convert float to int */ >> + half = lp_build_const_vec(bld->gallivm, bld->coord_bld.type, 0.5); >> + s = lp_build_add(&bld->coord_bld, s, half); >> + s = LLVMBuildFPToSI(builder, s, i32_vec_type, ""); >> + if (dims >= 2) { >> + t = lp_build_add(&bld->coord_bld, t, half); >> + t = LLVMBuildFPToSI(builder, t, i32_vec_type, ""); >> + } >> + if (dims >= 3) { >> + r = lp_build_add(&bld->coord_bld, r, half); >> + r = LLVMBuildFPToSI(builder, r, i32_vec_type, ""); >> + } >> + >> s = LLVMBuildFPToSI(builder, s, i32_vec_type, ""); >> if (dims >= 2) >> t = LLVMBuildFPToSI(builder, t, i32_vec_type, ""); >> if (dims >= 3) >> r = LLVMBuildFPToSI(builder, r, i32_vec_type, ""); >> This looks quite incorrect you're converting the s/t/r coords twice from >> float to int. > > Yep. I forgot to remove this second hunk. > >> /* subtract 0.5 (add -128) */ >> i32_c128 = lp_build_const_int_vec(bld->gallivm, i32.type, -128); >> >> >> Also, the add looks iffy as it won't work correctly if the coords are >> negative, since the FPToSI is of course trunc, not floor. > > I think it will be ok because the REPEAT case avoids negative coord > before converting to int and the other cases clamp to 0. I think for clamp_to_edge you're right, but repeat will use these coords for the POT case (npot indeed won't care at all since it doesn't actually use these values at all).
> >> Maybe instead of using add + fptosi should just use lp_build_iround >> (which is just one sse instruction too on x86 though if you're targeting >> another arch it will definitely be more code at least unless someone >> adds an intrinsic for it if the cpu even has one). Might not matter >> though depending on address modeā¦ > > Yeah, that might be a better idea. > >> >> And I might be missing something why do you think the new repeat code is >> faster? Though that might also depend on arch_rounding being available >> and such but at first looks it seems slightly more complex to me. > > The current code converts integer and fractional parts to integer > separately. It also does the subtract 0.5 in floating point instead of > integer arithmetic (-128). Yeah, you're probably right. With simple instruction counting I end up with the same number of instructions actually (assuming arch_rounding is available, I miscounted previously and thought it were two more), but the ones in your version ought to be a bit cheaper. I suspect actually nearest filtering suffers from the same inaccuracy, we actually also do the mul by 256 so we get 8 fractional bits, then toss away those 8 bits and just use the non-fractional part. That looks to me like we only need these 8 bits so we get "reasonably correct" rounding, but we still would chose the wrong sampling point sometimes (if it's less than 1/256 pixels away from the center between two texels). I actually noticed that one a while ago but I can't remember why I didn't do something about it... Roland _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev