On Sat, Nov 14, 2015 at 10:27 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: > On Sat, Nov 14, 2015 at 4:03 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: >> On Sat, Nov 14, 2015 at 3:28 PM, Hendrik Leppkes <h.lepp...@gmail.com> wrote: >>> On Sat, Nov 14, 2015 at 3:51 AM, Ganesh Ajjanagadde <gajja...@mit.edu> >>> wrote: >>>> On Fri, Nov 13, 2015 at 7:17 PM, Ronald S. Bultje <rsbul...@gmail.com> >>>> wrote: >>>>> Hi, >>>>> >>>>> On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >>>>> wrote: >>>>> >>>>>> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbul...@gmail.com> >>>>>> wrote: >>>>>> > Hi, >>>>>> > >>>>>> > On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >>>>>> > wrote: >>>>>> > >>>>>> >> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbul...@gmail.com> >>>>>> >> wrote: >>>>>> >> > Hi, >>>>>> >> > >>>>>> >> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde < >>>>>> gajja...@mit.edu> >>>>>> >> > wrote: >>>>>> >> > >>>>>> >> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje < >>>>>> rsbul...@gmail.com> >>>>>> >> >> wrote: >>>>>> >> >> > Hi Ganesh, >>>>>> >> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" < >>>>>> >> gajjanaga...@gmail.com> >>>>>> >> >> > wrote: >>>>>> >> >> >> >>>>>> >> >> >> The rationale for this function is reflected in the >>>>>> >> >> >> documentation >>>>>> for >>>>>> >> >> >> it, and is copied here: >>>>>> >> >> >> >>>>>> >> >> >> Clip a double value into the long long amin-amax range. >>>>>> >> >> >> This function is needed because conversion of floating point to >>>>>> >> integers >>>>>> >> >> > when >>>>>> >> >> >> it does not fit in the integer's representation does not >>>>>> necessarily >>>>>> >> >> > saturate >>>>>> >> >> >> correctly (usually converted to a cvttsd2si on x86) which >>>>>> saturates >>>>>> >> >> > numbers >>>>>> >> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions as >>>>>> >> >> undefined >>>>>> >> >> >> behavior, allowing this sort of mathematically bogus >>>>>> >> >> >> conversions. >>>>>> >> This >>>>>> >> >> > provides >>>>>> >> >> >> a safe alternative that is slower obviously but assures safety >>>>>> >> >> >> and >>>>>> >> >> better >>>>>> >> >> >> mathematical behavior. >>>>>> >> >> >> API: >>>>>> >> >> >> @param a value to clip >>>>>> >> >> >> @param amin minimum value of the clip range >>>>>> >> >> >> @param amax maximum value of the clip range >>>>>> >> >> >> @return clipped value >>>>>> >> >> >> >>>>>> >> >> >> Note that a priori if one can guarantee from the calling side >>>>>> >> >> >> that >>>>>> >> the >>>>>> >> >> >> double is in range, it is safe to simply do an explicit/implicit >>>>>> >> cast, >>>>>> >> >> >> and that will be far faster. However, otherwise this function >>>>>> should >>>>>> >> be >>>>>> >> >> >> used. >>>>>> >> >> >> >>>>>> >> >> >> avutil minor version is bumped. >>>>>> >> >> >> >>>>>> >> >> >> Reviewed-by: Ronald S. Bultje <rsbul...@gmail.com> >>>>>> >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >>>>>> >> >> >> --- >>>>>> >> >> >> libavutil/common.h | 30 ++++++++++++++++++++++++++++++ >>>>>> >> >> >> libavutil/version.h | 2 +- >>>>>> >> >> >> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>> >> >> >> >>>>>> >> >> >> diff --git a/libavutil/common.h b/libavutil/common.h >>>>>> >> >> >> index 6f0f582..f4687ab 100644 >>>>>> >> >> >> --- a/libavutil/common.h >>>>>> >> >> >> +++ b/libavutil/common.h >>>>>> >> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double >>>>>> >> >> > av_clipd_c(double a, double amin, double >>>>>> >> >> >> else return a; >>>>>> >> >> >> } >>>>>> >> >> >> >>>>>> >> >> >> +/** >>>>>> >> >> >> + * Clip and convert a double value into the long long amin-amax >>>>>> >> range. >>>>>> >> >> >> + * This function is needed because conversion of floating point >>>>>> to >>>>>> >> >> > integers when >>>>>> >> >> >> + * it does not fit in the integer's representation does not >>>>>> >> necessarily >>>>>> >> >> > saturate >>>>>> >> >> >> + * correctly (usually converted to a cvttsd2si on x86) which >>>>>> >> saturates >>>>>> >> >> > numbers >>>>>> >> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such >>>>>> >> >> >> conversions >>>>>> as >>>>>> >> >> > undefined >>>>>> >> >> >> + * behavior, allowing this sort of mathematically bogus >>>>>> conversions. >>>>>> >> >> > This provides >>>>>> >> >> >> + * a safe alternative that is slower obviously but assures >>>>>> >> >> >> safety >>>>>> >> and >>>>>> >> >> > better >>>>>> >> >> >> + * mathematical behavior. >>>>>> >> >> >> + * @param a value to clip >>>>>> >> >> >> + * @param amin minimum value of the clip range >>>>>> >> >> >> + * @param amax maximum value of the clip range >>>>>> >> >> >> + * @return clipped value >>>>>> >> >> >> + */ >>>>>> >> >> >> +static av_always_inline av_const int64_t >>>>>> >> >> >> av_rint64_clip_c(double >>>>>> a, >>>>>> >> >> > int64_t amin, int64_t amax) >>>>>> >> >> >> +{ >>>>>> >> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) && >>>>>> >> ASSERT_LEVEL >>>>>> >> >> >>= 2 >>>>>> >> >> >> + if (amin > amax) abort(); >>>>>> >> >> >> +#endif >>>>>> >> >> >> + // INT64_MAX+1,INT64_MIN are exactly representable as IEEE >>>>>> >> doubles >>>>>> >> >> >> + if (a >= 9223372036854775808.0 || llrint(a) >= amax) >>>>>> >> >> >> + return amax; >>>>>> >> >> >> + if (a <= -9223372036854775808.0 || llrint(a) <= amin) >>>>>> >> >> >> + return amin; >>>>>> >> >> > >>>>>> >> >> > Doesn't this allow negative overflows in the max check? I think >>>>>> >> >> > you >>>>>> >> need >>>>>> >> >> > both overflow checks before the min/max checks. Try the next >>>>>> >> >> > float >>>>>> val >>>>>> >> >> > smaller than int64_min as input with a small amax, eg 0. I bet it >>>>>> >> >> returns 0 >>>>>> >> >> > instead of amin. >>>>>> >> >> >>>>>> >> >> They are needed. As you and others can clearly see, I am quite bad >>>>>> >> >> with this stuff. >>>>>> >> > >>>>>> >> > >>>>>> >> > Hm, so, getting back to my computer, I wanted to test this, and I >>>>>> >> > have >>>>>> >> this >>>>>> >> > problem: llrint() works correctly for me for the "undefined" cases, >>>>>> i.e., >>>>>> >> > it already does what you're trying to fix in av_rint64_clip_c. >>>>>> >> > >>>>>> >> > llrint(-10223372056756029440.000000) returns -9223372036854775808 >>>>>> >> > llrint(10223372056756029440.000000) returns 9223372036854775807 >>>>>> >> > >>>>>> >> > So, how do I reproduce that llrint() overflows? >>>>>> >> >>>>>> >> The link I gave originally >>>>>> >> >>>>>> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer >>>>>> >> gives an illustration. Maybe the weird behavior happens only on >>>>>> >> 9223372036854775808.0. This happens because INT64_MAX+1 is not >>>>>> >> representable in long long, and hence signed overflow occurs yielding >>>>>> >> INT64_MIN (of course undefined). Here is a minimal test program: >>>>>> >> #include <stdio.h> >>>>>> >> #include <math.h> >>>>>> >> >>>>>> >> int main(void) { >>>>>> >> printf("%lld\n", llrint(9223372036854775808.0)); >>>>>> >> return 0; >>>>>> >> } >>>>>> >> >>>>>> > >>>>>> > 9223372036854775807 >>>>>> > >>>>>> > Seems apple's libc got one thing right :) >>>>>> >>>>>> I personally am not that charitable, looking more carefully at your >>>>>> asm shows a cmplesd, suggesting slowdown. Here is a source reference: >>>>>> https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c. >>>>>> As usual, Apple dumps many implementations of llrint and it is unclear >>>>>> which is actually being used on OS X at the moment (see e.g other >>>>>> https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c), >>>>>> but I digress. >>>>>> >>>>>> They essentially all put special case code like the patch above. Thus >>>>>> their function is inherently slower than the conformant GNU libm >>>>>> implementation. A client may very well want a branch free llrint for >>>>>> speed. Apple offers no performance choice here, forcing a fast llrint >>>>>> to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected >>>>>> by this slowdown. >>>>> >>>>> >>>>> I think FFmpeg should consider using Apple's version as a x86 >>>>> implementation for av_rint64_clip :) >>>> >>>> I don't agree with this: it is a far less readable implementation with >>>> many more lines of code, and worse yet only handles the llrint aspect >>>> and not the clipping. Regardless, belongs to a separate patch/thread. >>>> Pushed. Thanks all for reviews. >>>> >>> >>> This change broke building on VS2012, llrint is apprently not available >>> there. >>> Note that this is a public header, so our compat headers ala >>> avutil/libm.h cannot be included there. >> >> Hmm, so I could create a local avpriv_llrint with some ifdefry - e.g >> for GNU_C, use llrint, else use a slower implementation on the lines >> of Apple's. >> Any cleaner solutions? > > Possibly better idea: use floor(f + 0.5) as a hack (c89) on things > lacking llrint (via a HAVE_LLRINT check). This won't result in > identical output past a sufficiently large power of 2, but is still a > safe API. It is also clearer and smaller. Idea inspired by > avcodec/mpegaudio_tablegen.h (where this hack may be removed). >
This code is in a public header, public headers don't have access to config.h, so no HAVE_* checks. You could make it non-inline, then you have all the freedom in the world. - Hendrik _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel