Hi, On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbul...@gmail.com> > wrote: > > Hi, > > > > On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajja...@mit.edu> > > wrote: > > > >> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbul...@gmail.com> > >> wrote: > >> > Hi, > >> > > >> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde < > gajja...@mit.edu> > >> > wrote: > >> > > >> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje < > rsbul...@gmail.com> > >> >> wrote: > >> >> > Hi Ganesh, > >> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" < > >> gajjanaga...@gmail.com> > >> >> > wrote: > >> >> >> > >> >> >> The rationale for this function is reflected in the documentation > for > >> >> >> it, and is copied here: > >> >> >> > >> >> >> Clip a double value into the long long amin-amax range. > >> >> >> This function is needed because conversion of floating point to > >> integers > >> >> > when > >> >> >> it does not fit in the integer's representation does not > necessarily > >> >> > saturate > >> >> >> correctly (usually converted to a cvttsd2si on x86) which > saturates > >> >> > numbers > >> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions as > >> >> undefined > >> >> >> behavior, allowing this sort of mathematically bogus conversions. > >> This > >> >> > provides > >> >> >> a safe alternative that is slower obviously but assures safety and > >> >> better > >> >> >> mathematical behavior. > >> >> >> API: > >> >> >> @param a value to clip > >> >> >> @param amin minimum value of the clip range > >> >> >> @param amax maximum value of the clip range > >> >> >> @return clipped value > >> >> >> > >> >> >> Note that a priori if one can guarantee from the calling side that > >> the > >> >> >> double is in range, it is safe to simply do an explicit/implicit > >> cast, > >> >> >> and that will be far faster. However, otherwise this function > should > >> be > >> >> >> used. > >> >> >> > >> >> >> avutil minor version is bumped. > >> >> >> > >> >> >> Reviewed-by: Ronald S. Bultje <rsbul...@gmail.com> > >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> > >> >> >> --- > >> >> >> libavutil/common.h | 30 ++++++++++++++++++++++++++++++ > >> >> >> libavutil/version.h | 2 +- > >> >> >> 2 files changed, 31 insertions(+), 1 deletion(-) > >> >> >> > >> >> >> diff --git a/libavutil/common.h b/libavutil/common.h > >> >> >> index 6f0f582..f4687ab 100644 > >> >> >> --- a/libavutil/common.h > >> >> >> +++ b/libavutil/common.h > >> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double > >> >> > av_clipd_c(double a, double amin, double > >> >> >> else return a; > >> >> >> } > >> >> >> > >> >> >> +/** > >> >> >> + * Clip and convert a double value into the long long amin-amax > >> range. > >> >> >> + * This function is needed because conversion of floating point > to > >> >> > integers when > >> >> >> + * it does not fit in the integer's representation does not > >> necessarily > >> >> > saturate > >> >> >> + * correctly (usually converted to a cvttsd2si on x86) which > >> saturates > >> >> > numbers > >> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such conversions > as > >> >> > undefined > >> >> >> + * behavior, allowing this sort of mathematically bogus > conversions. > >> >> > This provides > >> >> >> + * a safe alternative that is slower obviously but assures safety > >> and > >> >> > better > >> >> >> + * mathematical behavior. > >> >> >> + * @param a value to clip > >> >> >> + * @param amin minimum value of the clip range > >> >> >> + * @param amax maximum value of the clip range > >> >> >> + * @return clipped value > >> >> >> + */ > >> >> >> +static av_always_inline av_const int64_t av_rint64_clip_c(double > a, > >> >> > int64_t amin, int64_t amax) > >> >> >> +{ > >> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) && > >> ASSERT_LEVEL > >> >> >>= 2 > >> >> >> + if (amin > amax) abort(); > >> >> >> +#endif > >> >> >> + // INT64_MAX+1,INT64_MIN are exactly representable as IEEE > >> doubles > >> >> >> + if (a >= 9223372036854775808.0 || llrint(a) >= amax) > >> >> >> + return amax; > >> >> >> + if (a <= -9223372036854775808.0 || llrint(a) <= amin) > >> >> >> + return amin; > >> >> > > >> >> > Doesn't this allow negative overflows in the max check? I think you > >> need > >> >> > both overflow checks before the min/max checks. Try the next float > val > >> >> > smaller than int64_min as input with a small amax, eg 0. I bet it > >> >> returns 0 > >> >> > instead of amin. > >> >> > >> >> They are needed. As you and others can clearly see, I am quite bad > >> >> with this stuff. > >> > > >> > > >> > Hm, so, getting back to my computer, I wanted to test this, and I have > >> this > >> > problem: llrint() works correctly for me for the "undefined" cases, > i.e., > >> > it already does what you're trying to fix in av_rint64_clip_c. > >> > > >> > llrint(-10223372056756029440.000000) returns -9223372036854775808 > >> > llrint(10223372056756029440.000000) returns 9223372036854775807 > >> > > >> > So, how do I reproduce that llrint() overflows? > >> > >> The link I gave originally > >> > http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer > >> gives an illustration. Maybe the weird behavior happens only on > >> 9223372036854775808.0. This happens because INT64_MAX+1 is not > >> representable in long long, and hence signed overflow occurs yielding > >> INT64_MIN (of course undefined). Here is a minimal test program: > >> #include <stdio.h> > >> #include <math.h> > >> > >> int main(void) { > >> printf("%lld\n", llrint(9223372036854775808.0)); > >> return 0; > >> } > >> > > > > 9223372036854775807 > > > > Seems apple's libc got one thing right :) > > I personally am not that charitable, looking more carefully at your > asm shows a cmplesd, suggesting slowdown. Here is a source reference: > https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c. > As usual, Apple dumps many implementations of llrint and it is unclear > which is actually being used on OS X at the moment (see e.g other > https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c), > but I digress. > > They essentially all put special case code like the patch above. Thus > their function is inherently slower than the conformant GNU libm > implementation. A client may very well want a branch free llrint for > speed. Apple offers no performance choice here, forcing a fast llrint > to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected > by this slowdown. I think FFmpeg should consider using Apple's version as a x86 implementation for av_rint64_clip :) Ronald _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel