On Sat, Nov 14, 2015 at 3:51 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: > On Fri, Nov 13, 2015 at 7:17 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote: >> Hi, >> >> On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >> wrote: >> >>> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbul...@gmail.com> >>> wrote: >>> > Hi, >>> > >>> > On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >>> > wrote: >>> > >>> >> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje <rsbul...@gmail.com> >>> >> wrote: >>> >> > Hi, >>> >> > >>> >> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde < >>> gajja...@mit.edu> >>> >> > wrote: >>> >> > >>> >> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje < >>> rsbul...@gmail.com> >>> >> >> wrote: >>> >> >> > Hi Ganesh, >>> >> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" < >>> >> gajjanaga...@gmail.com> >>> >> >> > wrote: >>> >> >> >> >>> >> >> >> The rationale for this function is reflected in the documentation >>> for >>> >> >> >> it, and is copied here: >>> >> >> >> >>> >> >> >> Clip a double value into the long long amin-amax range. >>> >> >> >> This function is needed because conversion of floating point to >>> >> integers >>> >> >> > when >>> >> >> >> it does not fit in the integer's representation does not >>> necessarily >>> >> >> > saturate >>> >> >> >> correctly (usually converted to a cvttsd2si on x86) which >>> saturates >>> >> >> > numbers >>> >> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions as >>> >> >> undefined >>> >> >> >> behavior, allowing this sort of mathematically bogus conversions. >>> >> This >>> >> >> > provides >>> >> >> >> a safe alternative that is slower obviously but assures safety and >>> >> >> better >>> >> >> >> mathematical behavior. >>> >> >> >> API: >>> >> >> >> @param a value to clip >>> >> >> >> @param amin minimum value of the clip range >>> >> >> >> @param amax maximum value of the clip range >>> >> >> >> @return clipped value >>> >> >> >> >>> >> >> >> Note that a priori if one can guarantee from the calling side that >>> >> the >>> >> >> >> double is in range, it is safe to simply do an explicit/implicit >>> >> cast, >>> >> >> >> and that will be far faster. However, otherwise this function >>> should >>> >> be >>> >> >> >> used. >>> >> >> >> >>> >> >> >> avutil minor version is bumped. >>> >> >> >> >>> >> >> >> Reviewed-by: Ronald S. Bultje <rsbul...@gmail.com> >>> >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >>> >> >> >> --- >>> >> >> >> libavutil/common.h | 30 ++++++++++++++++++++++++++++++ >>> >> >> >> libavutil/version.h | 2 +- >>> >> >> >> 2 files changed, 31 insertions(+), 1 deletion(-) >>> >> >> >> >>> >> >> >> diff --git a/libavutil/common.h b/libavutil/common.h >>> >> >> >> index 6f0f582..f4687ab 100644 >>> >> >> >> --- a/libavutil/common.h >>> >> >> >> +++ b/libavutil/common.h >>> >> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double >>> >> >> > av_clipd_c(double a, double amin, double >>> >> >> >> else return a; >>> >> >> >> } >>> >> >> >> >>> >> >> >> +/** >>> >> >> >> + * Clip and convert a double value into the long long amin-amax >>> >> range. >>> >> >> >> + * This function is needed because conversion of floating point >>> to >>> >> >> > integers when >>> >> >> >> + * it does not fit in the integer's representation does not >>> >> necessarily >>> >> >> > saturate >>> >> >> >> + * correctly (usually converted to a cvttsd2si on x86) which >>> >> saturates >>> >> >> > numbers >>> >> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such conversions >>> as >>> >> >> > undefined >>> >> >> >> + * behavior, allowing this sort of mathematically bogus >>> conversions. >>> >> >> > This provides >>> >> >> >> + * a safe alternative that is slower obviously but assures safety >>> >> and >>> >> >> > better >>> >> >> >> + * mathematical behavior. >>> >> >> >> + * @param a value to clip >>> >> >> >> + * @param amin minimum value of the clip range >>> >> >> >> + * @param amax maximum value of the clip range >>> >> >> >> + * @return clipped value >>> >> >> >> + */ >>> >> >> >> +static av_always_inline av_const int64_t av_rint64_clip_c(double >>> a, >>> >> >> > int64_t amin, int64_t amax) >>> >> >> >> +{ >>> >> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) && >>> >> ASSERT_LEVEL >>> >> >> >>= 2 >>> >> >> >> + if (amin > amax) abort(); >>> >> >> >> +#endif >>> >> >> >> + // INT64_MAX+1,INT64_MIN are exactly representable as IEEE >>> >> doubles >>> >> >> >> + if (a >= 9223372036854775808.0 || llrint(a) >= amax) >>> >> >> >> + return amax; >>> >> >> >> + if (a <= -9223372036854775808.0 || llrint(a) <= amin) >>> >> >> >> + return amin; >>> >> >> > >>> >> >> > Doesn't this allow negative overflows in the max check? I think you >>> >> need >>> >> >> > both overflow checks before the min/max checks. Try the next float >>> val >>> >> >> > smaller than int64_min as input with a small amax, eg 0. I bet it >>> >> >> returns 0 >>> >> >> > instead of amin. >>> >> >> >>> >> >> They are needed. As you and others can clearly see, I am quite bad >>> >> >> with this stuff. >>> >> > >>> >> > >>> >> > Hm, so, getting back to my computer, I wanted to test this, and I have >>> >> this >>> >> > problem: llrint() works correctly for me for the "undefined" cases, >>> i.e., >>> >> > it already does what you're trying to fix in av_rint64_clip_c. >>> >> > >>> >> > llrint(-10223372056756029440.000000) returns -9223372036854775808 >>> >> > llrint(10223372056756029440.000000) returns 9223372036854775807 >>> >> > >>> >> > So, how do I reproduce that llrint() overflows? >>> >> >>> >> The link I gave originally >>> >> >>> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer >>> >> gives an illustration. Maybe the weird behavior happens only on >>> >> 9223372036854775808.0. This happens because INT64_MAX+1 is not >>> >> representable in long long, and hence signed overflow occurs yielding >>> >> INT64_MIN (of course undefined). Here is a minimal test program: >>> >> #include <stdio.h> >>> >> #include <math.h> >>> >> >>> >> int main(void) { >>> >> printf("%lld\n", llrint(9223372036854775808.0)); >>> >> return 0; >>> >> } >>> >> >>> > >>> > 9223372036854775807 >>> > >>> > Seems apple's libc got one thing right :) >>> >>> I personally am not that charitable, looking more carefully at your >>> asm shows a cmplesd, suggesting slowdown. Here is a source reference: >>> https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c. >>> As usual, Apple dumps many implementations of llrint and it is unclear >>> which is actually being used on OS X at the moment (see e.g other >>> https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c), >>> but I digress. >>> >>> They essentially all put special case code like the patch above. Thus >>> their function is inherently slower than the conformant GNU libm >>> implementation. A client may very well want a branch free llrint for >>> speed. Apple offers no performance choice here, forcing a fast llrint >>> to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected >>> by this slowdown. >> >> >> I think FFmpeg should consider using Apple's version as a x86 >> implementation for av_rint64_clip :) > > I don't agree with this: it is a far less readable implementation with > many more lines of code, and worse yet only handles the llrint aspect > and not the clipping. Regardless, belongs to a separate patch/thread. > Pushed. Thanks all for reviews. >
This change broke building on VS2012, llrint is apprently not available there. Note that this is a public header, so our compat headers ala avutil/libm.h cannot be included there. - Hendrik _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel