On Sat, Nov 14, 2015 at 10:38 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: > On Sat, Nov 14, 2015 at 4:30 PM, Hendrik Leppkes <h.lepp...@gmail.com> wrote: >> On Sat, Nov 14, 2015 at 10:27 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >> wrote: >>> On Sat, Nov 14, 2015 at 4:03 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >>> wrote: >>>> On Sat, Nov 14, 2015 at 3:28 PM, Hendrik Leppkes <h.lepp...@gmail.com> >>>> wrote: >>>>> On Sat, Nov 14, 2015 at 3:51 AM, Ganesh Ajjanagadde <gajja...@mit.edu> >>>>> wrote: >>>>>> On Fri, Nov 13, 2015 at 7:17 PM, Ronald S. Bultje <rsbul...@gmail.com> >>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Fri, Nov 13, 2015 at 6:16 PM, Ganesh Ajjanagadde <gajja...@mit.edu> >>>>>>> wrote: >>>>>>> >>>>>>>> On Fri, Nov 13, 2015 at 4:52 PM, Ronald S. Bultje <rsbul...@gmail.com> >>>>>>>> wrote: >>>>>>>> > Hi, >>>>>>>> > >>>>>>>> > On Fri, Nov 13, 2015 at 4:28 PM, Ganesh Ajjanagadde >>>>>>>> > <gajja...@mit.edu> >>>>>>>> > wrote: >>>>>>>> > >>>>>>>> >> On Fri, Nov 13, 2015 at 1:28 PM, Ronald S. Bultje >>>>>>>> >> <rsbul...@gmail.com> >>>>>>>> >> wrote: >>>>>>>> >> > Hi, >>>>>>>> >> > >>>>>>>> >> > On Fri, Nov 13, 2015 at 12:17 PM, Ganesh Ajjanagadde < >>>>>>>> gajja...@mit.edu> >>>>>>>> >> > wrote: >>>>>>>> >> > >>>>>>>> >> >> On Fri, Nov 13, 2015 at 12:10 PM, Ronald S. Bultje < >>>>>>>> rsbul...@gmail.com> >>>>>>>> >> >> wrote: >>>>>>>> >> >> > Hi Ganesh, >>>>>>>> >> >> > On Nov 13, 2015 12:02 PM, "Ganesh Ajjanagadde" < >>>>>>>> >> gajjanaga...@gmail.com> >>>>>>>> >> >> > wrote: >>>>>>>> >> >> >> >>>>>>>> >> >> >> The rationale for this function is reflected in the >>>>>>>> >> >> >> documentation >>>>>>>> for >>>>>>>> >> >> >> it, and is copied here: >>>>>>>> >> >> >> >>>>>>>> >> >> >> Clip a double value into the long long amin-amax range. >>>>>>>> >> >> >> This function is needed because conversion of floating point >>>>>>>> >> >> >> to >>>>>>>> >> integers >>>>>>>> >> >> > when >>>>>>>> >> >> >> it does not fit in the integer's representation does not >>>>>>>> necessarily >>>>>>>> >> >> > saturate >>>>>>>> >> >> >> correctly (usually converted to a cvttsd2si on x86) which >>>>>>>> saturates >>>>>>>> >> >> > numbers >>>>>>>> >> >> >> > INT64_MAX to INT64_MIN. The standard marks such conversions >>>>>>>> >> >> >> > as >>>>>>>> >> >> undefined >>>>>>>> >> >> >> behavior, allowing this sort of mathematically bogus >>>>>>>> >> >> >> conversions. >>>>>>>> >> This >>>>>>>> >> >> > provides >>>>>>>> >> >> >> a safe alternative that is slower obviously but assures >>>>>>>> >> >> >> safety and >>>>>>>> >> >> better >>>>>>>> >> >> >> mathematical behavior. >>>>>>>> >> >> >> API: >>>>>>>> >> >> >> @param a value to clip >>>>>>>> >> >> >> @param amin minimum value of the clip range >>>>>>>> >> >> >> @param amax maximum value of the clip range >>>>>>>> >> >> >> @return clipped value >>>>>>>> >> >> >> >>>>>>>> >> >> >> Note that a priori if one can guarantee from the calling side >>>>>>>> >> >> >> that >>>>>>>> >> the >>>>>>>> >> >> >> double is in range, it is safe to simply do an >>>>>>>> >> >> >> explicit/implicit >>>>>>>> >> cast, >>>>>>>> >> >> >> and that will be far faster. However, otherwise this function >>>>>>>> should >>>>>>>> >> be >>>>>>>> >> >> >> used. >>>>>>>> >> >> >> >>>>>>>> >> >> >> avutil minor version is bumped. >>>>>>>> >> >> >> >>>>>>>> >> >> >> Reviewed-by: Ronald S. Bultje <rsbul...@gmail.com> >>>>>>>> >> >> >> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >>>>>>>> >> >> >> --- >>>>>>>> >> >> >> libavutil/common.h | 30 ++++++++++++++++++++++++++++++ >>>>>>>> >> >> >> libavutil/version.h | 2 +- >>>>>>>> >> >> >> 2 files changed, 31 insertions(+), 1 deletion(-) >>>>>>>> >> >> >> >>>>>>>> >> >> >> diff --git a/libavutil/common.h b/libavutil/common.h >>>>>>>> >> >> >> index 6f0f582..f4687ab 100644 >>>>>>>> >> >> >> --- a/libavutil/common.h >>>>>>>> >> >> >> +++ b/libavutil/common.h >>>>>>>> >> >> >> @@ -298,6 +298,33 @@ static av_always_inline av_const double >>>>>>>> >> >> > av_clipd_c(double a, double amin, double >>>>>>>> >> >> >> else return a; >>>>>>>> >> >> >> } >>>>>>>> >> >> >> >>>>>>>> >> >> >> +/** >>>>>>>> >> >> >> + * Clip and convert a double value into the long long >>>>>>>> >> >> >> amin-amax >>>>>>>> >> range. >>>>>>>> >> >> >> + * This function is needed because conversion of floating >>>>>>>> >> >> >> point >>>>>>>> to >>>>>>>> >> >> > integers when >>>>>>>> >> >> >> + * it does not fit in the integer's representation does not >>>>>>>> >> necessarily >>>>>>>> >> >> > saturate >>>>>>>> >> >> >> + * correctly (usually converted to a cvttsd2si on x86) which >>>>>>>> >> saturates >>>>>>>> >> >> > numbers >>>>>>>> >> >> >> + * > INT64_MAX to INT64_MIN. The standard marks such >>>>>>>> >> >> >> conversions >>>>>>>> as >>>>>>>> >> >> > undefined >>>>>>>> >> >> >> + * behavior, allowing this sort of mathematically bogus >>>>>>>> conversions. >>>>>>>> >> >> > This provides >>>>>>>> >> >> >> + * a safe alternative that is slower obviously but assures >>>>>>>> >> >> >> safety >>>>>>>> >> and >>>>>>>> >> >> > better >>>>>>>> >> >> >> + * mathematical behavior. >>>>>>>> >> >> >> + * @param a value to clip >>>>>>>> >> >> >> + * @param amin minimum value of the clip range >>>>>>>> >> >> >> + * @param amax maximum value of the clip range >>>>>>>> >> >> >> + * @return clipped value >>>>>>>> >> >> >> + */ >>>>>>>> >> >> >> +static av_always_inline av_const int64_t >>>>>>>> >> >> >> av_rint64_clip_c(double >>>>>>>> a, >>>>>>>> >> >> > int64_t amin, int64_t amax) >>>>>>>> >> >> >> +{ >>>>>>>> >> >> >> +#if defined(HAVE_AV_CONFIG_H) && defined(ASSERT_LEVEL) && >>>>>>>> >> ASSERT_LEVEL >>>>>>>> >> >> >>= 2 >>>>>>>> >> >> >> + if (amin > amax) abort(); >>>>>>>> >> >> >> +#endif >>>>>>>> >> >> >> + // INT64_MAX+1,INT64_MIN are exactly representable as >>>>>>>> >> >> >> IEEE >>>>>>>> >> doubles >>>>>>>> >> >> >> + if (a >= 9223372036854775808.0 || llrint(a) >= amax) >>>>>>>> >> >> >> + return amax; >>>>>>>> >> >> >> + if (a <= -9223372036854775808.0 || llrint(a) <= amin) >>>>>>>> >> >> >> + return amin; >>>>>>>> >> >> > >>>>>>>> >> >> > Doesn't this allow negative overflows in the max check? I >>>>>>>> >> >> > think you >>>>>>>> >> need >>>>>>>> >> >> > both overflow checks before the min/max checks. Try the next >>>>>>>> >> >> > float >>>>>>>> val >>>>>>>> >> >> > smaller than int64_min as input with a small amax, eg 0. I bet >>>>>>>> >> >> > it >>>>>>>> >> >> returns 0 >>>>>>>> >> >> > instead of amin. >>>>>>>> >> >> >>>>>>>> >> >> They are needed. As you and others can clearly see, I am quite >>>>>>>> >> >> bad >>>>>>>> >> >> with this stuff. >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > Hm, so, getting back to my computer, I wanted to test this, and I >>>>>>>> >> > have >>>>>>>> >> this >>>>>>>> >> > problem: llrint() works correctly for me for the "undefined" >>>>>>>> >> > cases, >>>>>>>> i.e., >>>>>>>> >> > it already does what you're trying to fix in av_rint64_clip_c. >>>>>>>> >> > >>>>>>>> >> > llrint(-10223372056756029440.000000) returns -9223372036854775808 >>>>>>>> >> > llrint(10223372056756029440.000000) returns 9223372036854775807 >>>>>>>> >> > >>>>>>>> >> > So, how do I reproduce that llrint() overflows? >>>>>>>> >> >>>>>>>> >> The link I gave originally >>>>>>>> >> >>>>>>>> http://blog.frama-c.com/index.php?post/2013/10/09/Overflow-float-integer >>>>>>>> >> gives an illustration. Maybe the weird behavior happens only on >>>>>>>> >> 9223372036854775808.0. This happens because INT64_MAX+1 is not >>>>>>>> >> representable in long long, and hence signed overflow occurs >>>>>>>> >> yielding >>>>>>>> >> INT64_MIN (of course undefined). Here is a minimal test program: >>>>>>>> >> #include <stdio.h> >>>>>>>> >> #include <math.h> >>>>>>>> >> >>>>>>>> >> int main(void) { >>>>>>>> >> printf("%lld\n", llrint(9223372036854775808.0)); >>>>>>>> >> return 0; >>>>>>>> >> } >>>>>>>> >> >>>>>>>> > >>>>>>>> > 9223372036854775807 >>>>>>>> > >>>>>>>> > Seems apple's libc got one thing right :) >>>>>>>> >>>>>>>> I personally am not that charitable, looking more carefully at your >>>>>>>> asm shows a cmplesd, suggesting slowdown. Here is a source reference: >>>>>>>> https://opensource.apple.com/source/Libm/Libm-2026/Source/ARM/llrint.c. >>>>>>>> As usual, Apple dumps many implementations of llrint and it is unclear >>>>>>>> which is actually being used on OS X at the moment (see e.g other >>>>>>>> https://opensource.apple.com/source/Libm/Libm-92/i386.subproj/llrint.c), >>>>>>>> but I digress. >>>>>>>> >>>>>>>> They essentially all put special case code like the patch above. Thus >>>>>>>> their function is inherently slower than the conformant GNU libm >>>>>>>> implementation. A client may very well want a branch free llrint for >>>>>>>> speed. Apple offers no performance choice here, forcing a fast llrint >>>>>>>> to use cvt2dsi inline or equivalent. Don't know if FFmpeg is affected >>>>>>>> by this slowdown. >>>>>>> >>>>>>> >>>>>>> I think FFmpeg should consider using Apple's version as a x86 >>>>>>> implementation for av_rint64_clip :) >>>>>> >>>>>> I don't agree with this: it is a far less readable implementation with >>>>>> many more lines of code, and worse yet only handles the llrint aspect >>>>>> and not the clipping. Regardless, belongs to a separate patch/thread. >>>>>> Pushed. Thanks all for reviews. >>>>>> >>>>> >>>>> This change broke building on VS2012, llrint is apprently not available >>>>> there. >>>>> Note that this is a public header, so our compat headers ala >>>>> avutil/libm.h cannot be included there. >>>> >>>> Hmm, so I could create a local avpriv_llrint with some ifdefry - e.g >>>> for GNU_C, use llrint, else use a slower implementation on the lines >>>> of Apple's. >>>> Any cleaner solutions? >>> >>> Possibly better idea: use floor(f + 0.5) as a hack (c89) on things >>> lacking llrint (via a HAVE_LLRINT check). This won't result in >>> identical output past a sufficiently large power of 2, but is still a >>> safe API. It is also clearer and smaller. Idea inspired by >>> avcodec/mpegaudio_tablegen.h (where this hack may be removed). >>> >> >> This code is in a public header, public headers don't have access to >> config.h, so no HAVE_* checks. >> You could make it non-inline, then you have all the freedom in the world. > > I do understand this is breakage from my end, but just a request that > may help in resolving this: > @Hendrik or others with Windows: could you do the needful? > Rationale: > 1. I lack Windows myself, so I can't test if my proposed solution even works. > 2. As seen above, my understanding of these things is vague, and in > more experienced hands this will be resolved faster and more > efficiently. >
The easiest way would be to just make it non-inline and use all the compat mechanics we already have in place. All other options would turn out rather ugly, we don't want to set a precedence for a big jungle of ifdefs to hard-code presence of llrint in here. Of course that would cost a bit of performance when this function is used, but trying to create fully portable code in inline functions in public headers seems like a painful objective. - Hendrik _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel