On 1/7/2016 12:19 AM, Ronald S. Bultje wrote:
> Hi,
>
> On Wed, Jan 6, 2016 at 8:09 PM, James Almer <jamr...@gmail.com> wrote:
>
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>>  libavutil/x86/intmath.h | 32 ++++++++++++++++++++++++++++++++
>>  1 file changed, 32 insertions(+)
>>
>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
>> index 611ef88..e1cd596 100644
>> --- a/libavutil/x86/intmath.h
>> +++ b/libavutil/x86/intmath.h
>> @@ -98,6 +98,38 @@ static av_always_inline av_const unsigned av_mod_uintp2_bmi2(unsigned a, unsigne
>>
>>  #endif /* __BMI2__ */
>>
>> +#if defined(__SSE2__)
>> +
>> +#define av_clipd av_clipd_sse2
>> +static av_always_inline av_const double av_clipd_sse2(double a, double amin, double amax)
>> +{
>> +#if defined(ASSERT_LEVEL) && ASSERT_LEVEL >= 2
>> +    if (amin > amax) abort();
>> +#endif
>> +    __asm__ ("minsd %2, %0 \n\t"
>> +             "maxsd %1, %0 \n\t"
>> +             : "+x"(a) : "xm"(amin), "xm"(amax));
>> +    return a;
>> +}
>> +
>> +#endif /* __SSE2__ */
>
>
> This __SSE2__ is kind of strange, and we don't use it anywhere else. I
> understand it's not the same thing, but for practical purposes, could we
> just use #if ARCH_X86_64 and not care about -msse2?
>
> Ronald

We use it in x86/intreadwrite.h for AV_ZERO128.
And no, I'd rather have it working on x86_32 when -msse2 is used since it's
much more efficient. Compare:

00000000 <_av_clipf_sse>:
   0:   83 ec 0c                sub    esp,0xc
   3:   f2 0f 10 44 24 10       movsd  xmm0,QWORD PTR [esp+0x10]
   9:   f2 0f 5d 44 24 20       minsd  xmm0,QWORD PTR [esp+0x20]
   f:   f2 0f 5f 44 24 18       maxsd  xmm0,QWORD PTR [esp+0x18]
  15:   f2 0f 11 04 24          movsd  QWORD PTR [esp],xmm0
  1a:   dd 04 24                fld    QWORD PTR [esp]
  1d:   83 c4 0c                add    esp,0xc
  20:   c3                      ret

with:

00000030 <_av_clipf_c>:
  30:   dd 44 24 04             fld    QWORD PTR [esp+0x4]
  34:   dd 44 24 14             fld    QWORD PTR [esp+0x14]
  38:   dd 44 24 0c             fld    QWORD PTR [esp+0xc]
  3c:   db ea                   fucomi st,st(2)
  3e:   77 10                   ja     50 <_clipf_c+0x20>
  40:   dd d8                   fstp   st(0)
  42:   d9 c9                   fxch   st(1)
  44:   db e9                   fucomi st,st(1)
  46:   db d1                   fcmovnbe st,st(1)
  48:   dd d9                   fstp   st(1)
  4a:   eb 08                   jmp    54 <_clipf_c+0x24>
  4c:   8d 74 26 00             lea    esi,[esi+eiz*1+0x0]
  50:   dd d9                   fstp   st(1)
  52:   dd d9                   fstp   st(1)
  54:   f3 c3                   repz ret

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
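
To make the __SSE2__ versus ARCH_X86_64 point concrete, here is a minimal sketch of a compile-time gate keyed on __SSE2__, so the SSE2 path is also picked up on x86_32 builds that pass -msse2. It is not the patch itself: it assumes GCC or Clang, swaps the inline asm for the <emmintrin.h> intrinsics, and the names clipd_c, clipd_sse2 and clipd are hypothetical stand-ins rather than the libavutil functions.

```c
#include <assert.h>

/* Hypothetical portable fallback, used when the compiler does not
 * define __SSE2__ (for example x86_32 without -msse2). */
static inline double clipd_c(double a, double amin, double amax)
{
    assert(amin <= amax);   /* same sanity check the patch guards with ASSERT_LEVEL */
    if (a < amin)
        return amin;
    if (a > amax)
        return amax;
    return a;
}

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical SSE2 variant: same min-then-max order as the patch's
 * minsd/maxsd pair, written with intrinsics instead of inline asm. */
static inline double clipd_sse2(double a, double amin, double amax)
{
    assert(amin <= amax);
    __m128d v = _mm_set_sd(a);
    v = _mm_min_sd(v, _mm_set_sd(amax));   /* v = min(a, amax) */
    v = _mm_max_sd(v, _mm_set_sd(amin));   /* v = max(v, amin) */
    return _mm_cvtsd_f64(v);
}
#define clipd clipd_sse2
#else
#define clipd clipd_c
#endif
```

With this gate, a call such as clipd(1.5, 0.0, 1.0) resolves to the SSE2 variant whenever the compiler defines __SSE2__ (the default on x86_64, opt-in via -msse2 on x86_32) and to the C fallback otherwise. Gating on ARCH_X86_64 instead would leave the x86_32 -msse2 case on the x87 path.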