On 25.02.2015, at 19:29, "Ronald S. Bultje" <rsbul...@gmail.com> wrote: > Hi, > > On Wed, Feb 25, 2015 at 1:21 PM, James Almer <jamr...@gmail.com> wrote: > >> On 25/02/15 2:58 PM, Ronald S. Bultje wrote: >>> Hi, >>> >>> On Wed, Feb 25, 2015 at 12:52 PM, James Almer <jamr...@gmail.com> wrote: >>> >>>> On 25/02/15 2:36 PM, Ronald S. Bultje wrote: >>>>> Hi James, >>>>> >>>>> On Wed, Feb 25, 2015 at 12:25 PM, James Almer <jamr...@gmail.com> >> wrote: >>>>> >>>>>> On 25/02/15 9:41 AM, Ronald S. Bultje wrote: >>>>>>> Hi, >>>>>>> >>>>>>> On Tue, Feb 24, 2015 at 8:05 PM, James Almer <jamr...@gmail.com> >>>> wrote: >>>>>>>> >>>>>>>> +#if HAVE_FAST_POPCNT >>>>>>>> +#if AV_GCC_VERSION_AT_LEAST(4,5) >>>>>>>> +#ifndef av_popcount >>>>>>>> + #define av_popcount __builtin_popcount >>>>>>>> +#endif /* av_popcount */ >>>>>>>> +#if HAVE_FAST_64BIT >>>>>>>> +#ifndef av_popcount64 >>>>>>>> + #define av_popcount64 __builtin_popcountll >>>>>>>> +#endif /* av_popcount64 */ >>>>>>>> +#endif /* HAVE_FAST_64BIT */ >>>>>>>> +#endif /* AV_GCC_VERSION_AT_LEAST(4,5) */ >>>>>>>> +#endif /* HAVE_FAST_POPCNT */ >>>>>>>> >>>>>>> >>>>>>> Is this just to get the sse4 popcnt instruction if we compile with >>>>>>> -mcpu=sse4? The slightly odd thing is that we're using a built-in, >> yet >>>>>>> configure still does an arch/cpu check. I'd expect the >>>> built-in/compiler >>>>>> to >>>>>>> do that for us based on -mcpu, and we could always unconditionally >> use >>>>>> this >>>>>>> (as long as gcc >= 4.5); alternatively, you could use inline asm and >>>> then >>>>>>> have the configure check (HAVE_FAST_POPCNT). But doing both seems a >>>>>> little >>>>>>> odd. I have no objection to it, patch is still fine, just odd. >>>>>>> >>>>>>> Ronald >>>>>> >>>>>> I purposely made the checks for gcc 4.5 and in configure for cpus that >>>>>> support popcnt >>>>>> because otherwise __builtin_popcount (at least gcc's) is slower than >> our >>>>>> generic >>>>>> av_popcount_c function from lavu/common.h. >>>>>> When the CPU supports popcnt the builtin becomes a single inlined >>>>>> instruction. >>>>>> >>>>>> I tried the __asm__ approach, but the code generated by the builtin >>>> seemed >>>>>> better. >>>>> >>>>> >>>>> That's interesting, can you show the code you tried? >>>>> >>>>> Ronald >>>> >>>> If instead of builtins i use >>>> >>>> +#if HAVE_INLINE_ASM >>>> + >>>> +#ifdef __POPCNT__ >>>> + >>>> +#define av_popcount av_popcount_abm >>>> +static av_always_inline av_const int av_popcount_abm(uint32_t a) >>>> +{ >>>> + int x; >>>> + >>>> + __asm__ ("popcnt %1, %0" : "=r" (x) : "rm" (a)); >>>> + return x; >>>> +} >>>> + >>>> +#if ARCH_X86_64 >>>> +#define av_popcount64 av_popcount64_abm >>>> +static av_always_inline av_const int av_popcount64_abm(uint64_t a) >>>> +{ >>>> + uint64_t x; >>>> + >>>> + __asm__ ("popcnt %1, %0" : "=r" (x) : "rm" (a)); >>>> + return x; >>>> +} >>>> +#endif >>>> + >>>> +#endif /* __POPCNT__ */ >>>> + >>>> +#endif /* HAVE_INLINE_ASM */ >>>> >>>> in the x86/ header from the second version i sent, on x86_32 >> av_cpu_count >>>> from >>>> libavutil/cpu.o on Windows (Will not work with other OSes because of >>>> HAVE_GETPROCESSAFFINITYMASK) generates >>>> >>>> 1af: f3 0f b8 5c 24 18 popcnt ebx,DWORD PTR [esp+0x18] >>>> 1b5: 31 c0 xor eax,eax >>>> 1b7: f3 0f b8 c0 popcnt eax,eax >>>> 1bb: 01 c3 add ebx,eax >>>> >>>> Whereas with the builtins i instead get >>>> >>>> 1af: f3 0f b8 5c 24 18 popcnt ebx,DWORD PTR [esp+0x18] >>>> >>>> This is probably because of av_popcount64_c (Used unconditionally on >>>> x86_32) calling >>>> av_popcount twice, and proc_aff being DWORD_PTR type means it's 4 bytes >> in >>>> x86_32. >>>> The builtin seems to know proc_aff can't be right shifted 32 bits so it >>>> generates >>>> a single popcnt. >>> >>> >>> It seems to me rather than the second version is defined as "pure" (or >>> whatever the keyword was), i.e. "we don't depend on external state" -> >> "if >>> you call this with a constant, we can calculate this at compile-time". >>> >>> Ronald >> >> Do you know how to get the asm version working right for this one case? >> > > No, const should do it but looks like inline asm breaks that assumption.
There are gcc builtins to check if something is a constant and use e.g. the c version then. Not sure if it works here or makes sense, just in principle. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel