On 25/02/15 12:43 PM, Clément Bœsch wrote:
> On Tue, Feb 24, 2015 at 10:05:24PM -0300, James Almer wrote:
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>> I decided to go the configure route since other features (cmov, clz) also do
>> it, but if preferred this could instead be done with a new intmath.h header
>> in the x86/ folder containing something like
>>
>> #if defined(__GNUC__) && defined(__POPCNT__)
>> #define av_popcount __builtin_popcount
>> #if ARCH_X86_64
>> #define av_popcount64 __builtin_popcountll
>> #endif
>> #endif
>>
>> For a cleaner compile time check.
>>
>>  configure           | 12 ++++++++++--
>>  libavutil/intmath.h | 13 +++++++++++++
>>  2 files changed, 23 insertions(+), 2 deletions(-)
>>
>
> For the record, the builtin implementation looks like this here:
>
> 0000000000000000 <av_popcount_c>:
>    0:   89 f8                   mov    %edi,%eax
>    2:   d1 e8                   shr    %eax
>    4:   25 55 55 55 55          and    $0x55555555,%eax
>    9:   29 c7                   sub    %eax,%edi
>    b:   89 fa                   mov    %edi,%edx
>    d:   c1 ef 02                shr    $0x2,%edi
>   10:   81 e2 33 33 33 33       and    $0x33333333,%edx
>   16:   81 e7 33 33 33 33       and    $0x33333333,%edi
>   1c:   8d 04 17                lea    (%rdi,%rdx,1),%eax
>   1f:   89 c2                   mov    %eax,%edx
>   21:   c1 ea 04                shr    $0x4,%edx
>   24:   01 d0                   add    %edx,%eax
>   26:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
>   2b:   89 c2                   mov    %eax,%edx
>   2d:   c1 ea 08                shr    $0x8,%edx
>   30:   01 d0                   add    %edx,%eax
>   32:   89 c2                   mov    %eax,%edx
>   34:   c1 ea 10                shr    $0x10,%edx
>   37:   01 d0                   add    %edx,%eax
>   39:   83 e0 3f                and    $0x3f,%eax
>   3c:   c3                      retq
>   3d:   0f 1f 00                nopl   (%rax)
>
> 0000000000000040 <popcount_gcc>:
>   40:   48 83 ec 08             sub    $0x8,%rsp
>   44:   89 ff                   mov    %edi,%edi
>   46:   e8 00 00 00 00          callq  4b <popcount_gcc+0xb>
>   4b:   48 83 c4 08             add    $0x8,%rsp
>   4f:   c3                      retq
>
> 0000000000000040 <popcount_clang>:
>   40:   89 f8                   mov    %edi,%eax
>   42:   d1 e8                   shr    %eax
>   44:   25 55 55 55 55          and    $0x55555555,%eax
>   49:   29 c7                   sub    %eax,%edi
>   4b:   89 f8                   mov    %edi,%eax
>   4d:   25 33 33 33 33          and    $0x33333333,%eax
>   52:   c1 ef 02                shr    $0x2,%edi
>   55:   81 e7 33 33 33 33       and    $0x33333333,%edi
>   5b:   01 c7                   add    %eax,%edi
>   5d:   89 f8                   mov    %edi,%eax
>   5f:   c1 e8 04                shr    $0x4,%eax
>   62:   01 f8                   add    %edi,%eax
>   64:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
>   69:   69 c0 01 01 01 01       imul   $0x1010101,%eax,%eax
>   6f:   c1 e8 18                shr    $0x18,%eax
>   72:   c3                      retq
>
> We might see relevant "optimizations" for our reference code.
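For comparison, the clang listing is the usual SWAR bit count, but it folds the per-byte counts with a single imul by 0x01010101 and a shift by 24, where av_popcount_c does two more shift/add steps and a final mask with 0x3f. A minimal C sketch of that variant follows; popcount32_mul is a hypothetical name, not an existing libavutil function, and whether it actually beats the current reference code would need benchmarking:

```c
#include <stdint.h>

/* Hypothetical helper (not in libavutil): SWAR popcount of a 32-bit value,
 * summing the per-byte counts with one multiply as in the clang output. */
static inline int popcount32_mul(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555);                /* 2-bit field counts */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333); /* 4-bit field counts */
    x = (x + (x >> 4)) & 0x0F0F0F0F;                /* 8-bit (byte) counts */
    return (int)((x * 0x01010101u) >> 24);          /* top byte = sum of all four bytes */
}
```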
What does clang generate for av_popcount64_c, or for its builtin? We're currently calling av_popcount_c twice from within av_popcount64_c, when on x86_64 CPUs we could probably take advantage of the 64-bit GPRs.

> [...]

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
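A minimal sketch of the 64-bit idea: run the same SWAR reduction once on the whole 64-bit word instead of calling av_popcount_c() on each 32-bit half. popcount64_swar is a hypothetical name, not the current av_popcount64_c, and the sketch assumes a 64-bit multiply is cheap on the target (typical for x86_64, not necessarily for 32-bit builds, where the two-call version may remain preferable):

```c
#include <stdint.h>

/* Hypothetical single-pass 64-bit SWAR popcount (not in libavutil).
 * On x86_64 every step works on a full 64-bit general-purpose register. */
static inline int popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);         /* 2-bit field counts */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);             /* 4-bit field counts */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;         /* 8-bit (byte) counts */
    return (int)((x * 0x0101010101010101ULL) >> 56);    /* top byte = sum of all eight bytes */
}
```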