Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread James Almer
On 23/08/14 12:15 PM, Christophe Gisquet wrote: > Hi, > > 2014-08-23 17:01 GMT+02:00 James Almer : There's a PACK macro in lavfi/x86/yasm-16.asm that does this without >>> intrinsics. >>> >>> You meant yadif-16, right? >>> >>> Timothy >> >> Oops, yes i meant that :P > > I expect it to be nee

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Christophe Gisquet
Hi, 2014-08-23 17:01 GMT+02:00 James Almer : >>> There's a PACK macro in lavfi/x86/yasm-16.asm that does this without >> intrinsics. >> >> You meant yadif-16, right? >> >> Timothy > > Oops, yes i meant that :P I expect it to be needed for the weighted pred functions, so I'll split it from yadif-1

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread James Almer
On 23/08/14 11:55 AM, Timothy Gu wrote: > On Aug 23, 2014 7:47 AM, "James Almer" wrote: >> >> On 23/08/14 11:07 AM, Mickaël Raulet wrote: >>> For 10bits and 12bits, they should stay sse4 as well because of > packusdw. You need some instructions to convert it to ssse3 see below >>> >>> >>> static a

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Timothy Gu
On Aug 23, 2014 7:47 AM, "James Almer" wrote: > > On 23/08/14 11:07 AM, Mickaël Raulet wrote: > > For 10bits and 12bits, they should stay sse4 as well because of packusdw. You need some instructions to convert it to ssse3 see below > > > > > > static av_always_inline __m128i _MM_PACKUS_EPI32( __m1

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread James Almer
On 23/08/14 11:07 AM, Mickaël Raulet wrote: > For 10bits and 12bits, they should stay sse4 as well because of packusdw. You > need some instructions to convert it to ssse3 see below > > > static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b ) > { > a = _mm_slli_epi32 (a, 1

Re: [FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Mickaël Raulet
For 10bits and 12bits, they should stay sse4 as well because of packusdw. You need some instructions to convert it to ssse3 see below

static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
{
    a = _mm_slli_epi32 (a, 16);
    a = _mm_srai_epi32 (a, 16);
    b = _mm_slli_epi
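The quoted helper is cut off in this preview, so here is a minimal sketch of the usual SSE2-only way to pack 32-bit lanes to 16 bits without packusdw. The names pack_epi32_to_u16_sse2 and pack8_u16 are hypothetical, and the final _mm_packs_epi32 step is an assumption about how the truncated code continues. Note that this truncates instead of saturating, so it only matches packusdw when every lane is already within 0..65535, which should hold for clipped 10-bit and 12-bit samples.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical name. Packs 4+4 32-bit lanes into 8 16-bit lanes with SSE2 only.
 * Assumption: each input lane is already in 0..65535; values outside that
 * range wrap (keep their low 16 bits) instead of saturating as packusdw does. */
static inline __m128i pack_epi32_to_u16_sse2(__m128i a, __m128i b)
{
    /* Sign-extend the low 16 bits of every lane, so that the signed
     * saturating pack below never actually saturates. */
    a = _mm_slli_epi32(a, 16);
    a = _mm_srai_epi32(a, 16);
    b = _mm_slli_epi32(b, 16);
    b = _mm_srai_epi32(b, 16);
    return _mm_packs_epi32(a, b); /* packssdw */
}

/* Hypothetical convenience wrapper for plain arrays. */
static void pack8_u16(const int32_t in[8], uint16_t out[8])
{
    __m128i a = _mm_loadu_si128((const __m128i *)in);
    __m128i b = _mm_loadu_si128((const __m128i *)(in + 4));
    _mm_storeu_si128((__m128i *)out, pack_epi32_to_u16_sse2(a, b));
}
```

The shift pair costs four extra instructions per pack compared with SSE4.1 packusdw, which is the trade-off being discussed in the thread.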

[FFmpeg-devel] [PATCH 0/2] x86: hevc_mc: port to SSSE3

2014-08-23 Thread Christophe Gisquet
As far as I can see, the only reason those functions are SSE4 is because of the pextrw needed for the following block widths: - 2, used only by chroma; - 6, used by chroma and indirectly by luma; - 12, used by both. The better solution would be to convert all chroma handling to NV12, but it is vas
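For context, the SSE4.1 dependency comes from the form of pextrw that writes straight to memory; the register-destination form has been available since SSE2. A minimal sketch of a 6-byte store (the width-6 case with 8-bit pixels) that stays within SSE2 is shown here. The function name store6_sse2 is hypothetical, and whether the patch handles the odd widths this way is not visible in this excerpt.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Stores the low 6 bytes of v to dst using SSE2 only. */
static void store6_sse2(uint8_t *dst, __m128i v)
{
    /* Bytes 0..3: movd to a general-purpose register. */
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(v);
    /* Bytes 4..5: pextrw with a register destination (word index 2). */
    uint16_t hi = (uint16_t)_mm_extract_epi16(v, 2);
    /* memcpy keeps the stores free of alignment and aliasing problems. */
    memcpy(dst, &lo, 4);
    memcpy(dst + 4, &hi, 2);
}
```

The extra trip through a general-purpose register is the cost of avoiding the memory-destination pextrw.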