On 23/08/14 12:15 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-23 17:01 GMT+02:00 James Almer:
>>>> There's a PACK macro in lavfi/x86/yasm-16.asm that does this without
>>>> intrinsics.
>>>
>>> You meant yadif-16, right?
>>>
>>> Timothy
>>
>> Oops, yes i meant that :P
>
> I expect it to be nee
Hi,
2014-08-23 17:01 GMT+02:00 James Almer:
>>> There's a PACK macro in lavfi/x86/yasm-16.asm that does this without
>>> intrinsics.
>>
>> You meant yadif-16, right?
>>
>> Timothy
>
> Oops, yes i meant that :P
I expect it to be needed for the weighted pred functions, so I'll
split it from yadif-1
On 23/08/14 11:55 AM, Timothy Gu wrote:
> On Aug 23, 2014 7:47 AM, "James Almer" wrote:
>>
>> On 23/08/14 11:07 AM, Mickaël Raulet wrote:
>>> For 10bits and 12bits, they should stay sse4 as well because of
>>> packusdw. You need some instructions to convert it to ssse3 see below
>>>
>>>
>>> static a
On Aug 23, 2014 7:47 AM, "James Almer" wrote:
>
> On 23/08/14 11:07 AM, Mickaël Raulet wrote:
> > For 10bits and 12bits, they should stay sse4 as well because of
> > packusdw. You need some instructions to convert it to ssse3 see below
> >
> >
> > static av_always_inline __m128i _MM_PACKUS_EPI32( __m1
On 23/08/14 11:07 AM, Mickaël Raulet wrote:
> For 10bits and 12bits, they should stay sse4 as well because of packusdw. You
> need some instructions to convert it to ssse3 see below
>
>
> static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
> {
> a = _mm_slli_epi32 (a, 1
For 10bits and 12bits, they should stay sse4 as well because of packusdw. You
need some instructions to convert it to ssse3 see below
static av_always_inline __m128i _MM_PACKUS_EPI32( __m128i a, __m128i b )
{
a = _mm_slli_epi32 (a, 16);
a = _mm_srai_epi32 (a, 16);
b = _mm_slli_epi
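
The preview above is cut off, so the rest of that function is not reproduced here. As a separate illustration of the general idea (packing 32-bit lanes to unsigned 16-bit without the SSE4.1 packusdw instruction), the following is a minimal sketch and not the poster's code. The names packus_epi32_sse2 and pack8_u32_to_u16 are hypothetical, and the sketch assumes every input lane is already in the range 0..65535, which is plausible for 10-bit and 12-bit samples that have been clipped beforehand. Outside that range it wraps instead of saturating, so it is not a drop-in replacement for packusdw.

```c
#include <emmintrin.h>  /* SSE2 intrinsics only; nothing from SSE4.1 */
#include <stdint.h>

/* Hypothetical name. Assumes each 32-bit lane of a and b is in 0..65535. */
static inline __m128i packus_epi32_sse2(__m128i a, __m128i b)
{
    /* Shift left then arithmetic-shift right by 16: this sign-extends the
     * low 16 bits of each lane, so every lane lands in the int16 range. */
    a = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
    b = _mm_srai_epi32(_mm_slli_epi32(b, 16), 16);

    /* packssdw (SSE2) saturates to int16, but all lanes already fit,
     * so the low 16 bits of each lane pass through unchanged. */
    return _mm_packs_epi32(a, b);
}

/* Hypothetical helper: narrow eight 32-bit values to eight 16-bit values. */
static void pack8_u32_to_u16(const int32_t src[8], uint16_t dst[8])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    __m128i hi = _mm_loadu_si128((const __m128i *)(src + 4));
    _mm_storeu_si128((__m128i *)dst, packus_epi32_sse2(lo, hi));
}
```

Calling pack8_u32_to_u16 on values such as 1023, 4095 or 40000 should leave them unchanged in the 16-bit output. A negative or larger-than-65535 input would be truncated to its low 16 bits, whereas real packusdw would clamp it to 0 or 65535.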
As far as I can see, the only reason those functions are SSE4 is because
of the pextrw needed for the following block widths:
- 2, used only by chroma;
- 6, used by chroma and indirectly by luma;
- 12, used by both.
The better solution would be to convert all chroma handling to NV12, but
it is vas