On Mon, 23 May 2016 14:30:54 +0200
Håvard Espeland wrote:
> Currently, we are also developing a version of the same encoder for
> Nvidia TX1 with Cuda/Neon SIMD for supporting multiple high quality
> streams in real time using ffmpeg. I guess there is little interest
> in merging this as well, bu
> Have you tested your optimizations in the other prores encoder (prores
> kostya), which I think has more features (interlaced encoding and 444
> versions)?
No, we haven’t done this yet. The goal for us was performance without losing
accuracy. We chose Anatoliy simply because it was faster.
I
Hi Paul,
> On 23 May 2016, at 13:13, Paul B Mahol wrote:
>
> On 5/23/16, Haavard Espeland wrote:
>> Hi guys,
>>
>> We have been working on Prores Anatoliy optimizations to get the speed up on
>> an embedded x86 platform. Fdct (10bit), scaling and encoding of code words
>> have been optimized w
2016-05-23 13:44 GMT+02:00 Håvard Espeland:
>
> > The SIMD won't be accepted if it's intrinsics. The codeword encoding is not
> > SIMD, is it? So that may be worth upstreaming.
>
> All optimizations we’ve done are SIMD so it does not apply. Basically what
> we do for codewords is to process the
> The SIMD won't be accepted if it's intrinsics. The codeword encoding is not
> SIMD, is it? So that may be worth upstreaming.
All optimizations we’ve done are SIMD so it does not apply. Basically what we
do for codewords is to process the shifting/masking for eight codewords at a
time. The put
On 5/23/16, Haavard Espeland wrote:
> Hi guys,
>
> We have been working on Prores Anatoliy optimizations to get the speed up on
> an embedded x86 platform. Fdct (10bit), scaling and encoding of code words
> have been optimized with AVX2 instructions, and the performance is increased
> by roughly 4
Hi Havard,
On Mon, May 23, 2016 at 6:36 AM, Håvard Espeland wrote:
> Hi guys,
>
> We have been working on Prores Anatoliy optimizations to get the speed up
> on an embedded x86 platform. Fdct (10bit), scaling and encoding of code
> words have been optimized with AVX2 instructions, and the perfor
Hi guys,
We have been working on Prores Anatoliy optimizations to get the speed up on an
embedded x86 platform. Fdct (10bit), scaling and encoding of code words have
been optimized with AVX2 instructions, and the performance is increased by
roughly 45% for the standard profile Prores 4:2:2 on o