On Mon, 23 May 2016 14:30:54 +0200
Håvard Espeland wrote:
> Currently, we are also developing a version of the same encoder for
> Nvidia TX1 with CUDA/NEON SIMD for supporting multiple high quality
> streams in real time using ffmpeg. I guess there is little interest
> in merging this as well, bu
>> Have you tested your optimizations in the other prores encoder (prores
> kostya) (which I think has more features (interlaced encoding and 444
> versions))?
No, we haven’t done this yet. The goal for us was performance without losing
accuracy. We chose Anatoliy simply because it was faster.
I
Hi Paul,
> On 23 May 2016, at 13:13, Paul B Mahol wrote:
>
> On 5/23/16, Haavard Espeland wrote:
>> Hi guys,
>>
>> We have been working on Prores Anatoliy optimizations to get the speed up on
>> an embedded x86 platform. Fdct (10bit), scaling and encoding of code words
>> have been optimized w
2016-05-23 13:44 GMT+02:00 Håvard Espeland:
>
> > The SIMD won't be accepted if it's intrinsics. The codeword encoding is not
> > SIMD, is it? So that may be worth upstreaming.
>
> All optimizations we’ve done are SIMD so it does not apply. Basically what
> we do for codewords is to process the
> The SIMD won't be accepted if it's intrinsics. The codeword encoding is not
> SIMD, is it? So that may be worth upstreaming.
All optimizations we’ve done are SIMD so it does not apply. Basically what we
do for codewords is to process the shifting/masking for eight codewords at a
time. The put
On 5/23/16, Haavard Espeland wrote:
> Hi guys,
>
> We have been working on Prores Anatoliy optimizations to get the speed up on
> an embedded x86 platform. Fdct (10bit), scaling and encoding of code words
> have been optimized with AVX2 instructions, and the performance is increased
> by roughly 4
Hi Havard,
On Mon, May 23, 2016 at 6:36 AM, Håvard Espeland wrote:
> Hi guys,
>
> We have been working on Prores Anatoliy optimizations to get the speed up
> on an embedded x86 platform. Fdct (10bit), scaling and encoding of code
> words have been optimized with AVX2 instructions, and the perfor