> On 1 Sep 2016, at 14:59, Timo Rothenpieler <t...@rothenpieler.org> wrote:
> 
> On 01.09.2016 at 13:44, Ronald S. Bultje wrote:
>> Hi Timo,
>> 
>> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <t...@rothenpieler.org>
>> wrote:
>> 
>>>> Hi,
>>>> 
>>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <aliki...@gmail.com> wrote:
>>>> 
>>>>> Hi Oliver,
>>>>> 
>>>>> I just set my DDR3 RAM speed to 2133 MHz on an i7 4960X server. It doesn't
>>>>> make much difference. FPS still wavers between 41 and 44 for UHD P010LE
>>>>> HEVC Main 10 encoding.
>>>>> 
>>>>> Also, rawvideo P010LE encoding wavers between 39 and 42 fps. For your
>>>>> information: while FPS wavers between 39 and 42 for YUV420P to P010LE,
>>>>> YUV420P to YUV420P10LE runs at about 75-76 fps:
>>>> 
>>>> 
>>>> I think this is expected, the p010le conversion is C (no SIMD). The
>>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>>> 
>>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>>> swscale. Better yet, add direct conversions from yuv420p10 (which I assume
>>>> is the internal format of your actual source after decoding?) to p010le,
>>>> first in C and later in x86 SIMD.
>>> 
>>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
>>> implementation.
>>> 
>> 
>> I agree. I didn't mean to offend you for writing bad C code, or for not
>> writing SIMD code. I simply meant to point out that if you want to go from
>> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that
>> direction.
> 
> Didn't take it like that, was more a general remark.
> The C implementation is as straightforward as it gets.
> I wonder if rearranging the code could make it more efficient though.
> Stuff like moving some if() checks out of the loop and duplicating the
> loop instead, or other tricks that lead to gcc generating faster code.
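
To make the rearrangement described above concrete, here is a minimal sketch of
hoisting a loop-invariant if() out of a per-pixel loop and duplicating the loop
(often called loop unswitching). The function names and the shift_up flag are
hypothetical and are not taken from swscale. gcc can apply this transformation
itself (-funswitch-loops, enabled at -O3), so any gain from doing it by hand
would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row converter: the loop-invariant flag is tested per pixel. */
void convert_row_branchy(uint16_t *dst, const uint16_t *src,
                         size_t width, int shift_up)
{
    assert(dst && src);
    for (size_t x = 0; x < width; x++) {
        if (shift_up)
            dst[x] = (uint16_t)(src[x] << 6); /* 10-bit value into the high bits */
        else
            dst[x] = src[x];                  /* plain copy */
    }
}

/* Same result, with the check hoisted and the loop duplicated. */
void convert_row_unswitched(uint16_t *dst, const uint16_t *src,
                            size_t width, int shift_up)
{
    assert(dst && src);
    if (shift_up) {
        for (size_t x = 0; x < width; x++)
            dst[x] = (uint16_t)(src[x] << 6);
    } else {
        for (size_t x = 0; x < width; x++)
            dst[x] = src[x];
    }
}
```

Both functions produce identical output for the same arguments; the second
only removes the per-pixel branch from the loop body.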

I’m not sure it’ll make much difference. You may recall that my original patch
had code in nvenc.c that took a YUV420P input and converted it to P010 as it
fed the frames into the encoder. Out of curiosity I did some quick testing of
this versus the code that has since been added to swscale to support P010
conversions, and could find no difference in the time it took to encode my 60s
sample. It’s not an exhaustive test by any means, but if there were any obvious
inefficiency in the swscale code I’d have expected to see some difference.
I tested my sample three times with each version of the code, and the time
taken to encode was virtually identical every time.
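
For readers unfamiliar with the format, the sketch below shows roughly what an
8-bit YUV420P to P010 conversion involves. It is not the code from the patch or
from swscale: the function name, the parameter list and the stride convention
(strides counted in elements of each plane) are assumptions made for
illustration. P010 stores its 10 significant bits in the top of each 16-bit
word, with a luma plane followed by one plane of interleaved U and V samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert one 8-bit planar YUV 4:2:0 frame to P010.
 * Strides are in elements (bytes for the source, uint16_t for the destination).
 */
void yuv420p_to_p010(const uint8_t *src_y, const uint8_t *src_u,
                     const uint8_t *src_v,
                     size_t src_y_stride, size_t src_c_stride,
                     uint16_t *dst_y, uint16_t *dst_uv,
                     size_t dst_y_stride, size_t dst_uv_stride,
                     size_t width, size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma: 8 -> 10 bits is << 2, and P010 adds << 6, so << 8 in total. */
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            dst_y[y * dst_y_stride + x] =
                (uint16_t)(src_y[y * src_y_stride + x] << 8);

    /* Chroma: interleave the U and V planes into one UV plane, U first. */
    for (size_t y = 0; y < height / 2; y++)
        for (size_t x = 0; x < width / 2; x++) {
            dst_uv[y * dst_uv_stride + 2 * x] =
                (uint16_t)(src_u[y * src_c_stride + x] << 8);
            dst_uv[y * dst_uv_stride + 2 * x + 1] =
                (uint16_t)(src_v[y * src_c_stride + x] << 8);
        }
}
```

Each output sample here is one load, one shift and one store, so a plain C
loop of this shape leaves little obvious overhead to remove by rearranging it.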

Oliver

> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

