> On 1 Sep 2016, at 14:44, Ronald S. Bultje <rsbul...@gmail.com> wrote:
>
> Hi Timo,
>
> On Thu, Sep 1, 2016 at 7:34 AM, Timo Rothenpieler <t...@rothenpieler.org>
> wrote:
>
>>> Hi,
>>>
>>> On Thu, Sep 1, 2016 at 7:00 AM, Ali KIZIL <aliki...@gmail.com> wrote:
>>>
>>>> Hi Oliver,
>>>>
>>>> I just set my DDR3 RAM speed to 2133 MHz on the i7 4960x server. It
>>>> doesn't make much difference. FPS is still wavering at 41-44 fps for
>>>> UHD P010LE HEVC Main 10 encoding.
>>>>
>>>> Also, rawvideo P010LE encoding wavers at 39-42 fps. For your note:
>>>> while FPS wavers at 39-42 fps for YUV420P to P010LE, YUV420P to
>>>> YUV420P10LE fps is like 75-76:
>>>
>>>
>>> I think this is expected, the p010le conversion is C (no SIMD). The
>>> yuv420p10le conversion is using x86 SIMD (probably AVX).
>>>
>>> To fix this, add x86 SIMD implementations of the p010le conversions in
>>> swscale. Better yet, add direct conversions from yuv420p10 (which I
>>> assume is the internal format of your actual source after decoding?)
>>> to p010le, first C and then later x86 SIMD.
>>
>> I think 40-50 FPS is quite a nice result for UHD with the plain stupid C
>> implementation.
>>
>
> I agree. I didn't mean to offend you for writing bad C code, or for not
> writing SIMD code. I simply meant to point out that if you want to go from
> 40-50fps to 100+fps, SIMD is probably the easiest way to move in that
> direction.
>
>> Also, isn't the internal representation of YUV 10bit in swscale
>> essentially yuv420p10 anyway, so the conversion already is as direct as
>> it gets?
>
>
> There is probably no conversion at all, right. But given that there's also
> a video being decoded, which is much more CPU-intensive than colorspace
> conversion, you wouldn't expect the colorspace conversion to slow it down
> by >2x. (Unless it's C, of course. :-).)
>
>>> I have no idea why you would want to convert from yuv420p to p010le or
>>> yuv420p10le. I understand swscale supports it (it should) but I doubt
>>> that's how you want to generate 10 bits content.
>>
>> P010 is the only YUV420 10bit format NVENC supports.
>
>
> His source in the given example was yuv420p. If your source is 8bit, encode
> 8bits, not 10bits. For 10bit encoding, use 10bit source.
>
> Right?
>

When I ran some tests of this a week or so ago, I found that taking an
8-bit source, converting it to 10-bit and encoding as 10-bit could actually
save space. I posted my results to this list. I tried it after reading this:

http://x264.nl/x264/10bit_02-ateme-why_does_10bit_save_bandwidth.pdf

and was curious to see whether it applied to NVENC HEVC. I only tried one
sample file, a yuv420p Slingbox capture, but with the global quality setting
held constant I saved a fair bit on the output file size. Interestingly (or
not), I couldn't reproduce anything similar with x265 using a similar
approach.

> So even if this is only a performance test, we need to think about whether
> the test tells us something meaningful. In particular, to repeat what I
> said earlier, if the source is represented as yuv420p10le after decoding, a
> direct yuv420p10le to p010le conversion in C and SIMD is probably going to
> be even-more-efficient than a SIMD implementation of the p010le (or be)
> input/output that you wrote earlier, since that's the "slow" conversion
> path.
>
> If this is confusing, poke me at VDD (QtCon) and I'll explain in more
> detail.
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
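
For reference, the direct yuv420p10le to p010le conversion discussed in the quoted text could look roughly like the plain C below. This is only a minimal sketch and not swscale code: the function name, parameters and stride convention (strides counted in 16-bit samples) are hypothetical, and it assumes native little-endian samples with even width and height. It relies on two properties of the formats: yuv420p10le keeps its 10 bits in the low end of each 16-bit word with separate U and V planes, while P010 keeps them in the high end and interleaves U and V in one plane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg/swscale API).
 * Strides are in 16-bit samples, not bytes. */
static void yuv420p10_to_p010(const uint16_t *src_y, const uint16_t *src_u,
                              const uint16_t *src_v,
                              size_t src_y_stride, size_t src_c_stride,
                              uint16_t *dst_y, uint16_t *dst_uv,
                              size_t dst_y_stride, size_t dst_uv_stride,
                              int width, int height)
{
    /* Assumption: 4:2:0 frames with even dimensions. */
    assert(width % 2 == 0 && height % 2 == 0);

    /* Luma: move the 10 significant bits from the low end to the high end. */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            dst_y[y * dst_y_stride + x] =
                (uint16_t)(src_y[y * src_y_stride + x] << 6);
        }
    }

    /* Chroma: same shift, plus interleaving U and V into a single plane. */
    for (int y = 0; y < height / 2; y++) {
        for (int x = 0; x < width / 2; x++) {
            dst_uv[y * dst_uv_stride + 2 * x] =
                (uint16_t)(src_u[y * src_c_stride + x] << 6);
            dst_uv[y * dst_uv_stride + 2 * x + 1] =
                (uint16_t)(src_v[y * src_c_stride + x] << 6);
        }
    }
}
```

Each output sample is one load, one shift and one store, so the loops should be simple to vectorize later, which is presumably why a SIMD version of this path is suggested in the quoted text.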