On 20/02/2019 10:17, Wang, Shaofei wrote:
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of
>> Mark Thompson
>> Sent: Saturday, February 16, 2019 8:12 PM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH v5] Improved the performance of 1
>> decode + N filter graphs and adaptive bitrate.
>> On 15/02/2019 21:54, Shaofei Wang wrote:
>>> It enabled multiple filter graph concurrency, which bring above about
>>> 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
>>>
>>> Below are some test cases and comparison as reference.
>>> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
>>> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
>>>
>>> For 1:N transcode by GPU acceleration with vaapi:
>>> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
>>>     -hwaccel_output_format vaapi \
>>>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>>>     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
>>>     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
>>>
>>>     test results:
>>>                 2 encoders 5 encoders 10 encoders
>>>     Improved       6.1%    6.9%       5.5%
>>>
>>> For 1:N transcode by GPU acceleration with QSV:
>>> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>>>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>>>     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null
>> \
>>>     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null
>>> /dev/null
>>>
>>>     test results:
>>>                 2 encoders  5 encoders 10 encoders
>>>     Improved       6%       4%         15%
>>>
>>> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
>>> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>>>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>>>     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f
>> null /dev/null \
>>>     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f
>>> null /dev/null
>>>
>>>     test results:
>>>                 2 scale  5 scale   10 scale
>>>     Improved       12%     21%        21%
>>>
>>> For CPU only 1 decode to N scaling:
>>> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>>>     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
>>>     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
>>>
>>>     test results:
>>>                 2 scale  5 scale   10 scale
>>>     Improved       25%    107%       148%
>>>
>>> Signed-off-by: Wang, Shaofei <shaofei.w...@intel.com>
>>> Reviewed-by: Zhao, Jun <jun.z...@intel.com>
>>> ---
>>>  fftools/ffmpeg.c        | 121
>> ++++++++++++++++++++++++++++++++++++++++++++----
>>>  fftools/ffmpeg.h        |  14 ++++++
>>>  fftools/ffmpeg_filter.c |   1 +
>>>  3 files changed, 128 insertions(+), 8 deletions(-)
>>
>> On a bit more review, I don't think this patch works at all.
>>
> It has been tested and verified by a lot of cases. More fate cases need to be 
> covered now.
> 
>> The existing code is all written to be run serially.  This simplistic 
>> approach to
>> parallelising it falls down because many of those functions use variables
>> written in what were previously other functions called at different times but
>> have now become other threads, introducing undefined behaviour due to
>> data races.
>>
> Actually, this is not a patch to parallel every thing in the ffmpeg. It just 
> thread the input filter
> of the filter graph(tend for simple filter graph), which is a simple way to 
> improve N filter graph
> performance and also without introduce huge modification. So that there is 
> still a lot of serial function call, differences
> are that each filter graph need to init its output stream instead of init all 
> together and each
> filter graph will reap filters for its filter chain.

Indeed the existing encapsulation tries to keep things mostly separate, but in 
various places it accesses shared state which works fine in the serial case but 
fails when those parts are run in parallel.

Data races are undefined behaviour in C; introducing them is not acceptable.

>> To consider a single example (not the only one), the function
>> check_init_output_file() does not work at all after this change.  The test 
>> for
>> OutputStream initialisation (so that you run exactly once after all of the
>> output streams are ready) races with other threads setting those variables.
>> Since that's undefined behaviour you may get lucky sometimes and have the
>> output file initialisation run exactly once, but in general it will fail in 
>> unknown
>> ways.
>>
> 
> The check_init_output_file() should be responsible for the output file 
> related with
> specified output stream which managed by each thread chain, that means even it
> called by different thread, the data setting are different. Let me double 
> check.

Each output file can contain multiple streams - try a transcode with both audio 
and video filters.

(Incidentally, the patch as-is also crashes in this case if the transcode 
completes without any data from one of the streams.)

- Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to