On 15/02/2019 21:54, Shaofei Wang wrote:
> It enabled multiple filter graph concurrency, which bring above about
> 4%~20% improvement in some 1:N scenarios by CPU or GPU acceleration
> 
> Below are some test cases and comparison as reference.
> (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> 
> For 1:N transcode by GPU acceleration with vaapi:
> ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
>     -hwaccel_output_format vaapi \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
>     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null
> 
>     test results:
>                 2 encoders 5 encoders 10 encoders
>     Improved       6.1%    6.9%       5.5%
> 
> For 1:N transcode by GPU acceleration with QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
>     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null
> 
>     test results:
>                 2 encoders  5 encoders 10 encoders
>     Improved       6%       4%         15%
> 
> For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> ./ffmpeg -hwaccel qsv -c:v h264_qsv \
>     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null 
> /dev/null \
>     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null 
> /dev/null
> 
>     test results:
>                 2 scale  5 scale   10 scale
>     Improved       12%     21%        21%
> 
> For CPU only 1 decode to N scaling:
> ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
>     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
>     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null
> 
>     test results:
>                 2 scale  5 scale   10 scale
>     Improved       25%    107%       148%
> 
> Signed-off-by: Wang, Shaofei <shaofei.w...@intel.com>
> Reviewed-by: Zhao, Jun <jun.z...@intel.com>
> ---
>  fftools/ffmpeg.c        | 121 
> ++++++++++++++++++++++++++++++++++++++++++++----
>  fftools/ffmpeg.h        |  14 ++++++
>  fftools/ffmpeg_filter.c |   1 +
>  3 files changed, 128 insertions(+), 8 deletions(-)

On a bit more review, I don't think this patch works at all.

The existing code is all written to be run serially.  This simplistic approach 
to parallelising it falls down because many of those functions use variables 
written in what were previously other functions called at different times but 
have now become other threads, introducing undefined behaviour due to data 
races.

To consider a single example (not the only one), the function 
check_init_output_file() does not work at all after this change.  The test for 
OutputStream initialisation (so that you run exactly once after all of the 
output streams are ready) races with other threads setting those variables.  
Since that's undefined behaviour you may get lucky sometimes and have the 
output file initialisation run exactly once, but in general it will fail in 
unknown ways.

If you want to resubmit this patch, you will need to refactor a lot of the 
other code in ffmpeg.c to rule out these undefined cases.

- Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to