[FFmpeg-devel] [PATCH] dnn_backend_native_layer_conv2d.c: fix bug of loop boundary in single thread mode.

2020-09-19 Thread xujunzz
From: Xu Jun Before this patch, the FATE test for dnn could fail in some Windows environments while succeeding on my Linux machine. The bug was caused by a wrong loop boundary, which Win10 and Linux seem to handle differently. After this patch, the FATE test succeeds in my Windows MinGW 64-bit environment. Signed-off-by: Xu Jun

[FFmpeg-devel] [PATCH v3 2/2] dnn_backend_native_layer_conv2d.c: refine code.

2020-09-16 Thread xujunzz
From: Xu Jun Move thread area allocation out of the thread function and into the main thread. Signed-off-by: Xu Jun --- .../dnn/dnn_backend_native_layer_conv2d.c | 30 +-- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_native_layer_conv2d.c b

[FFmpeg-devel] [PATCH v3 1/2] dnn_backend_native_layer_conv2d.c: fix memory allocation bug in multithread function.

2020-09-16 Thread xujunzz
From: Xu Jun Before this patch, memory was allocated in each thread function, which could allocate it more than once and cause a crash. After this patch, memory is allocated once in the main thread, and an index is passed into the thread functions. Bug fixed. Signed-off-by: Xu Jun --- v3: fix b
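The allocation pattern this message describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `ThreadParam`, `fill_rows`, and `run_layer` are hypothetical names, and the "layer" just writes the row number into each output element. The point is that the main thread allocates the shared buffer and the per-thread parameters exactly once, and each worker only receives the row range it is allowed to write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread parameters; the names are illustrative and are
 * not taken from dnn_backend_native_layer_conv2d.c. */
typedef struct ThreadParam {
    float *output;   /* shared buffer, allocated once by the main thread */
    int width;       /* elements per row */
    int row_start;   /* first row this thread writes (inclusive) */
    int row_end;     /* end of this thread's rows (exclusive) */
} ThreadParam;

/* Worker: writes only its own rows and never allocates memory. */
static void *fill_rows(void *arg)
{
    ThreadParam *p = arg;
    for (int y = p->row_start; y < p->row_end; y++)
        for (int x = 0; x < p->width; x++)
            p->output[y * p->width + x] = (float)y;
    return NULL;
}

/* Main-thread side: allocate once, split the rows across nb_threads
 * workers, then join them. Returns the buffer (caller frees) or NULL. */
static float *run_layer(int height, int width, int nb_threads)
{
    assert(height > 0 && width > 0 && nb_threads > 0);

    float *output = malloc(sizeof(*output) * height * width);
    pthread_t *threads = malloc(sizeof(*threads) * nb_threads);
    ThreadParam *params = malloc(sizeof(*params) * nb_threads);
    int started = 0;

    if (!output || !threads || !params) {
        free(output);
        free(threads);
        free(params);
        return NULL;
    }

    for (int i = 0; i < nb_threads; i++) {
        params[i].output = output;
        params[i].width = width;
        params[i].row_start = height * i / nb_threads;
        params[i].row_end = height * (i + 1) / nb_threads;
        if (pthread_create(&threads[i], NULL, fill_rows, &params[i]))
            break;  /* stop launching; join the threads already started */
        started++;
    }

    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    free(params);
    if (started != nb_threads) {
        free(output);
        return NULL;
    }
    return output;
}
```

Because the row ranges do not overlap, the workers need no locking, and no worker can trigger a second allocation of the shared buffer.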

[FFmpeg-devel] [PATCH v2 2/2] dnn_backend_native_layer_conv2d.c: refine code.

2020-09-15 Thread xujunzz
From: Xu Jun Move thread area allocation out of the thread function and into the main thread. Signed-off-by: Xu Jun --- v2: fix build warnings .../dnn/dnn_backend_native_layer_conv2d.c | 44 +-- 1 file changed, 20 insertions(+), 24 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_

[FFmpeg-devel] [PATCH v2 1/2] dnn_backend_native_layer_conv2d.c: fix memory allocation bug in multithread function.

2020-09-15 Thread xujunzz
From: Xu Jun Before this patch, memory was allocated in each thread function, which could allocate it more than once and cause a crash. After this patch, memory is allocated once in the main thread, and an index is passed into the thread functions. Bug fixed. Signed-off-by: Xu Jun --- .../dnn/

[FFmpeg-devel] [PATCH 2/2] dnn_backend_native_layer_conv2d.c: refine code.

2020-09-14 Thread xujunzz
From: Xu Jun Move thread area allocation out of the thread function and into the main thread. Signed-off-by: Xu Jun --- .../dnn/dnn_backend_native_layer_conv2d.c | 29 +-- 1 file changed, 13 insertions(+), 16 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_native_layer_conv2d.c b

[FFmpeg-devel] [PATCH 1/2] dnn_backend_native_layer_conv2d.c: fix memory allocation bug in multithread function.

2020-09-14 Thread xujunzz
From: Xu Jun Before this patch, memory was allocated in each thread function, which could allocate it more than once and cause a crash. After this patch, memory is allocated once in the main thread, and an index is passed into the thread functions. Bug fixed. Signed-off-by: Xu Jun --- .../dnn/

[FFmpeg-devel] [PATCH v5 2/2] dnn_backend_native_layer_conv2d.c: Add multithread function

2020-09-06 Thread xujunzz
From: Xu Jun Use pthread to multithread dnn_execute_layer_conv2d. Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y:options=conv2d_threads=23 \ -y sr_native.jpg -benchmark" before patch: utime=11.238

[FFmpeg-devel] [PATCH v5 1/2] dnn_backend_native.c: parse options in native backend

2020-09-06 Thread xujunzz
From: Xu Jun Signed-off-by: Xu Jun --- v2: use av_opt_set_from_string instead of function dnn_parse_option(). v3: make all the options supported, not just conv2d_threads v4: move dnn_native_options and dnn_native_class from .h to .c. libavfilter/dnn/dnn_backend_native.c | 22 +++

[FFmpeg-devel] [PATCH v4 2/2] dnn_backend_native_layer_conv2d.c: Add multithread function

2020-09-04 Thread xujunzz
From: Xu Jun Use pthread to multithread dnn_execute_layer_conv2d. Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y:options=conv2d_threads=23 \ -y sr_native.jpg -benchmark" before patch: utime=11.238

[FFmpeg-devel] [PATCH v4 1/2] dnn_backend_native.c: parse options in native backend

2020-09-04 Thread xujunzz
From: Xu Jun Signed-off-by: Xu Jun --- v2: use av_opt_set_from_string instead of function dnn_parse_option(). v3: make all the options supported, not just conv2d_threads v4: move dnn_native_options and dnn_native_class from .h to .c. libavfilter/dnn/dnn_backend_native.c | 22 +++

[FFmpeg-devel] [PATCH v3 1/2] dnn_backend_native.c: parse options in native backend

2020-09-04 Thread xujunzz
From: Xu Jun Signed-off-by: Xu Jun --- v2: use av_opt_set_from_string instead of function dnn_parse_option(). v3: make all the options supported, not just conv2d_threads libavfilter/dnn/dnn_backend_native.c | 19 ++- libavfilter/dnn/dnn_backend_native.h | 21 +++

[FFmpeg-devel] [PATCH v2 1/2] dnn_backend_native.c: parse options in native backend

2020-09-04 Thread xujunzz
From: Xu Jun v2: use av_opt_set_from_string instead of function dnn_parse_option(). Signed-off-by: Xu Jun --- libavfilter/dnn/dnn_backend_native.c | 19 ++- libavfilter/dnn/dnn_backend_native.h | 21 + 2 files changed, 31 insertions(+), 9 deletions(-) diff

[FFmpeg-devel] [PATCH v2 2/2] dnn_backend_native_layer_conv2d.c: Add multithread function

2020-09-04 Thread xujunzz
From: Xu Jun v2: add check for HAVE_PTHREAD_CANCEL and modify FATE test dnn-layer-conv2d-test.c Use pthread to multithread dnn_execute_layer_conv2d. Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y:o

[FFmpeg-devel] [PATCH 1/2] dnn_backend_native.c: parse options in native backend

2020-09-03 Thread xujunzz
From: Xu Jun Signed-off-by: Xu Jun --- libavfilter/dnn/dnn_backend_native.c | 22 -- libavfilter/dnn/dnn_backend_native.h | 13 + 2 files changed, 33 insertions(+), 2 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_native.c b/libavfilter/dnn/dnn_backend_n

[FFmpeg-devel] [PATCH 2/2] Add multithread function for dnn_backend_native_layer_conv2d.c

2020-09-03 Thread xujunzz
From: Xu Jun Use pthread to multithread dnn_execute_layer_conv2d. Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y:options=conv2d_threads=23 \ -y sr_native.jpg -benchmark" before patch: utime=11.238

[FFmpeg-devel] [PATCH 3/3][GSoC] Add x86-avx2 optimization for dnn_execute_layer_conv2d

2020-08-31 Thread xujunzz
From: Xu Jun Can be tested with command "./ffmpeg_g -i test_1s.mp4 -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y -y sr_native.mp4 -benchmark" before patch: utime=826.044s stime=0.550s rtime=39.680s after patch: utime=545.137s stime=0.467s rtime=27

[FFmpeg-devel] [PATCH 2/3][GSoC] Add x86-sse4 optimization for dnn_execute_layer_conv2d

2020-08-31 Thread xujunzz
From: Xu Jun Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y -y sr_native.jpg -benchmark"\ -cpuflags 0x100 before patch: utime=20.817s stime=0.047s rtime=1.051s after patch: utime=3.744s stime=0.03

[FFmpeg-devel] [PATCH 1/3][GSoC] Add multithread function for dnn_backend_native_layer_conv2d.c

2020-08-31 Thread xujunzz
From: Xu Jun Use pthread to multithread dnn_execute_layer_conv2d. Can be tested with command "./ffmpeg_g -i input.png -vf \ format=yuvj420p,dnn_processing=dnn_backend=native:model= \ espcn.model:input=x:output=y -y sr_native.jpg -benchmark" before patch: utime=11.238s stime=0.005s rtime=11.248s

[FFmpeg-devel] [PATCH v2 3/3] avfilter/vf_convolution: Add X86 SIMD optimizations for filter_column()

2019-12-22 Thread xujunzz
From: Xu Jun Performance improves by about 10% compared to v1. Tested using this command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vfra

[FFmpeg-devel] [PATCH v2 1/3] avfilter/vf_convolution: add 16-column operation for filter_column() and modify filter_slice().

2019-12-22 Thread xujunzz
From: chen Replace the existing C code for filter_column() with chen's code. Modify filter_slice() to be compatible with this change. Tested using the command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6

[FFmpeg-devel] [PATCH v2 2/3] avfilter/vf_convolution: Add x86 SIMD optimizations for filter_row()

2019-12-22 Thread xujunzz
From: Xu Jun Read 16 elements from memory, shuffle them and compute 4 rows at a time in parallel, then shuffle and write 16 results to memory in parallel. Performance improves by about 15% compared to v1. Tested using this command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5

[FFmpeg-devel] [PATCH 2/3] avfilter/vf_convolution: Add x86 SIMD optimizations for filter_row()

2019-12-02 Thread xujunzz
From: Xu Jun Tested using this command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:row:row:row:row" -an -vframes 5000 -f null /dev/null -benchmark after patch: frame=

[FFmpeg-devel] [PATCH 1/3] avfilter/vf_convolution: add 16-column operation for filter_column() and modify filter_slice().

2019-12-02 Thread xujunzz
From: chen Replace the existing C code for filter_column() with chen's code. Modify filter_slice() to be compatible with this change. Tested using the command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6

[FFmpeg-devel] [PATCH 3/3] avfilter/vf_convolution: add X86 SIMD for filter_column()

2019-12-02 Thread xujunzz
From: Xu Jun Tested using this command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vframes 5000 -f null /dev/null -benchmark after pa

[FFmpeg-devel] [PATCH] avfilter/vf_convolution: add x86 SIMD for filter_column()

2019-11-27 Thread xujunzz
From: Xu Jun Tested using a simple command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vframes 1000 -f null /dev/null The fps increas

[FFmpeg-devel] [PATCH] avfilter/vf_convolution: add 16-column operation for filter_column() to prepare for x86 SIMD.

2019-11-27 Thread xujunzz
From: Xu Jun To add x86 SIMD for filter_column(), I wrote a C function that processes 16 columns at a time. Signed-off-by: Xu Jun --- libavfilter/vf_convolution.c | 56 +++ libavfilter/x86/vf_convolution_init.c | 23 +++ 2 files changed, 79 i

[FFmpeg-devel] [PATCH] avfilter/vf_convolution: Add x86 SIMD optimizations for filter_row()

2019-11-27 Thread xujunzz
From: Xu Jun Tested using the following command: ./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5\ 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45\ :1/45:1:2:3:4:row:row:row:row" -an -vframes 1000 -f null /dev/null The fps increases fro