From: Xu Jun
Before this patch, the FATE test for dnn could fail in some Windows
environments while succeeding on my Linux machine. The bug was caused by
a wrong loop boundary, which Win10 and Linux seem to handle differently.
After this patch, the FATE test succeeds with my Windows MinGW 64-bit build.
Signed-off-by: Xu Jun
From: Xu Jun
Move the thread area allocation out of the thread function and into
the main thread.
Signed-off-by: Xu Jun
---
.../dnn/dnn_backend_native_layer_conv2d.c | 30 +--
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/libavfilter/dnn/dnn_backend_native_layer_conv2d.c
b
From: Xu Jun
Before this patch, memory was allocated inside each thread function,
which could allocate the memory more than once and cause a crash.
After this patch, memory is allocated once in the main thread and
an index is passed to the thread functions. Bug fixed.
Signed-off-by: Xu Jun
---
v3: fix b
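
The allocation pattern described in the message above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: ThreadParam, worker() and run_rows_threaded() are hypothetical names, the per-element doubling stands in for the real convolution, and each worker receives a pointer to its own slot of the array (passing an index into the shared array, as the message describes, would work the same way). The main thread allocates the per-thread area once, and the workers never allocate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread work area; not the struct used in FFmpeg. */
typedef struct ThreadParam {
    pthread_t thread;
    const float *input;
    float *output;
    int width;
    int row_start; /* first row for this thread (inclusive) */
    int row_end;   /* one past the last row for this thread */
} ThreadParam;

/* Worker: touches only its own rows and never allocates memory. */
static void *worker(void *arg)
{
    ThreadParam *p = arg;
    for (int y = p->row_start; y < p->row_end; y++)
        for (int x = 0; x < p->width; x++)
            /* stand-in for the real convolution */
            p->output[y * p->width + x] = 2.0f * p->input[y * p->width + x];
    return NULL;
}

/* Returns 0 on success, -1 on allocation or thread-creation failure. */
int run_rows_threaded(const float *input, float *output,
                      int height, int width, int nb_threads)
{
    ThreadParam *params;
    int rows_per_thread, started = 0, ret = 0;

    assert(nb_threads > 0);

    /* Single allocation, done by the main thread before any worker starts. */
    params = calloc(nb_threads, sizeof(*params));
    if (!params)
        return -1;

    rows_per_thread = (height + nb_threads - 1) / nb_threads;

    for (int i = 0; i < nb_threads; i++) {
        int start = i * rows_per_thread;
        int end   = start + rows_per_thread;

        /* Clamp so that surplus threads get an empty row range. */
        if (start > height) start = height;
        if (end   > height) end   = height;

        params[i].input     = input;
        params[i].output    = output;
        params[i].width     = width;
        params[i].row_start = start;
        params[i].row_end   = end;

        if (pthread_create(&params[i].thread, NULL, worker, &params[i])) {
            ret = -1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++)
        pthread_join(params[i].thread, NULL);

    free(params);
    return ret;
}
```

Because the array is allocated and freed in one place, its lifetime does not depend on which worker runs first, and each worker writes only to its own row range.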
From: Xu Jun
Move the thread area allocation out of the thread function and into
the main thread.
Signed-off-by: Xu Jun
---
v2: fix build warnings
.../dnn/dnn_backend_native_layer_conv2d.c | 44 +--
1 file changed, 20 insertions(+), 24 deletions(-)
diff --git a/libavfilter/dnn/dnn_backend_
From: Xu Jun
Before this patch, memory was allocated inside each thread function,
which could allocate the memory more than once and cause a crash.
After this patch, memory is allocated once in the main thread and
an index is passed to the thread functions. Bug fixed.
Signed-off-by: Xu Jun
---
.../dnn/
From: Xu Jun
Move the thread area allocation out of the thread function and into
the main thread.
Signed-off-by: Xu Jun
---
.../dnn/dnn_backend_native_layer_conv2d.c | 29 +--
1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/libavfilter/dnn/dnn_backend_native_layer_conv2d.c
b
From: Xu Jun
Before this patch, memory was allocated inside each thread function,
which could allocate the memory more than once and cause a crash.
After this patch, memory is allocated once in the main thread and
an index is passed to the thread functions. Bug fixed.
Signed-off-by: Xu Jun
---
.../dnn/
From: Xu Jun
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238
From: Xu Jun
Signed-off-by: Xu Jun
---
v2: use av_opt_set_from_string instead of function dnn_parse_option().
v3: make all the options supported, not just conv2d_threads
v4: move dnn_native_options and dnn_native_class from .h to .c.
libavfilter/dnn/dnn_backend_native.c | 22 +++
From: Xu Jun
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238
From: Xu Jun
Signed-off-by: Xu Jun
---
v2: use av_opt_set_from_string instead of function dnn_parse_option().
v3: make all the options supported, not just conv2d_threads
v4: move dnn_native_options and dnn_native_class from .h to .c.
libavfilter/dnn/dnn_backend_native.c | 22 +++
From: Xu Jun
Signed-off-by: Xu Jun
---
v2: use av_opt_set_from_string instead of function dnn_parse_option().
v3: make all the options supported, not just conv2d_threads
libavfilter/dnn/dnn_backend_native.c | 19 ++-
libavfilter/dnn/dnn_backend_native.h | 21 +++
From: Xu Jun
v2: use av_opt_set_from_string instead of function dnn_parse_option().
Signed-off-by: Xu Jun
---
libavfilter/dnn/dnn_backend_native.c | 19 ++-
libavfilter/dnn/dnn_backend_native.h | 21 +
2 files changed, 31 insertions(+), 9 deletions(-)
diff
From: Xu Jun
v2: add check for HAVE_PTHREAD_CANCEL and modify FATE test
dnn-layer-conv2d-test.c
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:o
From: Xu Jun
Signed-off-by: Xu Jun
---
libavfilter/dnn/dnn_backend_native.c | 22 --
libavfilter/dnn/dnn_backend_native.h | 13 +
2 files changed, 33 insertions(+), 2 deletions(-)
diff --git a/libavfilter/dnn/dnn_backend_native.c
b/libavfilter/dnn/dnn_backend_n
From: Xu Jun
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y:options=conv2d_threads=23 \
-y sr_native.jpg -benchmark"
before patch: utime=11.238
From: Xu Jun
Can be tested with command "./ffmpeg_g -i test_1s.mp4 -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y -y sr_native.mp4 -benchmark"
before patch: utime=826.044s stime=0.550s rtime=39.680s
after patch: utime=545.137s stime=0.467s rtime=27
From: Xu Jun
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y -y sr_native.jpg -benchmark"\
-cpuflags 0x100
before patch: utime=20.817s stime=0.047s rtime=1.051s
after patch: utime=3.744s stime=0.03
From: Xu Jun
Use pthread to multithread dnn_execute_layer_conv2d.
Can be tested with command "./ffmpeg_g -i input.png -vf \
format=yuvj420p,dnn_processing=dnn_backend=native:model= \
espcn.model:input=x:output=y -y sr_native.jpg -benchmark"
before patch: utime=11.238s stime=0.005s rtime=11.248s
From: Xu Jun
Performance improves by about 10% compared to v1.
Tested using this command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8
9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vfra
From: chen
Replace the existing C code for filter_column() with chen's code. Modify
filter_slice() to be compatible with this change.
Tested using the command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6
From: Xu Jun
Read 16 elements from memory, shuffle them and compute 4 rows at a time in parallel,
then shuffle and write the 16 results to memory in parallel.
Performance improves by about 15% compared to v1.
Tested using this command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
From: Xu Jun
Tested using this command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8
9:1/45:1/45:1/45:1/45:1:2:3:4:row:row:row:row" -an -vframes 5000 -f null
/dev/null -benchmark
after patch:
frame=
From: chen
Replace the existing C code for filter_column() with chen's code. Modify
filter_slice() to be compatible with this change.
Tested using the command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6
From: Xu Jun
Tested using this command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8
9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vframes 5000 -f
null /dev/null -benchmark
after pa
From: Xu Jun
Tested using a simple command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8
9:1/45:1/45:1/45:1/45:1:2:3:4:column:column:column:column" -an -vframes 1000 -f
null /dev/null
The fps increas
From: Xu Jun
In order to add x86 SIMD for filter_column(), I wrote a C function that
processes 16 columns at a time.
Signed-off-by: Xu Jun
---
libavfilter/vf_convolution.c | 56 +++
libavfilter/x86/vf_convolution_init.c | 23 +++
2 files changed, 79 i
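
As a rough illustration of what "16 columns at a time" can look like in plain C, the sketch below keeps 16 running sums per output row so that the inner loops map naturally onto SIMD lanes. It is not the code from the patch: filter_column_1(), filter_column_16() and clip_uint8() are hypothetical names, and the parameter layout (one source pointer per tap in c[], a byte stride between rows) is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference version: one column per call. */
void filter_column_1(uint8_t *dst, int height, float rdiv, float bias,
                     const int *matrix, const uint8_t *c[], int taps,
                     int dstride, int stride)
{
    for (int y = 0; y < height; y++) {
        int sum = 0;
        for (int i = 0; i < taps; i++)
            sum += c[i][y * stride] * matrix[i];
        dst[y * dstride] = clip_uint8((int)(sum * rdiv + bias + 0.5f));
    }
}

/* Same arithmetic, but 16 adjacent columns are handled per row. */
void filter_column_16(uint8_t *dst, int height, float rdiv, float bias,
                      const int *matrix, const uint8_t *c[], int taps,
                      int dstride, int stride)
{
    assert(taps > 0);
    for (int y = 0; y < height; y++) {
        int sum[16] = { 0 };

        /* Accumulate every tap into 16 independent sums. */
        for (int i = 0; i < taps; i++)
            for (int off = 0; off < 16; off++)
                sum[off] += c[i][y * stride + off] * matrix[i];

        /* Scale, round, clamp and store the 16 results. */
        for (int off = 0; off < 16; off++)
            dst[y * dstride + off] =
                clip_uint8((int)(sum[off] * rdiv + bias + 0.5f));
    }
}
```

The 16-wide version performs the same per-pixel arithmetic as the one-column version, so their outputs can be compared directly when checking a SIMD port.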
From: Xu Jun
Tested using the following command:
./ffmpeg_g -s 1280*720 -pix_fmt yuv420p -i test.yuv -vf convolution="1 2 3 4 5\
6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1 2 3 4 5 6 7 8 9:1/45:1/45:1/45\
:1/45:1:2:3:4:row:row:row:row" -an -vframes 1000 -f null /dev/null
The fps increases fro