filter_line is generally vectorized, whereas filter_edge is implemented
in C. Currently we rely on filter_edge to process non-edge pixels whenever
the width doesn't match the alignment. As a result, those non-edge pixels
go through the slow C implementation instead of the faster SSE
implementation.

It is generally faster to process 8 pixels with the slowest SSE2
vectorized implementation than it is to process 2 pixels with the C
implementation. Therefore, if filter_edge needs to process 2 or more
non-edge pixels, it is faster to process these non-edge pixels with
filter_line instead, even if it processes more pixels than necessary.

To address this, we use filter_line so long as we know that at least 2
pixels will be used in the final output, even if the rest of the computed
pixels are invalid. Any incorrect output pixels generated by filter_line
will be overwritten by the following call to filter_edge.

In addition, we avoid running filter_line if it would read or write
pixels outside the current slice.

Signed-off-by: Chris Phlipot <cphlip...@gmail.com>
---
 libavfilter/vf_yadif.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index 54109566be..394c04a985 100644
--- a/libavfilter/vf_yadif.c
+++ b/libavfilter/vf_yadif.c
@@ -201,6 +201,8 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
     int slice_end = (td->h * (jobnr+1)) / nb_jobs;
     int y;
     int edge = 3 + s->req_align / df - 1;
+    int filter_width_target = td->w - 3;
+    int filter_width_rounded_up = (filter_width_target & ~(s->req_align-1)) + s->req_align;
 
     /* filtering reads 3 pixels to the left/right; to avoid invalid reads,
      * we need to call the c variant which avoids this for border pixels
@@ -215,11 +217,28 @@ static int filter_slice(AVFilterContext *ctx, void *arg, int jobnr, int nb_jobs)
             int mrefs = y ? -refs : refs;
             int parity = td->parity ^ td->tff;
             int mode = y == 1 || y + 2 == td->h ? 2 : s->mode;
+
+            /* Adjust width and alignment to process extra pixels in filter_line
+             * using potentially vectorized code so long as it doesn't cause
+             * reads or writes outside of the current slice. filter_edge will
+             * correct any incorrect pixels written by filter_line in this
+             * scenario.
+             */
+            int filter_width;
+            int edge_alignment;
+            if (filter_width_rounded_up - filter_width_target >= 2
+                    && y*refs + filter_width_rounded_up < slice_end * refs + refs - 3) {
+                filter_width = filter_width_rounded_up;
+                edge_alignment = 1;
+            } else {
+                filter_width = td->w - edge;
+                edge_alignment = s->req_align;
+            }
             s->filter_line(dst + pix_3, prev + pix_3, cur + pix_3,
-                           next + pix_3, td->w - edge,
+                           next + pix_3, filter_width,
                            prefs, mrefs, parity, mode);
             s->filter_edges(dst, prev, cur, next, td->w,
-                            prefs, mrefs, parity, mode, s->req_align);
+                            prefs, mrefs, parity, mode, edge_alignment);
         } else {
             memcpy(&td->frame->data[td->plane][y * td->frame->linesize[td->plane]],
                    &s->cur->data[td->plane][y * refs], td->w * df);
-- 
2.25.1
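
For readers who want to try the width selection outside of FFmpeg, the
arithmetic in the hunks above can be sketched in isolation. This is only
an illustration under stated assumptions: the FilterPlan struct and the
choose_filter_plan helper are hypothetical names that do not exist in
vf_yadif.c, and the parameters stand in for td->w, df, s->req_align, y,
refs and slice_end from filter_slice. As in the patch, req_align is
assumed to be a power of two.

```c
/* Hypothetical container for the two values the patch picks per line;
 * not part of FFmpeg. */
typedef struct FilterPlan {
    int filter_width;   /* width that would be passed to filter_line */
    int edge_alignment; /* alignment that would be passed to filter_edges */
} FilterPlan;

/* Hypothetical helper mirroring the selection logic in the diff.
 * w, df, req_align, y, refs and slice_end stand in for the values
 * of the same names in filter_slice. */
static FilterPlan choose_filter_plan(int w, int df, int req_align,
                                     int y, int refs, int slice_end)
{
    /* Same expressions as in the patch. */
    int edge       = 3 + req_align / df - 1;
    int target     = w - 3;
    /* Clears the low bits and adds one full alignment unit, so the
     * result is always strictly greater than target. */
    int rounded_up = (target & ~(req_align - 1)) + req_align;
    FilterPlan plan;

    if (rounded_up - target >= 2 &&
        y * refs + rounded_up < slice_end * refs + refs - 3) {
        /* Let filter_line overshoot; filter_edges then runs with an
         * alignment of 1 and rewrites the trailing pixels. */
        plan.filter_width   = rounded_up;
        plan.edge_alignment = 1;
    } else {
        /* Fall back to the behaviour before the patch. */
        plan.filter_width   = w - edge;
        plan.edge_alignment = req_align;
    }
    return plan;
}
```

With w = 100, df = 1, req_align = 16, y = 1, refs = 128 and
slice_end = 10, the target is 97 and the rounded-up width is 112, so
this sketch selects filter_width = 112 and edge_alignment = 1. With
w = 18 and the other values unchanged, the rounded-up width exceeds the
target by only 1, so it falls back to filter_width = 0 and
edge_alignment = 16.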