On Sun, Nov 26, 2017 at 07:07:41PM +0100, Martin Vignali wrote:
> Hello,
>
> in attach patchs
>
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count
> and
> 0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
>
> like diff_bytes and diff_bytes16, have AVX2 version, increase the scalar loop
> to call the aligned version in most case
>
> 0002-avcodec-huffyuvenc-remove-code-duplication-in
> remove some code duplication, for width < 32 and for the initial scalar loop
>
> pass fate test for me (x86_64, mac os 10.12)
>
> Martin

>  huffyuvenc.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 32eecc99e666808926e1dec4ff35c17a94f5f86e
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count.patch
> From 9477be212247012ac386beeff009a2edb78abb31 Mon Sep 17 00:00:00 2001
> From: Martin Vignali <martin.vign...@gmail.com>
> Date: Sun, 26 Nov 2017 19:01:29 +0100
> Subject: [PATCH 1/3] avcodec/huffyuvenc : increase scalar loop count
>
> in order to try to call dsp in aligned mode
> (diff_int16 have AVX2 now)
> ---
>  libavcodec/huffyuvenc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 89639b75df..4f3a28e033 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -80,12 +80,12 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
>              }
>              return left;
>          } else {
> -            for (i = 0; i < 16; i++) {
> +            for (i = 0; i < 32; i++) {
>                  const int temp = src16[i];
>                  dst16[i] = temp - left;
>                  left = temp;
>              }
> -            s->hencdsp.diff_int16(dst16 + 16, src16 + 16, src16 + 15, s->n - 1, w - 16);
> +            s->hencdsp.diff_int16(dst16 + 32, src16 + 32, src16 + 31, s->n - 1, w - 32);
>              return src16[w-1];
>          }
>      }
> --
> 2.11.0 (Apple Git-81)
>

>  huffyuvenc.c |   46 ++++++++++++++++------------------------------
>  1 file changed, 16 insertions(+), 30 deletions(-)
> ba80747db2582141ec0faefc5ccd04fba65c7d72
> 0002-avcodec-huffyuvenc-remove-code-duplication-in.patch
> From 7fa991ae72c97f4d1f74789e543cf01dcb93adb9 Mon Sep 17 00:00:00 2001
> From: Martin Vignali <martin.vign...@gmail.com>
> Date: Sun, 26 Nov 2017 19:02:10 +0100
> Subject: [PATCH 2/3] avcodec/huffyuvenc : remove code duplication in
>  sub_left_prediction
>
> start of the line (before dsp call), can be merge with width < 32 part
> ---
>  libavcodec/huffyuvenc.c | 46 ++++++++++++++++------------------------------
>  1 file changed, 16 insertions(+), 30 deletions(-)
>
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 4f3a28e033..59da49212e 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -53,41 +53,27 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
>  {
>      int i;
>      if (s->bps <= 8) {
> -        if (w < 32) {
> -            for (i = 0; i < w; i++) {
> -                const int temp = src[i];
> -                dst[i] = temp - left;
> -                left = temp;
> -            }
> -            return left;
> -        } else {
> -            for (i = 0; i < 32; i++) {
> -                const int temp = src[i];
> -                dst[i] = temp - left;
> -                left = temp;
> -            }
> -            s->llvidencdsp.diff_bytes(dst + 32, src + 32, src + 31, w - 32);
> -            return src[w-1];
> +        for (i = 0; i < FFMIN(w, 32); i++) { /* scalar loop before dsp call */
> +            const int temp = src[i];
> +            dst[i] = temp - left;
> +            left = temp;

requiring FFMIN() to be evaluated per iteration could be slower if the
compiler fails to factor it out

no other comments from me, the patches should be ok otherwise

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.
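
To illustrate the FFMIN() remark above, here is a minimal standalone sketch of one way the loop bound could be hoisted so it is computed once per call instead of appearing in the loop condition. It is not the code from the patch series: sub_left_prediction8, min_width and diff_bytes_c are hypothetical names, diff_bytes_c is a plain C stand-in for the llvidencdsp.diff_bytes call, and FFMIN is redefined locally so the example compiles on its own.

```c
#include <stdint.h>

/* Local copy of a min macro in the style of FFMIN, so the sketch is standalone. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Hypothetical stand-in for llvidencdsp.diff_bytes:
 * dst[i] = src1[i] - src2[i] for i in [0, w). */
static void diff_bytes_c(uint8_t *dst, const uint8_t *src1,
                         const uint8_t *src2, intptr_t w)
{
    for (intptr_t i = 0; i < w; i++)
        dst[i] = src1[i] - src2[i];
}

/* Hypothetical 8-bit-only variant of sub_left_prediction. */
static int sub_left_prediction8(uint8_t *dst, const uint8_t *src,
                                int w, int left)
{
    /* Assumption: the bound is computed once, before the loop. */
    const int min_width = FFMIN(w, 32);
    int i;

    for (i = 0; i < min_width; i++) { /* scalar loop before dsp call */
        const int temp = src[i];
        dst[i] = temp - left;
        left   = temp;
    }
    if (w < 32)
        return left;

    /* Remainder of the line goes through the (stand-in) dsp routine. */
    diff_bytes_c(dst + 32, src + 32, src + 31, w - 32);
    return src[w - 1];
}
```

With the bound held in a local const, the result no longer depends on the compiler recognising that w is loop-invariant. Whether this makes a measurable difference would need to be benchmarked.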
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel