On Tue, Feb 27, 2018 at 9:35 PM, David Murmann <david.murm...@btf.de> wrote:
> Quantization scaling seems to be a slight bottleneck,
> this change allows the compiler to more easily vectorize
> the loop. This improves total encoding performance in my
> tests by about 10-20%.
>
> Signed-off-by: David Murmann <da...@btf.de>
> ---
>  libavcodec/proresenc_anatoliy.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/libavcodec/proresenc_anatoliy.c
> b/libavcodec/proresenc_anatoliy.c
> index 0516066163..8b296f6f1b 100644
> --- a/libavcodec/proresenc_anatoliy.c
> +++ b/libavcodec/proresenc_anatoliy.c
> @@ -232,14 +232,18 @@ static const uint8_t lev_to_cb[10] = { 0x04, 0x0A,
> 0x05, 0x06, 0x04, 0x28,
>  static void encode_ac_coeffs(AVCodecContext *avctx, PutBitContext *pb,
>          int16_t *in, int blocks_per_slice, int *qmat)
>  {
> +    int16_t block[64];
>      int prev_run = 4;
>      int prev_level = 2;
>      int run = 0, level, code, i, j;
> -    for (i = 1; i < 64; i++) {
> -        int indp = progressive_scan[i];
> -        for (j = 0; j < blocks_per_slice; j++) {
> -            int val = QSCALE(qmat, indp, in[(j << 6) + indp]);
> +    for (j = 0; j < blocks_per_slice; j++) {
> +        for (i = 0; i < 64; i++) {
> +            block[i] = (float)in[(j << 6) + i] / (float)qmat[i];
> +        }
> +
> +        for (i = 1; i < 64; i++) {
> +            int val = block[progressive_scan[i]];
>              if (val) {
>                  encode_codeword(pb, run, run_to_cb[FFMIN(prev_run, 15)]);

Usually, using float is best avoided. Did you test refactoring the
loop structure without changing it to float?

- Hendrik
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
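
For reference, a minimal sketch of what the integer-only variant asked about
above could look like. It mirrors the loop structure of the quoted patch
(one quantization pass per block, then a pass in scan order) but keeps the
division in integers. The helper names quantize_block_int and
count_nonzero_ac are hypothetical and not part of FFmpeg, the scan table is
passed in as a parameter instead of using progressive_scan, and the
non-zero count is only a stand-in for the entropy-coding step. It also
assumes that QSCALE amounts to a plain integer division by the matrix
entry, which the quoted hunk does not show, and that every qmat entry is
non-zero. Whether a compiler vectorizes this loop is untested here.

```c
#include <stdint.h>

/* Hypothetical helper (not in FFmpeg): quantize one 8x8 block using
 * integer division only. Assumes every qmat[i] is non-zero. */
static void quantize_block_int(const int16_t *in, const int *qmat,
                               int16_t *block)
{
    int i;

    /* Contiguous reads and writes with no dependency between
     * iterations; integer division truncates toward zero in C99. */
    for (i = 0; i < 64; i++)
        block[i] = (int16_t)(in[i] / qmat[i]);
}

/* Hypothetical driver: counts the non-zero AC coefficients of a slice.
 * The count stands in for the run/level coding done by the encoder. */
static int count_nonzero_ac(const int16_t *in, int blocks_per_slice,
                            const int *qmat, const uint8_t *scan)
{
    int16_t block[64];
    int i, j, count = 0;

    for (j = 0; j < blocks_per_slice; j++) {
        /* Pass 1: quantize the whole block in natural order. */
        quantize_block_int(in + (j << 6), qmat, block);

        /* Pass 2: walk the AC coefficients in scan order
         * (scan position 0 is the DC coefficient and is skipped). */
        for (i = 1; i < 64; i++) {
            if (block[scan[i]])
                count++;
        }
    }
    return count;
}
```

Comparing this against the float version on the same input would show
whether the speedup comes from the loop restructuring or from the float
division itself.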