On Thu, Sep 19, 2019 at 12:34 PM Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Wed, Sep 11, 2019 at 12:29:57PM -0700, Baptiste Coudurier wrote:
> > ---
> >  libavcodec/dv.h    |   1 +
> >  libavcodec/dvenc.c | 576 ++++++++++++++++++++++++++++++++++++++++-----
> >  2 files changed, 522 insertions(+), 55 deletions(-)
>
> a fate test should be added for this if its not already planed or done
>

I'm having issues with fate on macOS Catalina right now :(

> [...]
>
> > +    /* LOOP1: weigh AC components and store to save[] */
> > +    /* (i=0 is the DC component; we only include it to make the
> > +       number of loop iterations even, for future possible SIMD optimization) */
> > +    for (i = 0; i < 64; i += 2) {
> > +        int level0, level1;
> > +
> > +        /* get the AC component (in zig-zag order) */
> > +        level0 = blk[zigzag_scan[i+0]];
> > +        level1 = blk[zigzag_scan[i+1]];
> > +
> > +        /* extract sign and make it the lowest bit */
> > +        bi->sign[i+0] = (level0>>31)&1;
> > +        bi->sign[i+1] = (level1>>31)&1;
> > +
> > +        /* take absolute value of the level */
> > +        level0 = FFABS(level0);
> > +        level1 = FFABS(level1);
> > +
> > +        /* weigh it */
> > +        level0 = (level0*weight[i+0] + 4096 + (1<<17)) >> 18;
> > +        level1 = (level1*weight[i+1] + 4096 + (1<<17)) >> 18;
> > +
> > +        /* save unquantized value */
> > +        bi->save[i+0] = level0;
> > +        bi->save[i+1] = level1;
> > +    }
> > +
> > +    /* find max component */
> > +    for (i = 0; i < 64; i++) {
> > +        int ac = bi->save[i];
> > +        if (ac > max)
> > +            max = ac;
> > +    }
>
> these 2 loops can be merged avoiding a 2nd pass
>

Merged

[...]
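For reference, a minimal standalone sketch of what the merged single pass could look like. It is not the code from the updated patch: `BlockInfo` and `weigh_and_find_max()` are hypothetical names, `blk` is assumed to be `int16_t`, and `max` is assumed to start at 0 (its initial value is not visible in the quoted hunk). The sign is taken with `level < 0` rather than `(level>>31)&1`, since right-shifting a negative signed int is implementation-defined in C.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the encoder's per-block state; only the
 * fields touched by the loop are modelled here. */
typedef struct {
    uint8_t sign[64];
    int     save[64];
} BlockInfo;

/* Weigh all 64 coefficients in zig-zag order and track the maximum in
 * the same pass, so no second loop over save[] is needed.
 * Index 0 (DC) is included, as in the quoted patch.
 * Assumption: max starts at 0. */
static int weigh_and_find_max(BlockInfo *bi, const int16_t *blk,
                              const uint8_t *zigzag, const int *weight)
{
    int max = 0;
    for (int i = 0; i < 64; i += 2) {
        int level0 = blk[zigzag[i + 0]];
        int level1 = blk[zigzag[i + 1]];

        /* sign flag: 1 for negative coefficients, 0 otherwise */
        bi->sign[i + 0] = level0 < 0;
        bi->sign[i + 1] = level1 < 0;

        level0 = abs(level0);
        level1 = abs(level1);

        /* same weighting/rounding expression as the patch */
        level0 = (level0 * weight[i + 0] + 4096 + (1 << 17)) >> 18;
        level1 = (level1 * weight[i + 1] + 4096 + (1 << 17)) >> 18;

        bi->save[i + 0] = level0;
        bi->save[i + 1] = level1;

        /* former second loop, folded into this one */
        if (level0 > max)
            max = level0;
        if (level1 > max)
            max = level1;
    }
    return max;
}
```

Keeping the two-at-a-time unrolling preserves the even iteration count the original comment mentions for possible SIMD work; the max update is just two extra compares per iteration.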
> > +static inline void dv_guess_qnos_hd(EncBlockInfo *blks, int *qnos)
> > +{
> > +    EncBlockInfo *b;
> > +    int min_qlevel[5];
> > +    int qlevels[5];
> > +    int size[5];
> > +    int i, j;
> > +    /* cache block sizes at hypothetical qlevels */
> > +    uint16_t size_cache[5*8][DV100_NUM_QLEVELS] = {{0}};
> > +
> > +    /* get minimum qlevels */
> > +    for (i = 0; i < 5; i++) {
> > +        min_qlevel[i] = 1;
> > +        for (j = 0; j < 8; j++) {
> > +            if (blks[8*i+j].min_qlevel > min_qlevel[i])
> > +                min_qlevel[i] = blks[8*i+j].min_qlevel;
> > +        }
> > +    }
> > +
> > +    /* initialize sizes */
> > +    for (i = 0; i < 5; i++) {
> > +        qlevels[i] = dv100_starting_qno;
> > +        if (qlevels[i] < min_qlevel[i])
> > +            qlevels[i] = min_qlevel[i];
> > +
> > +        qnos[i] = DV100_QLEVEL_QNO(dv100_qlevels[qlevels[i]]);
> > +        size[i] = 0;
> > +        for (j = 0; j < 8; j++) {
> > +            size_cache[8*i+j][qlevels[i]] = dv100_actual_quantize(&blks[8*i+j], qlevels[i]);
> > +            size[i] += size_cache[8*i+j][qlevels[i]];
> > +        }
> > +    }
> > +
> > +    /* must we go coarser? */
> >     if (size[0]+size[1]+size[2]+size[3]+size[4] > vs_total_ac_bits_hd) {
> >         int largest = size[0] % 5; /* 'random' number */
>
> > +        do {
> > +            /* find the macroblock with the lowest qlevel */
> > +            for (i = 0; i < 5; i++) {
> > +                if (qlevels[i] < DV100_NUM_QLEVELS-1 &&
> > +                    qlevels[i] < qlevels[largest])
> > +                    largest = i;
> > +            }
> > +
> > +            i = largest;
> > +            /* ensure that we don't enter infinite loop */
> > +            largest = (largest+1) % 5;
> > +
> > +            if (qlevels[i] >= DV100_NUM_QLEVELS-1) {
> > +                /* can't quantize any more */
> > +                continue;
> > +            }
> > +
> > +            /* quantize a little bit more */
> > +            qlevels[i] += dv100_qlevel_inc;
> > +            if (qlevels[i] > DV100_NUM_QLEVELS-1)
> > +                qlevels[i] = DV100_NUM_QLEVELS-1;
> > +
> > +            qnos[i] = DV100_QLEVEL_QNO(dv100_qlevels[qlevels[i]]);
> > +            size[i] = 0;
> > +
> > +            /* for each block */
> > +            b = &blks[8*i];
> > +            for (j = 0; j < 8; j++, b++) {
> > +                /* accumulate block size into macroblock */
> > +                if(size_cache[8*i+j][qlevels[i]] == 0) {
> > +                    /* it is safe to use actual_quantize() here because we only go from finer to coarser,
> > +                       and it saves the final actual_quantize() down below */
> > +                    size_cache[8*i+j][qlevels[i]] = dv100_actual_quantize(b, qlevels[i]);
> > +                }
> > +                size[i] += size_cache[8*i+j][qlevels[i]];
> > +            } /* for each block */
> > +
> > +        } while (vs_total_ac_bits_hd < size[0] + size[1] + size[2] + size[3] + size[4] &&
> > +                 (qlevels[0] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[1] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[2] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[3] < DV100_NUM_QLEVELS-1 ||
> > +                  qlevels[4] < DV100_NUM_QLEVELS-1));
>
> i think the DV100_NUM_QLEVELS checks can be simplified
>
> If we keep track of how many qlevels are < DV100_NUM_QLEVELS-1
> The check in the first loop is then not needed because if
> there is one that is smaller than that will be found and
> no need to check each against DV100_NUM_QLEVELS-1
>
> The smallest then being checked again against DV100_NUM_QLEVELS-1 also
> becomes unneeded
>
> and at the end the 5 checks in the while() can then be changed to a
> single check on the new variable
>
> This should make the code both faster and simpler
>

Updated, please check :)

Patch updated

-- 
Baptiste
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
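For illustration only, a rough standalone sketch of the counter-based loop suggested in the review above. It is not the updated patch: `coarsen()`, `size_fn`, `NUM_QLEVELS` (16 here) and `QLEVEL_INC` are hypothetical stand-ins for `dv_guess_qnos_hd()`, `dv100_actual_quantize()`, `DV100_NUM_QLEVELS` and `dv100_qlevel_inc`, and the per-qlevel `size_cache` is left out. `n_open` counts macroblocks still below the coarsest qlevel; while it is non-zero the lowest qlevel is necessarily below that bound, so neither the per-candidate check nor the five-way `while()` test is needed.

```c
#define NUM_MB      5   /* macroblocks per segment, as in the patch */
#define NUM_QLEVELS 16  /* hypothetical; the real DV100_NUM_QLEVELS may differ */
#define QLEVEL_INC  1   /* hypothetical stand-in for dv100_qlevel_inc */

/* Hypothetical size model: bits needed by macroblock mb at qlevel q.
 * A real encoder would quantize the macroblock's blocks here. */
typedef int (*size_fn)(int mb, int q);

static int total_size(const int *size)
{
    int t = 0;
    for (int i = 0; i < NUM_MB; i++)
        t += size[i];
    return t;
}

/* Coarsen qlevels until the sizes fit in budget or every macroblock is
 * at the coarsest qlevel. Assumes all qlevels start <= NUM_QLEVELS - 1. */
static void coarsen(int *qlevels, int *size, int budget, size_fn sz)
{
    int n_open = 0; /* macroblocks that can still be coarsened */
    for (int i = 0; i < NUM_MB; i++)
        if (qlevels[i] < NUM_QLEVELS - 1)
            n_open++;

    while (total_size(size) > budget && n_open > 0) {
        /* n_open > 0 guarantees the minimum is below NUM_QLEVELS - 1 */
        int i = 0;
        for (int j = 1; j < NUM_MB; j++)
            if (qlevels[j] < qlevels[i])
                i = j;

        qlevels[i] += QLEVEL_INC;
        if (qlevels[i] >= NUM_QLEVELS - 1) {
            qlevels[i] = NUM_QLEVELS - 1;
            n_open--; /* this macroblock just saturated */
        }
        size[i] = sz(i, qlevels[i]);
    }
}
```

One behavioural difference to keep in mind: ties here always go to the lowest index, whereas the quoted patch rotates its starting candidate through `largest`; the counter alone is what guarantees termination.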