Replace the two bit-handling loops and the internal conditional branch with a simple formula that uses a few logical operations.
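A minimal sketch of the kind of formula meant here, assuming the input fits into 16 bits. Each step shifts a copy of the value and masks it, so that bit k of the input ends up at bit 2k of the result. The names spread_bits16() and spread_bits16_loop() are made up for this illustration and are not part of FFmpeg; the loop version is included only as a reference to compare against.

```c
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function): move bit k of x to bit 2k.
 * Assumes x fits into 16 bits. */
static uint32_t spread_bits16(uint32_t x)
{
    x = ((x << 8) | x) & 0x00FF00FFu; /* split into two groups of 8 bits   */
    x = ((x << 4) | x) & 0x0F0F0F0Fu; /* split into four groups of 4 bits  */
    x = ((x << 2) | x) & 0x33333333u; /* split into eight groups of 2 bits */
    x = ((x << 1) | x) & 0x55555555u; /* one data bit in every even slot   */
    return x;
}

/* Loop-based reference that produces the same result one bit at a time. */
static uint32_t spread_bits16_loop(uint32_t x)
{
    uint32_t out = 0;
    int k;

    for (k = 0; k < 16; k++)
        if (x & (1u << k))
            out |= 1u << (2 * k);
    return out;
}
```

The four shift-and-mask steps replace up to 16 loop iterations, each with its own branch, which is where the expected speedup comes from.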
The old function would generate wrong output if the input does not fit into 15 bits. Fix this by using 64-bit math and put_bits64(). This case should be quite rare, since the bug has not shown itself so far.
---
This is an attempt at a speed optimization, but in the process it turned out that the function also needs a bug fix.

I only tested the old case of the code, to confirm that I've implemented the correct function. I haven't done any benchmarks or run FATE.

It should be faster, especially because coefficients below 2048 are currently written using a lookup table and bypass this function.

If you like it, use it.

Best Regards
   Ivan Kalvachev.
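On the 15-bit limit: a value whose incremented form has n bits below its leading one is written as 2n + 1 bits, so n = 15 gives 31 bits and n = 16 already gives 33, which no longer fits into a 32-bit int. The following is a sketch under stated assumptions, not the code from the patch. UeCode, ilog2_u64(), ue_code_loop() and ue_code_spread() are hypothetical names, and ilog2_u64() is a portable stand-in for ff_log2(). It builds the code word in 64 bits, once with a loop and once with bit spreading, so the two can be compared.

```c
#include <stdint.h>

/* Hypothetical container for one code word; not an FFmpeg type. */
typedef struct UeCode {
    uint64_t code; /* code word, right-aligned        */
    int      len;  /* number of bits that get written */
} UeCode;

/* Portable floor(log2(v)) for v > 0; stands in for ff_log2() here. */
static int ilog2_u64(uint64_t v)
{
    int n = 0;

    while (v >>= 1)
        n++;
    return n;
}

/* Loop-based reference: a 0 before every data bit, then a closing 1. */
static UeCode ue_code_loop(uint32_t val)
{
    UeCode   r;
    uint64_t v     = (uint64_t)val + 1; /* 64-bit, so the increment cannot wrap */
    int      bits  = ilog2_u64(v);
    uint64_t pbits = 0;
    int      i;

    for (i = bits - 1; i >= 0; i--)
        pbits = (pbits << 2) | ((v >> i) & 1);

    r.code = (pbits << 1) | 1;
    r.len  = 2 * bits + 1;
    return r;
}

/* Same code word, built with 64-bit bit spreading instead of a loop. */
static UeCode ue_code_spread(uint32_t val)
{
    UeCode   r;
    uint64_t v    = (uint64_t)val + 1;
    int      bits = ilog2_u64(v);
    uint64_t p    = v ^ (1ULL << bits); /* drop the leading one; the length implies it */

    p = ((p << 16) | p) & 0x0000FFFF0000FFFFULL;
    p = ((p <<  8) | p) & 0x00FF00FF00FF00FFULL;
    p = ((p <<  4) | p) & 0x0F0F0F0F0F0F0F0FULL;
    p = ((p <<  2) | p) & 0x3333333333333333ULL;
    p = ((p <<  1) | p) & 0x5555555555555555ULL;

    r.code = (p << 1) | 1;
    r.len  = 2 * bits + 1;
    return r;
}
```

For example, ue_code_spread(65534) reports a length of 31 bits, while ue_code_spread(65535) reports 33 bits, which is the point where a 64-bit writer becomes necessary.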
From 1f7fd38fcb6c64281bc458c09c711fc567b3ef0f Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev <ikalvac...@gmail.com>
Date: Wed, 28 Feb 2018 17:48:40 +0200
Subject: [PATCH] Improve and fix put_vc2_ue_uint() function.

Replace two bit handling loops and internal
conditional branch with simple formula using few
logical operations.

The old function would generate wrong output
if the input does not fit into 15 bits.

Fix this by using 64 bit math and put_bits64().
This case should be quite rare, since the bug
has not asserted itself.

Signed-off-by: Ivan Kalvachev <ikalvac...@gmail.com>
---
 libavcodec/vc2enc.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index b7adcd3d36..b2f1611ea3 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -187,28 +187,33 @@ typedef struct VC2EncContext {
 
 static av_always_inline void put_vc2_ue_uint(PutBitContext *pb, uint32_t val)
 {
-    int i;
-    int pbits = 0, bits = 0, topbit = 1, maxval = 1;
+    int bits = 0;
+    uint64_t pbits = 0;
 
     if (!val++) {
         put_bits(pb, 1, 1);
         return;
     }
 
-    while (val > maxval) {
-        topbit <<= 1;
-        maxval <<= 1;
-        maxval |= 1;
-    }
+    bits = ff_log2(val);
 
-    bits = ff_log2(topbit);
+    if (bits > 15) {
+        pbits = val;
 
-    for (i = 0; i < bits; i++) {
-        topbit >>= 1;
-        pbits <<= 2;
-        if (val & topbit)
-            pbits |= 0x1;
+        pbits = ((pbits<<16)|pbits)&0x0000FFFF0000FFFFULL;
+        pbits = ((pbits<< 8)|pbits)&0x00FF00FF00FF00FFULL;
+        pbits = ((pbits<< 4)|pbits)&0x0F0F0F0F0F0F0F0FULL;
+        pbits = ((pbits<< 2)|pbits)&0x3333333333333333ULL;
+        pbits = ((pbits<< 1)|pbits)&0x5555555555555555ULL;
+
+        put_bits64(pb, bits*2 + 1, (pbits << 1) | 1);
+        return;
     }
+                                             // ____'____ ____'____ ponm'lkji hgfe'dcba
+    val = ( (val << 8) | val ) & 0x00FF00FF; // ____'____ ponm'lkji ____'____ hgfe'dcba
+    val = ( (val << 4) | val ) & 0x0F0F0F0F; // ____'ponm ____'lkji ____'hgfe ____'dcba
+    val = ( (val << 2) | val ) & 0x33333333; // __po'__nm __lk'__ji __hg'__fe __dc'__ba
+    val = ( (val << 1) | val ) & 0x55555555; // _p_o'_n_m _l_k'_j_i _h_g'_f_e _d_c'_b_a
 
     put_bits(pb, bits*2 + 1, (pbits << 1) | 1);
 }
-- 
2.16.2
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel