Re: [FFmpeg-devel] [PATCH v2 1/5] ac3enc_fixed: convert to 32-bit sample format

Lynne Tue, 12 Jan 2021 15:55:50 -0800

Jan 12, 2021, 22:24 by andreas.rheinha...@gmail.com:

> Lynne:
>
>> The AC3 encoder used to be a separate library called "Aften", which
>> got merged into libavcodec (literally, SVN commits and all).
>> The merge preserved as many features from the library as possible.
>>
>> The code had two versions - a fixed point version and a floating
>> point version. FFmpeg had floating point DSP code used by other
>> codecs, including the AC3 decoder, so the floating-point DSP was
>> simply replaced with FFmpeg's own functions.
>> However, FFmpeg had no fixed-point audio code at that point. So
>> the encoder brought along its own fixed-point DSP functions,
>> including a fixed-point MDCT.
>>
>> The fixed-point MDCT itself is trivially just a float MDCT with a
>> different type and each multiply being a fixed-point multiply.
>> So over time, it got refactored, and the FFT used for all other codecs
>> was templated.
>>
>> Due to design decisions at the time, the fixed-point version of the
>> encoder operates at 16 bits of precision. Although convenient, this,
>> even at the time, was inadequate and inefficient. The encoder is noisy,
>> does not produce output comparable to the float encoder, and even
>> rings at higher frequencies due to the badly approximated window function.
>>
>> Enter MIPS (owned by Imagination Technologies at the time). They wanted
>> quick fixed-point decoding on their FPUless cores. So they contributed
>> patches to template the AC3 decoder so it had both a fixed-point
>> and a floating-point version. They also did the same for the AAC decoder.
>> They, however, used 32-bit samples, not 16-bit ones. And we did not have
>> 32-bit fixed-point DSP functions, including an MDCT. But instead of
>> templating our MDCT to output 3 versions (float, 32-bit fixed and
>> 16-bit fixed), they simply copy-pasted their own MDCT into ours, and
>> completely ifdeffed our own MDCT code out if a 32-bit fixed-point MDCT
>> was selected.
>>
>> This is also the status quo nowadays - 2 separate MDCTs, one which
>> produces floating point and 16-bit fixed point versions, and one
>> sort-of integrated which produces the 32-bit fixed-point version.
>>
>> MIPS weren't all that interested in encoding, so they left the encoder
>> as-is, and they didn't care much about the ifdeffery, mess or quality - it's
>> not their problem.
>>
>> So the MDCT/FFT code has always been a thorn in the eye of anyone looking
>> to clean up code.
>>
>> Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
>> So for the floating point version, the encoder simply runs the float MDCT,
>> and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
>> inherently a fixed-point codec. For the fixed-point version, the input is
>> 16-bit samples, so to maximize precision the frame samples are analyzed and
>> the highest set bit is detected via ac3_max_msb_abs_int16(), and the
>> windowed samples are then scaled up via ac3_lshift_int16(), so the input
>> for the FFT always uses at least 14 bits; this is done in
>> normalize_samples(). After the FFT, the coefficients are scaled to 25 bits.
>>
>> This patch simply changes the encoder to accept 32-bit samples, reusing
>> the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
>> a large part of our very messy code, as well as prepare for the future
>> lavu/tx conversion. The coefficients are simply scaled down to 25 bits during
>> windowing, skipping 2 separate scalings, as the hacks to extend precision are
>> simply no longer necessary. There's no point in always running the MDCT at
>> 32 bits when you're going to drop 6 bits off anyway; the headroom is plenty,
>> and the MDCT rounds properly.
>>
>> This also makes the encoder slightly more accurate than the float version,
>> as there's no coefficient conversion step necessary.
>>
>> SIZE SAVINGS:
>> ARM32:
>> HARDCODED TABLES:
>> BASE           - 10709590
>> DROP  DSP      - 10702872 - diff:   -6.56KiB
>> DROP  MDCT     - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
>> DROP  FFT      - 10336652 - diff: -323.52KiB - all:   -364.20KiB
>> SOFTCODED TABLES:
>> BASE           -  9685096
>> DROP  DSP      -  9678378 - diff:   -6.56KiB
>> DROP  MDCT     -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
>> DROP  FFT      -  9573918 - diff:  -67.92KiB - all:   -108.57KiB
>>
>> ARM64:
>> HARDCODED TABLES:
>> BASE           - 14641112
>> DROP  DSP      - 14633806 - diff:   -7.13KiB
>> DROP  MDCT     - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
>> DROP  FFT      - 14286826 - diff: -310.53KiB - all:   -345.98KiB
>> SOFTCODED TABLES:
>> BASE           - 13636238
>> DROP  DSP      - 13628932 - diff:   -7.13KiB
>> DROP  MDCT     - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
>> DROP  FFT      - 13542080 - diff:  -56.43KiB - all:    -91.95KiB
>>
>> x86:
>> HARDCODED TABLES:
>> BASE           - 12367336
>> DROP  DSP      - 12354698 - diff:  -12.34KiB
>> DROP  MDCT     - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
>> DROP  FFT      - 12029788 - diff: -294.18KiB - all:   -329.64KiB
>> SOFTCODED TABLES:
>> BASE           - 11358094
>> DROP  DSP      - 11345456 - diff:  -12.34KiB
>> DROP  MDCT     - 11321742 - diff:  -23.16KiB - both:   -35.50KiB
>> DROP  FFT      - 11276946 - diff:  -43.75KiB - all:    -79.25KiB
>>
>> PERFORMANCE (10min random s32le):
>> ARM32 - before -  39.9x - 0m15.046s
>> ARM32 - after  -  28.2x - 0m21.525s
>>                        Speed:  -30%
>>
>> ARM64 - before -  36.1x - 0m16.637s
>> ARM64 - after  -  36.0x - 0m16.727s
>>                        Speed: -0.5%
>>
>> x86   - before - 184x -    0m3.277s
>> x86   - after  - 190x -    0m3.187s
>>                        Speed:   +3%
>>
>> New patch attached.
>>
> FATE still fails with patches 1 and 2 on patchwork.
>


Fixed. Forgot a line. New patches attached.

From b3473a393757bf3696efa11a8a131454f51722f9 Mon Sep 17 00:00:00 2001
From: Lynne <d...@lynne.ee>
Date: Sat, 9 Jan 2021 01:51:52 +0100
Subject: [PATCH v3 1/5] ac3enc_fixed: convert to 32-bit sample format

The AC3 encoder used to be a separate library called "Aften", which
got merged into libavcodec (literally, SVN commits and all).
The merge preserved as many features from the library as possible.

The code had two versions - a fixed point version and a floating
point version. FFmpeg had floating point DSP code used by other
codecs, including the AC3 decoder, so the floating-point DSP was
simply replaced with FFmpeg's own functions.
However, FFmpeg had no fixed-point audio code at that point. So
the encoder brought along its own fixed-point DSP functions,
including a fixed-point MDCT.

The fixed-point MDCT itself is trivially just a float MDCT with a
different type and each multiply being a fixed-point multiply.
So over time, it got refactored, and the FFT used for all other codecs
was templated.

Due to design decisions at the time, the fixed-point version of the
encoder operates at 16 bits of precision. Although convenient, this,
even at the time, was inadequate and inefficient. The encoder is noisy,
does not produce output comparable to the float encoder, and even
rings at higher frequencies due to the badly approximated window function.

Enter MIPS (owned by Imagination Technologies at the time). They wanted
quick fixed-point decoding on their FPUless cores. So they contributed
patches to template the AC3 decoder so it had both a fixed-point
and a floating-point version. They also did the same for the AAC decoder.
They, however, used 32-bit samples, not 16-bit ones. And we did not have
32-bit fixed-point DSP functions, including an MDCT. But instead of
templating our MDCT to output 3 versions (float, 32-bit fixed and
16-bit fixed), they simply copy-pasted their own MDCT into ours, and
completely ifdeffed our own MDCT code out if a 32-bit fixed-point MDCT
was selected.

This is also the status quo nowadays - 2 separate MDCTs, one which
produces floating point and 16-bit fixed point versions, and one
sort-of integrated which produces the 32-bit fixed-point version.

MIPS weren't all that interested in encoding, so they left the encoder
as-is, and they didn't care much about the ifdeffery, mess or quality - it's
not their problem.

So the MDCT/FFT code has always been a thorn in the eye of anyone looking
to clean up code.

Backstory over. Internally AC3 operates on 25-bit fixed-point coefficients.
So for the floating point version, the encoder simply runs the float MDCT,
and converts the resulting coefficients to 25-bit fixed-point, as AC3 is
inherently a fixed-point codec. For the fixed-point version, the input is
16-bit samples, so to maximize precision the frame samples are analyzed and
the highest set bit is detected via ac3_max_msb_abs_int16(), and the
windowed samples are then scaled up via ac3_lshift_int16(), so the input
for the FFT always uses at least 14 bits; this is done in
normalize_samples(). After the FFT, the coefficients are scaled to 25 bits.
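
For readers who do not have the old code in front of them, the 16-bit
normalization described above can be sketched roughly as follows. This is a
standalone illustration modelled on the normalize_samples() function removed
by this patch, not the FFmpeg code itself: max_msb_abs_int16(), ilog2() and
normalize_block() are hypothetical names, and ilog2() is only a stand-in for
av_log2().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* OR all absolute values together: the result has the same highest set
 * bit as max(|src[i]|), which is all the normalization needs. */
int max_msb_abs_int16(const int16_t *src, int len)
{
    int v = 0;
    assert(len > 0);
    for (int i = 0; i < len; i++)
        v |= abs(src[i]);
    return v;
}

/* floor(log2(v)) for v > 0, and 0 for v == 0 (hypothetical stand-in). */
int ilog2(unsigned v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Scale the block up so its largest sample has bit 14 set, and return the
 * right shift needed after the transform: the applied left shift plus the
 * 6 bits that take a 31-bit value down to 25 bits. */
int normalize_block(int16_t *samples, int len)
{
    int shift = 14 - ilog2((unsigned)max_msb_abs_int16(samples, len));
    if (shift > 0)
        for (int i = 0; i < len; i++)
            samples[i] = (int16_t)(samples[i] * (1 << shift));
    return shift + 6;
}
```

With a block whose largest magnitude is 1000, the samples are multiplied by
32 and the returned shift is 11. That per-block shift had to be stored and
undone later, which is the bookkeeping this patch removes.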

This patch simply changes the encoder to accept 32-bit samples, reusing
the already well-optimized 32-bit MDCT code, allowing us to clean up and drop
a large part of our very messy code, as well as prepare for the future
lavu/tx conversion. The coefficients are simply scaled down to 25 bits during
windowing, skipping 2 separate scalings, as the hacks to extend precision are
simply no longer necessary. There's no point in always running the MDCT at
32 bits when you're going to drop 6 bits off anyway; the headroom is plenty,
and the MDCT rounds properly.
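
A rough sketch of what "scaled down during windowing" can look like is given
here. The 1 << 22 window scale is taken from this patch; the rounding shift
by 31 bits is an assumption about how the generic fixed-point vector multiply
behaves, and quantize_window_coef() and window_q31() are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize a non-negative window coefficient with the 1 << 22 scale.
 * Adding 0.5f and truncating stands in for lrintf() here. */
int32_t quantize_window_coef(float w)
{
    assert(w >= 0.0f && w <= 1.0f);
    return (int32_t)(w * (1 << 22) + 0.5f);
}

/* Multiply 32-bit samples by the quantized window, round, and drop 31 bits
 * (assumed behaviour; relies on arithmetic right shift of negative values,
 * which common compilers provide). */
void window_q31(int32_t *dst, const int32_t *src, const int32_t *win, int len)
{
    assert(len > 0);
    for (int i = 0; i < len; i++) {
        int64_t acc = (int64_t)src[i] * win[i];
        dst[i] = (int32_t)((acc + 0x40000000) >> 31);
    }
}
```

Under these assumptions a sample of 1 << 30 multiplied by a window value of
1.0 comes out as 1 << 21, so the window multiply itself performs the scaling
and no separate shift pass is needed before or after the transform.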

This also makes the encoder slightly more accurate than the float version,
as there's no coefficient conversion step necessary.

SIZE SAVINGS:
ARM32:
HARDCODED TABLES:
BASE           - 10709590
DROP  DSP      - 10702872 - diff:   -6.56KiB
DROP  MDCT     - 10667932 - diff:  -34.12KiB - both:   -40.68KiB
DROP  FFT      - 10336652 - diff: -323.52KiB - all:   -364.20KiB
SOFTCODED TABLES:
BASE           -  9685096
DROP  DSP      -  9678378 - diff:   -6.56KiB
DROP  MDCT     -  9643466 - diff:  -34.09KiB - both:   -40.65KiB
DROP  FFT      -  9573918 - diff:  -67.92KiB - all:   -108.57KiB

ARM64:
HARDCODED TABLES:
BASE           - 14641112
DROP  DSP      - 14633806 - diff:   -7.13KiB
DROP  MDCT     - 14604812 - diff:  -28.31KiB - both:   -35.45KiB
DROP  FFT      - 14286826 - diff: -310.53KiB - all:   -345.98KiB
SOFTCODED TABLES:
BASE           - 13636238
DROP  DSP      - 13628932 - diff:   -7.13KiB
DROP  MDCT     - 13599866 - diff:  -28.38KiB - both:   -35.52KiB
DROP  FFT      - 13542080 - diff:  -56.43KiB - all:    -91.95KiB

x86:
HARDCODED TABLES:
BASE           - 12367336
DROP  DSP      - 12354698 - diff:  -12.34KiB
DROP  MDCT     - 12331024 - diff:  -23.12KiB - both:   -35.46KiB
DROP  FFT      - 12029788 - diff: -294.18KiB - all:   -329.64KiB
SOFTCODED TABLES:
BASE           - 11358094
DROP  DSP      - 11345456 - diff:  -12.34KiB
DROP  MDCT     - 11321742 - diff:  -23.16KiB - both:   -35.50KiB
DROP  FFT      - 11276946 - diff:  -43.75KiB - all:    -79.25KiB

PERFORMANCE (10min random s32le):
ARM32 - before -  39.9x - 0m15.046s
ARM32 - after  -  28.2x - 0m21.525s
                       Speed:  -30%

ARM64 - before -  36.1x - 0m16.637s
ARM64 - after  -  36.0x - 0m16.727s
                       Speed: -0.5%

x86   - before - 184x -    0m3.277s
x86   - after  - 190x -    0m3.187s
                       Speed:   +3%
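
The figures above can be rechecked with two small helpers, assuming the sizes
are in bytes and the "x" values are realtime multipliers. The function names
are hypothetical and exist only for this check.

```c
#include <assert.h>

/* Size change in KiB between two binary sizes given in bytes. */
double size_diff_kib(long before_bytes, long after_bytes)
{
    return (double)(after_bytes - before_bytes) / 1024.0;
}

/* Relative speed change in percent between two realtime multipliers. */
double speed_change_pct(double before_x, double after_x)
{
    assert(before_x > 0.0);
    return (after_x / before_x - 1.0) * 100.0;
}
```

For ARM32 with hardcoded tables, size_diff_kib(10709590, 10702872) gives
about -6.56. For ARM32 speed, the multipliers 39.9 and 28.2 give about -29%,
while the wall-clock times give about -30%, so the rounded figure in the
table follows the timings.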
---
 doc/encoders.texi                 |  7 ++--
 libavcodec/Makefile               |  2 +-
 libavcodec/ac3enc.c               |  1 +
 libavcodec/ac3enc.h               | 11 +++---
 libavcodec/ac3enc_fixed.c         | 60 ++++++++++++-------------------
 libavcodec/ac3enc_float.c         |  1 -
 libavcodec/ac3enc_template.c      | 21 ++++-------
 libavcodec/version.h              |  2 +-
 tests/fate/ac3.mak                |  2 +-
 tests/fate/ffmpeg.mak             |  2 +-
 tests/ref/fate/unknown_layout-ac3 |  2 +-
 tests/ref/lavf/rm                 |  2 +-
 12 files changed, 46 insertions(+), 67 deletions(-)

diff --git a/doc/encoders.texi b/doc/encoders.texi
index 0b1c69e982..60e763a704 100644
--- a/doc/encoders.texi
+++ b/doc/encoders.texi
@@ -151,10 +151,9 @@ the undocumented RealAudio 3 (a.k.a. dnet).
 The @var{ac3} encoder uses floating-point math, while the @var{ac3_fixed}
 encoder only uses fixed-point integer math. This does not mean that one is
 always faster, just that one or the other may be better suited to a
-particular system. The floating-point encoder will generally produce better
-quality audio for a given bitrate. The @var{ac3_fixed} encoder is not the
-default codec for any of the output formats, so it must be specified explicitly
-using the option @code{-acodec ac3_fixed} in order to use it.
+particular system. The @var{ac3_fixed} encoder is not the default codec for
+any of the output formats, so it must be specified explicitly using the option
+@code{-acodec ac3_fixed} in order to use it.
 
 @subsection AC-3 Metadata
 
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 35318f4f4d..0546e6f6c5 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -181,7 +181,7 @@ OBJS-$(CONFIG_AC3_DECODER)             += ac3dec_float.o ac3dec_data.o ac3.o kbd
 OBJS-$(CONFIG_AC3_FIXED_DECODER)       += ac3dec_fixed.o ac3dec_data.o ac3.o kbdwin.o ac3tab.o
 OBJS-$(CONFIG_AC3_ENCODER)             += ac3enc_float.o ac3enc.o ac3tab.o \
                                           ac3.o kbdwin.o
-OBJS-$(CONFIG_AC3_FIXED_ENCODER)       += ac3enc_fixed.o ac3enc.o ac3tab.o ac3.o
+OBJS-$(CONFIG_AC3_FIXED_ENCODER)       += ac3enc_fixed.o ac3enc.o ac3tab.o ac3.o kbdwin.o
 OBJS-$(CONFIG_AC3_MF_ENCODER)          += mfenc.o mf_utils.o
 OBJS-$(CONFIG_ACELP_KELVIN_DECODER)    += g729dec.o lsp.o celp_math.o celp_filters.o acelp_filters.o acelp_pitch_delay.o acelp_vectors.o g729postfilter.o
 OBJS-$(CONFIG_AGM_DECODER)             += agm.o
diff --git a/libavcodec/ac3enc.c b/libavcodec/ac3enc.c
index b2e3b2bb4b..9dafe0ef55 100644
--- a/libavcodec/ac3enc.c
+++ b/libavcodec/ac3enc.c
@@ -2047,6 +2047,7 @@ av_cold int ff_ac3_encode_close(AVCodecContext *avctx)
     int blk, ch;
     AC3EncodeContext *s = avctx->priv_data;
 
+    av_freep(&s->mdct_window);
     av_freep(&s->windowed_samples);
     if (s->planar_samples)
     for (ch = 0; ch < s->channels; ch++)
diff --git a/libavcodec/ac3enc.h b/libavcodec/ac3enc.h
index 044564ecb4..ba62891371 100644
--- a/libavcodec/ac3enc.h
+++ b/libavcodec/ac3enc.h
@@ -30,8 +30,6 @@
 
 #include <stdint.h>
 
-#include "libavutil/float_dsp.h"
-
 #include "ac3.h"
 #include "ac3dsp.h"
 #include "avcodec.h"
@@ -53,6 +51,7 @@
 #define AC3ENC_TYPE_EAC3        2
 
 #if AC3ENC_FLOAT
+#include "libavutil/float_dsp.h"
 #define AC3_NAME(x) ff_ac3_float_ ## x
 #define MAC_COEF(d,a,b) ((d)+=(a)*(b))
 #define COEF_MIN (-16777215.0/16777216.0)
@@ -62,12 +61,13 @@ typedef float SampleType;
 typedef float CoefType;
 typedef float CoefSumType;
 #else
+#include "libavutil/fixed_dsp.h"
 #define AC3_NAME(x) ff_ac3_fixed_ ## x
 #define MAC_COEF(d,a,b) MAC64(d,a,b)
 #define COEF_MIN -16777215
 #define COEF_MAX  16777215
 #define NEW_CPL_COORD_THRESHOLD 503317
-typedef int16_t SampleType;
+typedef int32_t SampleType;
 typedef int32_t CoefType;
 typedef int64_t CoefSumType;
 #endif
@@ -141,7 +141,6 @@ typedef struct AC3Block {
     uint16_t **qmant;                           ///< quantized mantissas
     uint8_t  **cpl_coord_exp;                   ///< coupling coord exponents           (cplcoexp)
     uint8_t  **cpl_coord_mant;                  ///< coupling coord mantissas           (cplcomant)
-    uint8_t  coeff_shift[AC3_MAX_CHANNELS];     ///< fixed-point coefficient shift values
     uint8_t  new_rematrixing_strategy;          ///< send new rematrixing flags in this block
     int      num_rematrixing_bands;             ///< number of rematrixing bands
     uint8_t  rematrixing_flags[4];              ///< rematrixing flags
@@ -165,7 +164,11 @@ typedef struct AC3EncodeContext {
     AVCodecContext *avctx;                  ///< parent AVCodecContext
     PutBitContext pb;                       ///< bitstream writer context
     AudioDSPContext adsp;
+#if AC3ENC_FLOAT
     AVFloatDSPContext *fdsp;
+#else
+    AVFixedDSPContext *fdsp;
+#endif
     MECmpContext mecc;
     AC3DSPContext ac3dsp;                   ///< AC-3 optimized functions
     FFTContext mdct;                        ///< FFT context for MDCT calculation
diff --git a/libavcodec/ac3enc_fixed.c b/libavcodec/ac3enc_fixed.c
index 7818dd8c35..eab086cdab 100644
--- a/libavcodec/ac3enc_fixed.c
+++ b/libavcodec/ac3enc_fixed.c
@@ -26,12 +26,14 @@
  * fixed-point AC-3 encoder.
  */
 
-#define FFT_FLOAT 0
 #define AC3ENC_FLOAT 0
+#define FFT_FLOAT 0
+#define FFT_FIXED_32 1
 #include "internal.h"
 #include "audiodsp.h"
 #include "ac3enc.h"
 #include "eac3enc.h"
+#include "kbdwin.h"
 
 #define AC3ENC_TYPE AC3ENC_TYPE_AC3_FIXED
 #include "ac3enc_opts_template.c"
@@ -43,37 +45,6 @@ static const AVClass ac3enc_class = {
     .version    = LIBAVUTIL_VERSION_INT,
 };
 
-/*
- * Normalize the input samples to use the maximum available precision.
- * This assumes signed 16-bit input samples.
- */
-static int normalize_samples(AC3EncodeContext *s)
-{
-    int v = s->ac3dsp.ac3_max_msb_abs_int16(s->windowed_samples, AC3_WINDOW_SIZE);
-    v = 14 - av_log2(v);
-    if (v > 0)
-        s->ac3dsp.ac3_lshift_int16(s->windowed_samples, AC3_WINDOW_SIZE, v);
-    /* +6 to right-shift from 31-bit to 25-bit */
-    return v + 6;
-}
-
-
-/*
- * Scale MDCT coefficients to 25-bit signed fixed-point.
- */
-static void scale_coefficients(AC3EncodeContext *s)
-{
-    int blk, ch;
-
-    for (blk = 0; blk < s->num_blocks; blk++) {
-        AC3Block *block = &s->blocks[blk];
-        for (ch = 1; ch <= s->channels; ch++) {
-            s->ac3dsp.ac3_rshift_int32(block->mdct_coef[ch], AC3_MAX_COEFS,
-                                       block->coeff_shift[ch]);
-        }
-    }
-}
-
 static void sum_square_butterfly(AC3EncodeContext *s, int64_t sum[4],
                                  const int32_t *coef0, const int32_t *coef1,
                                  int len)
@@ -120,7 +91,6 @@ static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
     ff_mdct_end(&s->mdct);
 }
 
-
 /**
  * Initialize MDCT tables.
  *
@@ -129,9 +99,25 @@ static av_cold void ac3_fixed_mdct_end(AC3EncodeContext *s)
  */
 static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
 {
-    int ret = ff_mdct_init(&s->mdct, 9, 0, -1.0);
-    s->mdct_window = ff_ac3_window;
-    return ret;
+    float fwin[AC3_BLOCK_SIZE];
+
+    int32_t *iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
+    if (!iwin)
+        return AVERROR(ENOMEM);
+
+    ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
+    for (int i = 0; i < AC3_WINDOW_SIZE/2; i++) {
+        iwin[i] = lrintf(fwin[i] * (1 << 22));
+        iwin[AC3_WINDOW_SIZE-1-i] = lrintf(fwin[i] * (1 << 22));
+    }
+
+    s->mdct_window = iwin;
+
+    s->fdsp = avpriv_alloc_fixed_dsp(s->avctx->flags & AV_CODEC_FLAG_BITEXACT);
+    if (!s->fdsp)
+        return AVERROR(ENOMEM);
+
+    return ff_mdct_init(&s->mdct, 9, 0, -1.0);
 }
 
 
@@ -155,7 +141,7 @@ AVCodec ff_ac3_fixed_encoder = {
     .init            = ac3_fixed_encode_init,
     .encode2         = ff_ac3_fixed_encode_frame,
     .close           = ff_ac3_encode_close,
-    .sample_fmts     = (const enum AVSampleFormat[]){ AV_SAMPLE_FMT_S16P,
+    .sample_fmts     = (const enum AVSampleFormat[]){ AV_SAMPLE_FMT_S32P,
                                                       AV_SAMPLE_FMT_NONE },
     .priv_class      = &ac3enc_class,
     .caps_internal   = FF_CODEC_CAP_INIT_THREADSAFE | FF_CODEC_CAP_INIT_CLEANUP,
diff --git a/libavcodec/ac3enc_float.c b/libavcodec/ac3enc_float.c
index 45bfed34f9..b17b3a2365 100644
--- a/libavcodec/ac3enc_float.c
+++ b/libavcodec/ac3enc_float.c
@@ -97,7 +97,6 @@ static void sum_square_butterfly(AC3EncodeContext *s, float sum[4],
 static av_cold void ac3_float_mdct_end(AC3EncodeContext *s)
 {
     ff_mdct_end(&s->mdct);
-    av_freep(&s->mdct_window);
 }
 
 
diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index 0fdc95b968..de6eba71d8 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -91,18 +91,11 @@ static void apply_mdct(AC3EncodeContext *s)
             AC3Block *block = &s->blocks[blk];
             const SampleType *input_samples = &s->planar_samples[ch][blk * AC3_BLOCK_SIZE];
 
-#if AC3ENC_FLOAT
             s->fdsp->vector_fmul(s->windowed_samples, input_samples,
-                                s->mdct_window, AC3_WINDOW_SIZE);
-#else
-            s->ac3dsp.apply_window_int16(s->windowed_samples, input_samples,
-                                         s->mdct_window, AC3_WINDOW_SIZE);
-
-            block->coeff_shift[ch + 1] = normalize_samples(s);
-#endif
+                                 s->mdct_window, AC3_WINDOW_SIZE);
 
-            s->mdct.mdct_calcw(&s->mdct, block->mdct_coef[ch+1],
-                               s->windowed_samples);
+            s->mdct.mdct_calc(&s->mdct, block->mdct_coef[ch+1],
+                              s->windowed_samples);
         }
     }
 }
@@ -390,9 +383,6 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,
 
     apply_mdct(s);
 
-    if (!AC3ENC_FLOAT)
-        scale_coefficients(s);
-
     clip_coefficients(&s->adsp, s->blocks[0].mdct_coef[1],
                       AC3_MAX_COEFS * s->num_blocks * s->channels);
 
@@ -404,8 +394,9 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,
 
     compute_rematrixing_strategy(s);
 
-    if (AC3ENC_FLOAT)
-        scale_coefficients(s);
+#if AC3ENC_FLOAT
+    scale_coefficients(s);
+#endif
 
     return ff_ac3_encode_frame_common_end(avctx, avpkt, frame, got_packet_ptr);
 }
diff --git a/libavcodec/version.h b/libavcodec/version.h
index 1420439044..cd871f0fa0 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -28,7 +28,7 @@
 #include "libavutil/version.h"
 
 #define LIBAVCODEC_VERSION_MAJOR  58
-#define LIBAVCODEC_VERSION_MINOR 116
+#define LIBAVCODEC_VERSION_MINOR 117
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
diff --git a/tests/fate/ac3.mak b/tests/fate/ac3.mak
index 757cd51cf2..d76e22bade 100644
--- a/tests/fate/ac3.mak
+++ b/tests/fate/ac3.mak
@@ -90,7 +90,7 @@ fate-ac3-fixed-encode: tests/data/asynth-44100-2.wav
 fate-ac3-fixed-encode: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
 fate-ac3-fixed-encode: CMD = md5 -i $(SRC) -c ac3_fixed -ab 128k -f ac3 -flags +bitexact -af aresample
 fate-ac3-fixed-encode: CMP = oneline
-fate-ac3-fixed-encode: REF = a1d1fc116463b771abf5aef7ed37d7b1
+fate-ac3-fixed-encode: REF = 1f548175e11a95e62ce20e442fcc8d08
 
 FATE_EAC3-$(call ALLYES, EAC3_DEMUXER EAC3_MUXER EAC3_CORE_BSF) += fate-eac3-core-bsf
 fate-eac3-core-bsf: CMD = md5pipe -i $(TARGET_SAMPLES)/eac3/the_great_wall_7.1.eac3 -c:a copy -bsf:a eac3_core -fflags +bitexact -f eac3
diff --git a/tests/fate/ffmpeg.mak b/tests/fate/ffmpeg.mak
index c6d8dc2e5c..4dfb77d250 100644
--- a/tests/fate/ffmpeg.mak
+++ b/tests/fate/ffmpeg.mak
@@ -83,7 +83,7 @@ fate-unknown_layout-pcm: CMD = md5 \
 FATE_FFMPEG-$(call ALLYES, PCM_S16LE_DEMUXER AC3_MUXER PCM_S16LE_DECODER AC3_FIXED_ENCODER) += fate-unknown_layout-ac3
 fate-unknown_layout-ac3: $(AREF)
 fate-unknown_layout-ac3: CMD = md5 -auto_conversion_filters \
-  -guess_layout_max 0 -f s16le -ac 1 -ar 44100 -i $(TARGET_PATH)/$(AREF) \
+  -guess_layout_max 0 -f s32le -ac 1 -ar 44100 -i $(TARGET_PATH)/$(AREF) \
   -f ac3 -flags +bitexact -c ac3_fixed
 
 
diff --git a/tests/ref/fate/unknown_layout-ac3 b/tests/ref/fate/unknown_layout-ac3
index d332efcec4..719a44aacf 100644
--- a/tests/ref/fate/unknown_layout-ac3
+++ b/tests/ref/fate/unknown_layout-ac3
@@ -1 +1 @@
-bbb7550d6d93973c10f4ee13c87cf799
+febdb165cfd6cba375aa086195e61213
diff --git a/tests/ref/lavf/rm b/tests/ref/lavf/rm
index 43ea4c7897..fc2a6564a2 100644
--- a/tests/ref/lavf/rm
+++ b/tests/ref/lavf/rm
@@ -1,2 +1,2 @@
-e30681d05d6f3d24108d3614600bf116 *tests/data/lavf/lavf.rm
+8dfb8d4556d61d3615e0d0012ffe540c *tests/data/lavf/lavf.rm
 346424 tests/data/lavf/lavf.rm
-- 
2.30.0.rc2

From 6334fd0d8b2a8887fd87b03704ef83084e8365c9 Mon Sep 17 00:00:00 2001
From: Lynne <d...@lynne.ee>
Date: Sat, 9 Jan 2021 09:05:18 +0100
Subject: [PATCH v3 2/5] ac3enc: do not clip coefficients after transforms

In either encoder, it's impossible for the coefficients to go past 25 bits
right after the MDCT. Our MDCT is numerically stable.
For the floating point encoder, in case a NaN is contained, lrintf() will
raise a floating point exception during the conversion.
---
 libavcodec/ac3enc_template.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index de6eba71d8..4f1e181e0b 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -383,9 +383,6 @@ int AC3_NAME(encode_frame)(AVCodecContext *avctx, AVPacket *avpkt,
 
     apply_mdct(s);
 
-    clip_coefficients(&s->adsp, s->blocks[0].mdct_coef[1],
-                      AC3_MAX_COEFS * s->num_blocks * s->channels);
-
     s->cpl_on = s->cpl_enabled;
     ff_ac3_compute_coupling_strategy(s);
 
-- 
2.30.0.rc2

From 06a6244825452cd0408eed86b040e4032630683d Mon Sep 17 00:00:00 2001
From: Lynne <d...@lynne.ee>
Date: Sat, 9 Jan 2021 17:27:16 +0100
Subject: [PATCH v3 3/5] ac3enc: halve the MDCT window size by using
 vector_fmul_reverse

This brings the encoder in line with the rest of our encoders and saves
a bit of memory.
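
The idea, sketched in standalone C: since the window is symmetric, only its
first half is stored, and the second half of the input is multiplied by that
same half read backwards. window_symmetric() is a hypothetical helper that
mirrors what the vector_fmul / vector_fmul_reverse pair does in the diff
below; it is not the FFmpeg DSP code.

```c
#include <assert.h>

/* Apply a symmetric window of length 2 * half_len to src, given only the
 * first half of the window. */
void window_symmetric(float *dst, const float *src,
                      const float *half_win, int half_len)
{
    assert(half_len > 0);

    /* First half: plain element-wise multiply. */
    for (int i = 0; i < half_len; i++)
        dst[i] = src[i] * half_win[i];

    /* Second half: same coefficients, traversed in reverse order. */
    for (int i = 0; i < half_len; i++)
        dst[half_len + i] = src[half_len + i] * half_win[half_len - 1 - i];
}
```

With a half window of {0.25, 0.5} and input {1, 2, 4, 8}, the output is
{0.25, 1, 2, 2}, the same as multiplying by the full window
{0.25, 0.5, 0.5, 0.25}.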
---
 libavcodec/ac3enc_fixed.c    |  8 +++-----
 libavcodec/ac3enc_float.c    | 15 ++++-----------
 libavcodec/ac3enc_template.c |  5 ++++-
 3 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/libavcodec/ac3enc_fixed.c b/libavcodec/ac3enc_fixed.c
index eab086cdab..7a8a77fb93 100644
--- a/libavcodec/ac3enc_fixed.c
+++ b/libavcodec/ac3enc_fixed.c
@@ -101,15 +101,13 @@ static av_cold int ac3_fixed_mdct_init(AC3EncodeContext *s)
 {
     float fwin[AC3_BLOCK_SIZE];
 
-    int32_t *iwin = av_malloc_array(AC3_WINDOW_SIZE, sizeof(*iwin));
+    int32_t *iwin = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*iwin));
     if (!iwin)
         return AVERROR(ENOMEM);
 
-    ff_kbd_window_init(fwin, 5.0, AC3_WINDOW_SIZE/2);
-    for (int i = 0; i < AC3_WINDOW_SIZE/2; i++) {
+    ff_kbd_window_init(fwin, 5.0, AC3_BLOCK_SIZE);
+    for (int i = 0; i < AC3_BLOCK_SIZE; i++)
         iwin[i] = lrintf(fwin[i] * (1 << 22));
-        iwin[AC3_WINDOW_SIZE-1-i] = lrintf(fwin[i] * (1 << 22));
-    }
 
     s->mdct_window = iwin;
 
diff --git a/libavcodec/ac3enc_float.c b/libavcodec/ac3enc_float.c
index b17b3a2365..74f3ab8d86 100644
--- a/libavcodec/ac3enc_float.c
+++ b/libavcodec/ac3enc_float.c
@@ -108,23 +108,16 @@ static av_cold void ac3_float_mdct_end(AC3EncodeContext *s)
  */
 static av_cold int ac3_float_mdct_init(AC3EncodeContext *s)
 {
-    float *window;
-    int i, n, n2;
-
-    n  = 1 << 9;
-    n2 = n >> 1;
-
-    window = av_malloc_array(n, sizeof(*window));
+    float *window = av_malloc_array(AC3_BLOCK_SIZE, sizeof(*window));
     if (!window) {
         av_log(s->avctx, AV_LOG_ERROR, "Cannot allocate memory.\n");
         return AVERROR(ENOMEM);
     }
-    ff_kbd_window_init(window, 5.0, n2);
-    for (i = 0; i < n2; i++)
-        window[n-1-i] = window[i];
+
+    ff_kbd_window_init(window, 5.0, AC3_BLOCK_SIZE);
     s->mdct_window = window;
 
-    return ff_mdct_init(&s->mdct, 9, 0, -2.0 / n);
+    return ff_mdct_init(&s->mdct, 9, 0, -2.0 / AC3_WINDOW_SIZE);
 }
 
 
diff --git a/libavcodec/ac3enc_template.c b/libavcodec/ac3enc_template.c
index 4f1e181e0b..5ecef3b178 100644
--- a/libavcodec/ac3enc_template.c
+++ b/libavcodec/ac3enc_template.c
@@ -92,7 +92,10 @@ static void apply_mdct(AC3EncodeContext *s)
             const SampleType *input_samples = &s->planar_samples[ch][blk * AC3_BLOCK_SIZE];
 
             s->fdsp->vector_fmul(s->windowed_samples, input_samples,
-                                 s->mdct_window, AC3_WINDOW_SIZE);
+                                 s->mdct_window, AC3_BLOCK_SIZE);
+            s->fdsp->vector_fmul_reverse(s->windowed_samples + AC3_BLOCK_SIZE,
+                                         &input_samples[AC3_BLOCK_SIZE],
+                                         s->mdct_window, AC3_BLOCK_SIZE);
 
             s->mdct.mdct_calc(&s->mdct, block->mdct_coef[ch+1],
                               s->windowed_samples);
-- 
2.30.0.rc2

From 2847980c73b01fc7bf6f95cf15749bb4eeb7f878 Mon Sep 17 00:00:00 2001
From: Lynne <d...@lynne.ee>
Date: Sat, 9 Jan 2021 03:19:18 +0100
Subject: [PATCH v3 4/5] ac3enc_fixed: drop unnecessary fixed-point DSP code

---
 libavcodec/ac3dsp.c              |  60 -------
 libavcodec/ac3dsp.h              |  47 ------
 libavcodec/ac3tab.c              |  38 -----
 libavcodec/ac3tab.h              |   1 -
 libavcodec/arm/ac3dsp_init_arm.c |   9 --
 libavcodec/x86/ac3dsp.asm        | 258 -------------------------------
 libavcodec/x86/ac3dsp_init.c     |  52 +------
 7 files changed, 1 insertion(+), 464 deletions(-)

diff --git a/libavcodec/ac3dsp.c b/libavcodec/ac3dsp.c
index 382f87c05f..85c721dd3b 100644
--- a/libavcodec/ac3dsp.c
+++ b/libavcodec/ac3dsp.c
@@ -46,49 +46,6 @@ static void ac3_exponent_min_c(uint8_t *exp, int num_reuse_blocks, int nb_coefs)
     }
 }
 
-static int ac3_max_msb_abs_int16_c(const int16_t *src, int len)
-{
-    int i, v = 0;
-    for (i = 0; i < len; i++)
-        v |= abs(src[i]);
-    return v;
-}
-
-static void ac3_lshift_int16_c(int16_t *src, unsigned int len,
-                               unsigned int shift)
-{
-    uint32_t *src32 = (uint32_t *)src;
-    const uint32_t mask = ~(((1 << shift) - 1) << 16);
-    int i;
-    len >>= 1;
-    for (i = 0; i < len; i += 8) {
-        src32[i  ] = (src32[i  ] << shift) & mask;
-        src32[i+1] = (src32[i+1] << shift) & mask;
-        src32[i+2] = (src32[i+2] << shift) & mask;
-        src32[i+3] = (src32[i+3] << shift) & mask;
-        src32[i+4] = (src32[i+4] << shift) & mask;
-        src32[i+5] = (src32[i+5] << shift) & mask;
-        src32[i+6] = (src32[i+6] << shift) & mask;
-        src32[i+7] = (src32[i+7] << shift) & mask;
-    }
-}
-
-static void ac3_rshift_int32_c(int32_t *src, unsigned int len,
-                               unsigned int shift)
-{
-    do {
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        *src++ >>= shift;
-        len -= 8;
-    } while (len > 0);
-}
-
 static void float_to_fixed24_c(int32_t *dst, const float *src, unsigned int len)
 {
     const float scale = 1 << 24;
@@ -376,19 +333,6 @@ void ff_ac3dsp_downmix_fixed(AC3DSPContext *c, int32_t **samples, int16_t **matr
         ac3_downmix_c_fixed(samples, matrix, out_ch, in_ch, len);
 }
 
-static void apply_window_int16_c(int16_t *output, const int16_t *input,
-                                 const int16_t *window, unsigned int len)
-{
-    int i;
-    int len2 = len >> 1;
-
-    for (i = 0; i < len2; i++) {
-        int16_t w       = window[i];
-        output[i]       = (MUL16(input[i],       w) + (1 << 14)) >> 15;
-        output[len-i-1] = (MUL16(input[len-i-1], w) + (1 << 14)) >> 15;
-    }
-}
-
 void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
                        int out_ch, int in_ch, int len)
 {
@@ -424,9 +368,6 @@ void ff_ac3dsp_downmix(AC3DSPContext *c, float **samples, float **matrix,
 av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
 {
     c->ac3_exponent_min = ac3_exponent_min_c;
-    c->ac3_max_msb_abs_int16 = ac3_max_msb_abs_int16_c;
-    c->ac3_lshift_int16 = ac3_lshift_int16_c;
-    c->ac3_rshift_int32 = ac3_rshift_int32_c;
     c->float_to_fixed24 = float_to_fixed24_c;
     c->bit_alloc_calc_bap = ac3_bit_alloc_calc_bap_c;
     c->update_bap_counts = ac3_update_bap_counts_c;
@@ -438,7 +379,6 @@ av_cold void ff_ac3dsp_init(AC3DSPContext *c, int bit_exact)
     c->out_channels          = 0;
     c->downmix               = NULL;
     c->downmix_fixed         = NULL;
-    c->apply_window_int16 = apply_window_int16_c;
 
     if (ARCH_ARM)
         ff_ac3dsp_init_arm(c, bit_exact);
diff --git a/libavcodec/ac3dsp.h b/libavcodec/ac3dsp.h
index 161de4cb86..a23b11526e 100644
--- a/libavcodec/ac3dsp.h
+++ b/libavcodec/ac3dsp.h
@@ -42,39 +42,6 @@ typedef struct AC3DSPContext {
      */
     void (*ac3_exponent_min)(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
-    /**
-     * Calculate the maximum MSB of the absolute value of each element in an
-     * array of int16_t.
-     * @param src input array
-     *            constraints: align 16. values must be in range [-32767,32767]
-     * @param len number of values in the array
-     *            constraints: multiple of 16 greater than 0
-     * @return    a value with the same MSB as max(abs(src[]))
-     */
-    int (*ac3_max_msb_abs_int16)(const int16_t *src, int len);
-
-    /**
-     * Left-shift each value in an array of int16_t by a specified amount.
-     * @param src    input array
-     *               constraints: align 16
-     * @param len    number of values in the array
-     *               constraints: multiple of 32 greater than 0
-     * @param shift  left shift amount
-     *               constraints: range [0,15]
-     */
-    void (*ac3_lshift_int16)(int16_t *src, unsigned int len, unsigned int shift);
-
-    /**
-     * Right-shift each value in an array of int32_t by a specified amount.
-     * @param src    input array
-     *               constraints: align 16
-     * @param len    number of values in the array
-     *               constraints: multiple of 16 greater than 0
-     * @param shift  right shift amount
-     *               constraints: range [0,31]
-     */
-    void (*ac3_rshift_int32)(int32_t *src, unsigned int len, unsigned int shift);
-
     /**
      * Convert an array of float in range [-1.0,1.0] to int32_t with range
      * [-(1<<24),(1<<24)]
@@ -136,20 +103,6 @@ typedef struct AC3DSPContext {
     int in_channels;
     void (*downmix)(float **samples, float **matrix, int len);
     void (*downmix_fixed)(int32_t **samples, int16_t **matrix, int len);
-
-    /**
-     * Apply symmetric window in 16-bit fixed-point.
-     * @param output destination array
-     *               constraints: 16-byte aligned
-     * @param input  source array
-     *               constraints: 16-byte aligned
-     * @param window window array
-     *               constraints: 16-byte aligned, at least len/2 elements
-     * @param len    full window length
-     *               constraints: multiple of ? greater than zero
-     */
-    void (*apply_window_int16)(int16_t *output, const int16_t *input,
-                               const int16_t *window, unsigned int len);
 } AC3DSPContext;
 
 void ff_ac3dsp_init    (AC3DSPContext *c, int bit_exact);
diff --git a/libavcodec/ac3tab.c b/libavcodec/ac3tab.c
index d018110331..99307218cc 100644
--- a/libavcodec/ac3tab.c
+++ b/libavcodec/ac3tab.c
@@ -147,44 +147,6 @@ const uint8_t ff_eac3_default_cpl_band_struct[18] = {
     0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1
 };
 
-/* AC-3 MDCT window */
-
-/* MDCT window */
-DECLARE_ALIGNED(16, const int16_t, ff_ac3_window)[AC3_WINDOW_SIZE/2] = {
-    4,    7,   12,   16,   21,   28,   34,   42,
-   51,   61,   72,   84,   97,  111,  127,  145,
-  164,  184,  207,  231,  257,  285,  315,  347,
-  382,  419,  458,  500,  544,  591,  641,  694,
-  750,  810,  872,  937, 1007, 1079, 1155, 1235,
- 1318, 1406, 1497, 1593, 1692, 1796, 1903, 2016,
- 2132, 2253, 2379, 2509, 2644, 2783, 2927, 3076,
- 3230, 3389, 3552, 3721, 3894, 4072, 4255, 4444,
- 4637, 4835, 5038, 5246, 5459, 5677, 5899, 6127,
- 6359, 6596, 6837, 7083, 7334, 7589, 7848, 8112,
- 8380, 8652, 8927, 9207, 9491, 9778,10069,10363,
-10660,10960,11264,11570,11879,12190,12504,12820,
-13138,13458,13780,14103,14427,14753,15079,15407,
-15735,16063,16392,16720,17049,17377,17705,18032,
-18358,18683,19007,19330,19651,19970,20287,20602,
-20914,21225,21532,21837,22139,22438,22733,23025,
-23314,23599,23880,24157,24430,24699,24964,25225,
-25481,25732,25979,26221,26459,26691,26919,27142,
-27359,27572,27780,27983,28180,28373,28560,28742,
-28919,29091,29258,29420,29577,29729,29876,30018,
-30155,30288,30415,30538,30657,30771,30880,30985,
-31086,31182,31274,31363,31447,31528,31605,31678,
-31747,31814,31877,31936,31993,32046,32097,32145,
-32190,32232,32272,32310,32345,32378,32409,32438,
-32465,32490,32513,32535,32556,32574,32592,32608,
-32623,32636,32649,32661,32671,32681,32690,32698,
-32705,32712,32718,32724,32729,32733,32737,32741,
-32744,32747,32750,32752,32754,32756,32757,32759,
-32760,32761,32762,32763,32764,32764,32765,32765,
-32766,32766,32766,32766,32767,32767,32767,32767,
-32767,32767,32767,32767,32767,32767,32767,32767,
-32767,32767,32767,32767,32767,32767,32767,32767,
-};
-
 const uint8_t ff_ac3_log_add_tab[260]= {
 0x40,0x3f,0x3e,0x3d,0x3c,0x3b,0x3a,0x39,0x38,0x37,
 0x36,0x35,0x34,0x34,0x33,0x32,0x31,0x30,0x2f,0x2f,
diff --git a/libavcodec/ac3tab.h b/libavcodec/ac3tab.h
index 1d1264e3fc..a0036a301b 100644
--- a/libavcodec/ac3tab.h
+++ b/libavcodec/ac3tab.h
@@ -37,7 +37,6 @@ extern const int      ff_ac3_sample_rate_tab[];
 extern const uint16_t ff_ac3_bitrate_tab[19];
 extern const uint8_t  ff_ac3_rematrix_band_tab[5];
 extern const uint8_t  ff_eac3_default_cpl_band_struct[18];
-extern const int16_t  ff_ac3_window[AC3_WINDOW_SIZE/2];
 extern const uint8_t  ff_ac3_log_add_tab[260];
 extern const uint16_t ff_ac3_hearing_threshold_tab[AC3_CRITICAL_BANDS][3];
 extern const uint8_t  ff_ac3_bap_tab[64];
diff --git a/libavcodec/arm/ac3dsp_init_arm.c b/libavcodec/arm/ac3dsp_init_arm.c
index a3c32ff407..9217a7d0c2 100644
--- a/libavcodec/arm/ac3dsp_init_arm.c
+++ b/libavcodec/arm/ac3dsp_init_arm.c
@@ -26,13 +26,8 @@
 #include "config.h"
 
 void ff_ac3_exponent_min_neon(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
-int ff_ac3_max_msb_abs_int16_neon(const int16_t *src, int len);
-void ff_ac3_lshift_int16_neon(int16_t *src, unsigned len, unsigned shift);
-void ff_ac3_rshift_int32_neon(int32_t *src, unsigned len, unsigned shift);
 void ff_float_to_fixed24_neon(int32_t *dst, const float *src, unsigned int len);
 void ff_ac3_extract_exponents_neon(uint8_t *exp, int32_t *coef, int nb_coefs);
-void ff_apply_window_int16_neon(int16_t *dst, const int16_t *src,
-                                const int16_t *window, unsigned n);
 void ff_ac3_sum_square_butterfly_int32_neon(int64_t sum[4],
                                             const int32_t *coef0,
                                             const int32_t *coef1,
@@ -61,12 +56,8 @@ av_cold void ff_ac3dsp_init_arm(AC3DSPContext *c, int bit_exact)
 
     if (have_neon(cpu_flags)) {
         c->ac3_exponent_min      = ff_ac3_exponent_min_neon;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_neon;
-        c->ac3_lshift_int16      = ff_ac3_lshift_int16_neon;
-        c->ac3_rshift_int32      = ff_ac3_rshift_int32_neon;
         c->float_to_fixed24      = ff_float_to_fixed24_neon;
         c->extract_exponents     = ff_ac3_extract_exponents_neon;
-        c->apply_window_int16    = ff_apply_window_int16_neon;
         c->sum_square_butterfly_int32 = ff_ac3_sum_square_butterfly_int32_neon;
         c->sum_square_butterfly_float = ff_ac3_sum_square_butterfly_float_neon;
     }
diff --git a/libavcodec/x86/ac3dsp.asm b/libavcodec/x86/ac3dsp.asm
index 675ade3101..4ddaa94320 100644
--- a/libavcodec/x86/ac3dsp.asm
+++ b/libavcodec/x86/ac3dsp.asm
@@ -35,10 +35,6 @@ pw_bap_mul2: dw 5, 7, 0, 7, 5, 7, 0, 7
 cextern pd_1
 pd_151: times 4 dd 151
 
-; used in ff_apply_window_int16()
-pb_revwords: SHUFFLE_MASK_W 7, 6, 5, 4, 3, 2, 1, 0
-pd_16384: times 4 dd 16384
-
 SECTION .text
 
 ;-----------------------------------------------------------------------------
@@ -81,133 +77,6 @@ AC3_EXPONENT_MIN
 %endif
 %undef LOOP_ALIGN
 
-;-----------------------------------------------------------------------------
-; int ff_ac3_max_msb_abs_int16(const int16_t *src, int len)
-;
-; This function uses 2 different methods to calculate a valid result.
-; 1) logical 'or' of abs of each element
-;        This is used for ssse3 because of the pabsw instruction.
-;        It is also used for mmx because of the lack of min/max instructions.
-; 2) calculate min/max for the array, then or(abs(min),abs(max))
-;        This is used for mmxext and sse2 because they have pminsw/pmaxsw.
-;-----------------------------------------------------------------------------
-
-; logical 'or' of 4 or 8 words in an mmx or xmm register into the low word
-%macro OR_WORDS_HORIZ 2 ; src, tmp
-%if cpuflag(sse2)
-    movhlps     %2, %1
-    por         %1, %2
-    pshuflw     %2, %1, q0032
-    por         %1, %2
-    pshuflw     %2, %1, q0001
-    por         %1, %2
-%elif cpuflag(mmxext)
-    pshufw      %2, %1, q0032
-    por         %1, %2
-    pshufw      %2, %1, q0001
-    por         %1, %2
-%else ; mmx
-    movq        %2, %1
-    psrlq       %2, 32
-    por         %1, %2
-    movq        %2, %1
-    psrlq       %2, 16
-    por         %1, %2
-%endif
-%endmacro
-
-%macro AC3_MAX_MSB_ABS_INT16 1
-cglobal ac3_max_msb_abs_int16, 2,2,5, src, len
-    pxor        m2, m2
-    pxor        m3, m3
-.loop:
-%ifidn %1, min_max
-    mova        m0, [srcq]
-    mova        m1, [srcq+mmsize]
-    pminsw      m2, m0
-    pminsw      m2, m1
-    pmaxsw      m3, m0
-    pmaxsw      m3, m1
-%else ; or_abs
-%if notcpuflag(ssse3)
-    mova        m0, [srcq]
-    mova        m1, [srcq+mmsize]
-    ABS2        m0, m1, m3, m4
-%else ; ssse3
-    ; using memory args is faster for ssse3
-    pabsw       m0, [srcq]
-    pabsw       m1, [srcq+mmsize]
-%endif
-    por         m2, m0
-    por         m2, m1
-%endif
-    add       srcq, mmsize*2
-    sub       lend, mmsize
-    ja .loop
-%ifidn %1, min_max
-    ABS2        m2, m3, m0, m1
-    por         m2, m3
-%endif
-    OR_WORDS_HORIZ m2, m0
-    movd       eax, m2
-    and        eax, 0xFFFF
-    RET
-%endmacro
-
-INIT_MMX mmx
-AC3_MAX_MSB_ABS_INT16 or_abs
-INIT_MMX mmxext
-AC3_MAX_MSB_ABS_INT16 min_max
-INIT_XMM sse2
-AC3_MAX_MSB_ABS_INT16 min_max
-INIT_XMM ssse3
-AC3_MAX_MSB_ABS_INT16 or_abs
-
-;-----------------------------------------------------------------------------
-; macro used for ff_ac3_lshift_int16() and ff_ac3_rshift_int32()
-;-----------------------------------------------------------------------------
-
-%macro AC3_SHIFT 3 ; l/r, 16/32, shift instruction, instruction set
-cglobal ac3_%1shift_int%2, 3, 3, 5, src, len, shift
-    movd      m0, shiftd
-.loop:
-    mova      m1, [srcq         ]
-    mova      m2, [srcq+mmsize  ]
-    mova      m3, [srcq+mmsize*2]
-    mova      m4, [srcq+mmsize*3]
-    %3        m1, m0
-    %3        m2, m0
-    %3        m3, m0
-    %3        m4, m0
-    mova  [srcq         ], m1
-    mova  [srcq+mmsize  ], m2
-    mova  [srcq+mmsize*2], m3
-    mova  [srcq+mmsize*3], m4
-    add     srcq, mmsize*4
-    sub     lend, mmsize*32/%2
-    ja .loop
-.end:
-    REP_RET
-%endmacro
-
-;-----------------------------------------------------------------------------
-; void ff_ac3_lshift_int16(int16_t *src, unsigned int len, unsigned int shift)
-;-----------------------------------------------------------------------------
-
-INIT_MMX mmx
-AC3_SHIFT l, 16, psllw
-INIT_XMM sse2
-AC3_SHIFT l, 16, psllw
-
-;-----------------------------------------------------------------------------
-; void ff_ac3_rshift_int32(int32_t *src, unsigned int len, unsigned int shift)
-;-----------------------------------------------------------------------------
-
-INIT_MMX mmx
-AC3_SHIFT r, 32, psrad
-INIT_XMM sse2
-AC3_SHIFT r, 32, psrad
-
 ;-----------------------------------------------------------------------------
 ; void ff_float_to_fixed24(int32_t *dst, const float *src, unsigned int len)
 ;-----------------------------------------------------------------------------
@@ -423,130 +292,3 @@ AC3_EXTRACT_EXPONENTS
 INIT_XMM ssse3
 AC3_EXTRACT_EXPONENTS
 %endif
-
-;-----------------------------------------------------------------------------
-; void ff_apply_window_int16(int16_t *output, const int16_t *input,
-;                            const int16_t *window, unsigned int len)
-;-----------------------------------------------------------------------------
-
-%macro REVERSE_WORDS 1-2
-%if cpuflag(ssse3) && notcpuflag(atom)
-    pshufb  %1, %2
-%elif cpuflag(sse2)
-    pshuflw  %1, %1, 0x1B
-    pshufhw  %1, %1, 0x1B
-    pshufd   %1, %1, 0x4E
-%elif cpuflag(mmxext)
-    pshufw   %1, %1, 0x1B
-%endif
-%endmacro
-
-%macro MUL16FIXED 3
-%if cpuflag(ssse3) ; dst, src, unused
-; dst = ((dst * src) + (1<<14)) >> 15
-    pmulhrsw   %1, %2
-%elif cpuflag(mmxext) ; dst, src, temp
-; dst = (dst * src) >> 15
-; pmulhw cuts off the bottom bit, so we have to lshift by 1 and add it back
-; in from the pmullw result.
-    mova    %3, %1
-    pmulhw  %1, %2
-    pmullw  %3, %2
-    psrlw   %3, 15
-    psllw   %1, 1
-    por     %1, %3
-%endif
-%endmacro
-
-%macro APPLY_WINDOW_INT16 1 ; %1 bitexact version
-%if %1
-cglobal apply_window_int16, 4,5,6, output, input, window, offset, offset2
-%else
-cglobal apply_window_int16_round, 4,5,6, output, input, window, offset, offset2
-%endif
-    lea     offset2q, [offsetq-mmsize]
-%if cpuflag(ssse3) && notcpuflag(atom)
-    mova          m5, [pb_revwords]
-    ALIGN 16
-%elif %1
-    mova          m5, [pd_16384]
-%endif
-.loop:
-%if cpuflag(ssse3)
-    ; This version does the 16x16->16 multiplication in-place without expanding
-    ; to 32-bit. The ssse3 version is bit-identical.
-    mova          m0, [windowq+offset2q]
-    mova          m1, [ inputq+offset2q]
-    pmulhrsw      m1, m0
-    REVERSE_WORDS m0, m5
-    pmulhrsw      m0, [ inputq+offsetq ]
-    mova  [outputq+offset2q], m1
-    mova  [outputq+offsetq ], m0
-%elif %1
-    ; This version expands 16-bit to 32-bit, multiplies by the window,
-    ; adds 16384 for rounding, right shifts 15, then repacks back to words to
-    ; save to the output. The window is reversed for the second half.
-    mova          m3, [windowq+offset2q]
-    mova          m4, [ inputq+offset2q]
-    pxor          m0, m0
-    punpcklwd     m0, m3
-    punpcklwd     m1, m4
-    pmaddwd       m0, m1
-    paddd         m0, m5
-    psrad         m0, 15
-    pxor          m2, m2
-    punpckhwd     m2, m3
-    punpckhwd     m1, m4
-    pmaddwd       m2, m1
-    paddd         m2, m5
-    psrad         m2, 15
-    packssdw      m0, m2
-    mova  [outputq+offset2q], m0
-    REVERSE_WORDS m3
-    mova          m4, [ inputq+offsetq]
-    pxor          m0, m0
-    punpcklwd     m0, m3
-    punpcklwd     m1, m4
-    pmaddwd       m0, m1
-    paddd         m0, m5
-    psrad         m0, 15
-    pxor          m2, m2
-    punpckhwd     m2, m3
-    punpckhwd     m1, m4
-    pmaddwd       m2, m1
-    paddd         m2, m5
-    psrad         m2, 15
-    packssdw      m0, m2
-    mova  [outputq+offsetq], m0
-%else
-    ; This version does the 16x16->16 multiplication in-place without expanding
-    ; to 32-bit. The mmxext and sse2 versions do not use rounding, and
-    ; therefore are not bit-identical to the C version.
-    mova          m0, [windowq+offset2q]
-    mova          m1, [ inputq+offset2q]
-    mova          m2, [ inputq+offsetq ]
-    MUL16FIXED    m1, m0, m3
-    REVERSE_WORDS m0
-    MUL16FIXED    m2, m0, m3
-    mova  [outputq+offset2q], m1
-    mova  [outputq+offsetq ], m2
-%endif
-    add      offsetd, mmsize
-    sub     offset2d, mmsize
-    jae .loop
-    REP_RET
-%endmacro
-
-INIT_MMX mmxext
-APPLY_WINDOW_INT16 0
-INIT_XMM sse2
-APPLY_WINDOW_INT16 0
-
-INIT_MMX mmxext
-APPLY_WINDOW_INT16 1
-INIT_XMM sse2
-APPLY_WINDOW_INT16 1
-INIT_XMM ssse3
-APPLY_WINDOW_INT16 1
-INIT_XMM ssse3, atom
-APPLY_WINDOW_INT16 1
diff --git a/libavcodec/x86/ac3dsp_init.c b/libavcodec/x86/ac3dsp_init.c
index 2e7e2fb6da..2ae762af46 100644
--- a/libavcodec/x86/ac3dsp_init.c
+++ b/libavcodec/x86/ac3dsp_init.c
@@ -30,17 +30,6 @@ void ff_ac3_exponent_min_mmx   (uint8_t *exp, int num_reuse_blocks, int nb_coefs
 void ff_ac3_exponent_min_mmxext(uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 void ff_ac3_exponent_min_sse2  (uint8_t *exp, int num_reuse_blocks, int nb_coefs);
 
-int ff_ac3_max_msb_abs_int16_mmx  (const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_mmxext(const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_sse2 (const int16_t *src, int len);
-int ff_ac3_max_msb_abs_int16_ssse3(const int16_t *src, int len);
-
-void ff_ac3_lshift_int16_mmx (int16_t *src, unsigned int len, unsigned int shift);
-void ff_ac3_lshift_int16_sse2(int16_t *src, unsigned int len, unsigned int shift);
-
-void ff_ac3_rshift_int32_mmx (int32_t *src, unsigned int len, unsigned int shift);
-void ff_ac3_rshift_int32_sse2(int32_t *src, unsigned int len, unsigned int shift);
-
 void ff_float_to_fixed24_3dnow(int32_t *dst, const float *src, unsigned int len);
 void ff_float_to_fixed24_sse  (int32_t *dst, const float *src, unsigned int len);
 void ff_float_to_fixed24_sse2 (int32_t *dst, const float *src, unsigned int len);
@@ -50,28 +39,12 @@ int ff_ac3_compute_mantissa_size_sse2(uint16_t mant_cnt[6][16]);
 void ff_ac3_extract_exponents_sse2 (uint8_t *exp, int32_t *coef, int nb_coefs);
 void ff_ac3_extract_exponents_ssse3(uint8_t *exp, int32_t *coef, int nb_coefs);
 
-void ff_apply_window_int16_round_mmxext(int16_t *output, const int16_t *input,
-                                        const int16_t *window, unsigned int len);
-void ff_apply_window_int16_round_sse2(int16_t *output, const int16_t *input,
-                                      const int16_t *window, unsigned int len);
-void ff_apply_window_int16_mmxext(int16_t *output, const int16_t *input,
-                                  const int16_t *window, unsigned int len);
-void ff_apply_window_int16_sse2(int16_t *output, const int16_t *input,
-                                const int16_t *window, unsigned int len);
-void ff_apply_window_int16_ssse3(int16_t *output, const int16_t *input,
-                                 const int16_t *window, unsigned int len);
-void ff_apply_window_int16_ssse3_atom(int16_t *output, const int16_t *input,
-                                      const int16_t *window, unsigned int len);
-
 av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c, int bit_exact)
 {
     int cpu_flags = av_get_cpu_flags();
 
     if (EXTERNAL_MMX(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_mmx;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_mmx;
-        c->ac3_lshift_int16 = ff_ac3_lshift_int16_mmx;
-        c->ac3_rshift_int32 = ff_ac3_rshift_int32_mmx;
     }
     if (EXTERNAL_AMD3DNOW(cpu_flags)) {
         if (!bit_exact) {
@@ -80,43 +53,20 @@ av_cold void ff_ac3dsp_init_x86(AC3DSPContext *c, int bit_exact)
     }
     if (EXTERNAL_MMXEXT(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_mmxext;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_mmxext;
-        if (bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_mmxext;
-        } else {
-            c->apply_window_int16 = ff_apply_window_int16_round_mmxext;
-        }
     }
     if (EXTERNAL_SSE(cpu_flags)) {
         c->float_to_fixed24 = ff_float_to_fixed24_sse;
     }
     if (EXTERNAL_SSE2(cpu_flags)) {
         c->ac3_exponent_min = ff_ac3_exponent_min_sse2;
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_sse2;
         c->float_to_fixed24 = ff_float_to_fixed24_sse2;
         c->compute_mantissa_size = ff_ac3_compute_mantissa_size_sse2;
         c->extract_exponents = ff_ac3_extract_exponents_sse2;
-        if (bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_sse2;
-        }
-    }
-
-    if (EXTERNAL_SSE2_FAST(cpu_flags)) {
-        c->ac3_lshift_int16 = ff_ac3_lshift_int16_sse2;
-        c->ac3_rshift_int32 = ff_ac3_rshift_int32_sse2;
-        if (!bit_exact) {
-            c->apply_window_int16 = ff_apply_window_int16_round_sse2;
-        }
     }
 
     if (EXTERNAL_SSSE3(cpu_flags)) {
-        c->ac3_max_msb_abs_int16 = ff_ac3_max_msb_abs_int16_ssse3;
-        if (cpu_flags & AV_CPU_FLAG_ATOM) {
-            c->apply_window_int16 = ff_apply_window_int16_ssse3_atom;
-        } else {
+        if (!(cpu_flags & AV_CPU_FLAG_ATOM))
             c->extract_exponents = ff_ac3_extract_exponents_ssse3;
-            c->apply_window_int16 = ff_apply_window_int16_ssse3;
-        }
     }
 }
 
-- 
2.30.0.rc2
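
For readers skimming the removals above: the dropped apply_window_int16_c()
multiplied each sample by a Q15 window coefficient and rounded to nearest.
The comments in the removed x86 code note that the in-place mmxext/sse2
variants skip the rounding and so are not bit-identical to the C version.
Below is a minimal standalone sketch of that difference. It is not part of
the patch, and the names mul_q15_round, mul_q15_trunc and apply_window_q15
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 multiply with round-to-nearest: add half an LSB (1 << 14) before
 * dropping the 15 fractional bits, as the removed C function did.
 * Assumes arithmetic right shift for negative products. */
static int16_t mul_q15_round(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}

/* Q15 multiply that simply truncates the fractional bits. */
static int16_t mul_q15_trunc(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Apply a symmetric window: win holds only the first len/2 coefficients,
 * and the second half of the block reuses them in mirrored order. */
static void apply_window_q15(int16_t *out, const int16_t *in,
                             const int16_t *win, unsigned int len)
{
    unsigned int half = len >> 1;

    assert(len > 0 && (len & 1) == 0); /* illustrative precondition only */

    for (unsigned int i = 0; i < half; i++) {
        int16_t w = win[i];
        out[i]           = mul_q15_round(in[i],           w);
        out[len - i - 1] = mul_q15_round(in[len - i - 1], w);
    }
}
```

With a coefficient of 16384 (0.5 in Q15) and an input of 3, the rounding
multiply yields 2 while the truncating one yields 1, which is the kind of
one-LSB mismatch the bit_exact flag had to select around.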

From 7454b8cad75193744cfdeb29c635c973a654dbc2 Mon Sep 17 00:00:00 2001
From: Lynne <d...@lynne.ee>
Date: Sat, 9 Jan 2021 16:23:20 +0100
Subject: [PATCH v3 5/5] fft: remove 16-bit FFT and MDCT code

No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 macros are left as-is.
Cleaning them up is simply too much work for code that is meant to be
removed entirely anyway.
---
 libavcodec/Makefile                 |  11 +-
 libavcodec/arm/Makefile             |   9 +-
 libavcodec/arm/fft_fixed_init_arm.c |  50 ------
 libavcodec/arm/fft_fixed_neon.S     | 261 ----------------------------
 libavcodec/arm/mdct_fixed_neon.S    | 193 --------------------
 libavcodec/fft-internal.h           |  29 +---
 libavcodec/fft.h                    |   9 -
 libavcodec/fft_fixed.c              |  21 ---
 libavcodec/fft_template.c           |   4 -
 libavcodec/mdct_fixed.c             |  65 -------
 libavcodec/tests/.gitignore         |   1 -
 libavcodec/tests/fft-fixed.c        |  21 ---
 tests/fate/fft.mak                  |  30 +---
 13 files changed, 14 insertions(+), 690 deletions(-)
 delete mode 100644 libavcodec/arm/fft_fixed_init_arm.c
 delete mode 100644 libavcodec/arm/fft_fixed_neon.S
 delete mode 100644 libavcodec/arm/mdct_fixed_neon.S
 delete mode 100644 libavcodec/fft_fixed.c
 delete mode 100644 libavcodec/mdct_fixed.c
 delete mode 100644 libavcodec/tests/fft-fixed.c

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 0546e6f6c5..446e6e6b3b 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -83,10 +83,9 @@ OBJS-$(CONFIG_EXIF)                    += exif.o tiff_common.o
 OBJS-$(CONFIG_FAANDCT)                 += faandct.o
 OBJS-$(CONFIG_FAANIDCT)                += faanidct.o
 OBJS-$(CONFIG_FDCTDSP)                 += fdctdsp.o jfdctfst.o jfdctint.o
-FFT-OBJS-$(CONFIG_HARDCODED_TABLES)    += cos_tables.o cos_fixed_tables.o
-OBJS-$(CONFIG_FFT)                     += avfft.o fft_fixed.o fft_float.o \
-                                          fft_fixed_32.o fft_init_table.o \
-                                          $(FFT-OBJS-yes)
+FFT-OBJS-$(CONFIG_HARDCODED_TABLES)    += cos_tables.o
+OBJS-$(CONFIG_FFT)                     += avfft.o fft_float.o fft_fixed_32.o \
+                                          fft_init_table.o $(FFT-OBJS-yes)
 OBJS-$(CONFIG_FLACDSP)                 += flacdsp.o
 OBJS-$(CONFIG_FMTCONVERT)              += fmtconvert.o
 OBJS-$(CONFIG_GOLOMB)                  += golomb.o
@@ -115,7 +114,7 @@ OBJS-$(CONFIG_LLVIDENCDSP)             += lossless_videoencdsp.o
 OBJS-$(CONFIG_LPC)                     += lpc.o
 OBJS-$(CONFIG_LSP)                     += lsp.o
 OBJS-$(CONFIG_LZF)                     += lzf.o
-OBJS-$(CONFIG_MDCT)                    += mdct_fixed.o mdct_float.o mdct_fixed_32.o
+OBJS-$(CONFIG_MDCT)                    += mdct_float.o mdct_fixed_32.o
 OBJS-$(CONFIG_ME_CMP)                  += me_cmp.o
 OBJS-$(CONFIG_MEDIACODEC)              += mediacodecdec_common.o mediacodec_surface.o mediacodec_wrapper.o mediacodec_sw_buffer.o
 OBJS-$(CONFIG_MPEG_ER)                 += mpeg_er.o
@@ -1217,7 +1216,7 @@ TESTPROGS = avpacket                                                    \
 
 TESTPROGS-$(CONFIG_CABAC)                 += cabac
 TESTPROGS-$(CONFIG_DCT)                   += avfft
-TESTPROGS-$(CONFIG_FFT)                   += fft fft-fixed fft-fixed32
+TESTPROGS-$(CONFIG_FFT)                   += fft fft-fixed32
 TESTPROGS-$(CONFIG_GOLOMB)                += golomb
 TESTPROGS-$(CONFIG_IDCTDSP)               += dct
 TESTPROGS-$(CONFIG_IIRFILTER)             += iirfilter
diff --git a/libavcodec/arm/Makefile b/libavcodec/arm/Makefile
index c6be814153..c4ab93aeeb 100644
--- a/libavcodec/arm/Makefile
+++ b/libavcodec/arm/Makefile
@@ -5,8 +5,7 @@ OBJS-$(CONFIG_AC3DSP)                  += arm/ac3dsp_init_arm.o         \
                                           arm/ac3dsp_arm.o
 OBJS-$(CONFIG_AUDIODSP)                += arm/audiodsp_init_arm.o
 OBJS-$(CONFIG_BLOCKDSP)                += arm/blockdsp_init_arm.o
-OBJS-$(CONFIG_FFT)                     += arm/fft_init_arm.o            \
-                                          arm/fft_fixed_init_arm.o
+OBJS-$(CONFIG_FFT)                     += arm/fft_init_arm.o
 OBJS-$(CONFIG_FLACDSP)                 += arm/flacdsp_init_arm.o        \
                                           arm/flacdsp_arm.o
 OBJS-$(CONFIG_FMTCONVERT)              += arm/fmtconvert_init_arm.o
@@ -108,8 +107,7 @@ NEON-OBJS-$(CONFIG_AUDIODSP)           += arm/audiodsp_init_neon.o      \
                                           arm/int_neon.o
 NEON-OBJS-$(CONFIG_BLOCKDSP)           += arm/blockdsp_init_neon.o      \
                                           arm/blockdsp_neon.o
-NEON-OBJS-$(CONFIG_FFT)                += arm/fft_neon.o                \
-                                          arm/fft_fixed_neon.o
+NEON-OBJS-$(CONFIG_FFT)                += arm/fft_neon.o
 NEON-OBJS-$(CONFIG_FMTCONVERT)         += arm/fmtconvert_neon.o
 NEON-OBJS-$(CONFIG_G722DSP)            += arm/g722dsp_neon.o
 NEON-OBJS-$(CONFIG_H264CHROMA)         += arm/h264cmc_neon.o
@@ -123,8 +121,7 @@ NEON-OBJS-$(CONFIG_HPELDSP)            += arm/hpeldsp_init_neon.o       \
 NEON-OBJS-$(CONFIG_IDCTDSP)            += arm/idctdsp_init_neon.o       \
                                           arm/idctdsp_neon.o            \
                                           arm/simple_idct_neon.o
-NEON-OBJS-$(CONFIG_MDCT)               += arm/mdct_neon.o               \
-                                          arm/mdct_fixed_neon.o
+NEON-OBJS-$(CONFIG_MDCT)               += arm/mdct_neon.o
 NEON-OBJS-$(CONFIG_MPEGVIDEO)          += arm/mpegvideo_neon.o
 NEON-OBJS-$(CONFIG_PIXBLOCKDSP)        += arm/pixblockdsp_neon.o
 NEON-OBJS-$(CONFIG_RDFT)               += arm/rdft_neon.o
diff --git a/libavcodec/arm/fft_fixed_init_arm.c b/libavcodec/arm/fft_fixed_init_arm.c
deleted file mode 100644
index 11226d65ff..0000000000
--- a/libavcodec/arm/fft_fixed_init_arm.c
+++ /dev/null
@@ -1,50 +0,0 @@
-/*
- * Copyright (c) 2009 Mans Rullgard <m...@mansr.com>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/attributes.h"
-#include "libavutil/cpu.h"
-#include "libavutil/arm/cpu.h"
-
-#define FFT_FLOAT 0
-#include "libavcodec/fft.h"
-
-void ff_fft_fixed_calc_neon(FFTContext *s, FFTComplex *z);
-void ff_mdct_fixed_calc_neon(FFTContext *s, FFTSample *o, const FFTSample *i);
-void ff_mdct_fixed_calcw_neon(FFTContext *s, FFTDouble *o, const FFTSample *i);
-
-av_cold void ff_fft_fixed_init_arm(FFTContext *s)
-{
-    int cpu_flags = av_get_cpu_flags();
-
-    if (have_neon(cpu_flags)) {
-        s->fft_permutation = FF_FFT_PERM_SWAP_LSBS;
-#if CONFIG_FFT
-        s->fft_calc        = ff_fft_fixed_calc_neon;
-#endif
-
-#if CONFIG_MDCT
-        if (!s->inverse && s->nbits >= 3) {
-            s->mdct_permutation = FF_MDCT_PERM_INTERLEAVE;
-            s->mdct_calc        = ff_mdct_fixed_calc_neon;
-            s->mdct_calcw       = ff_mdct_fixed_calcw_neon;
-        }
-#endif
-    }
-}
diff --git a/libavcodec/arm/fft_fixed_neon.S b/libavcodec/arm/fft_fixed_neon.S
deleted file mode 100644
index 2651607544..0000000000
--- a/libavcodec/arm/fft_fixed_neon.S
+++ /dev/null
@@ -1,261 +0,0 @@
-/*
- * Copyright (c) 2011 Mans Rullgard <m...@mansr.com>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/arm/asm.S"
-
-.macro  bflies          d0,  d1,  r0,  r1
-        vrev64.32       \r0, \d1                @ t5, t6, t1, t2
-        vhsub.s16       \r1, \d1, \r0           @ t1-t5, t2-t6, t5-t1, t6-t2
-        vhadd.s16       \r0, \d1, \r0           @ t1+t5, t2+t6, t5+t1, t6+t2
-        vext.16         \r1, \r1, \r1, #1       @ t2-t6, t5-t1, t6-t2, t1-t5
-        vtrn.32         \r0, \r1                @ t1+t5, t2+t6, t2-t6, t5-t1
-                                                @ t5,    t6,    t4,    t3
-        vhsub.s16       \d1, \d0, \r0
-        vhadd.s16       \d0, \d0, \r0
-.endm
-
-.macro  transform01     q0,  q1,  d3,  c0,  c1,  r0,  w0,  w1
-        vrev32.16       \r0, \d3
-        vmull.s16       \w0, \d3, \c0
-        vmlal.s16       \w0, \r0, \c1
-        vshrn.s32       \d3, \w0, #15
-        bflies          \q0, \q1, \w0, \w1
-.endm
-
-.macro  transform2      d0,  d1,  d2,  d3,  q0,  q1,  c0,  c1,  c2,  c3, \
-                        r0,  r1,  w0,  w1
-        vrev32.16       \r0, \d1
-        vrev32.16       \r1, \d3
-        vmull.s16       \w0, \d1, \c0
-        vmlal.s16       \w0, \r0, \c1
-        vmull.s16       \w1, \d3, \c2
-        vmlal.s16       \w1, \r1, \c3
-        vshrn.s32       \d1, \w0, #15
-        vshrn.s32       \d3, \w1, #15
-        bflies          \q0, \q1, \w0, \w1
-.endm
-
-.macro  fft4            d0,  d1,  r0,  r1
-        vhsub.s16       \r0, \d0, \d1           @ t3, t4, t8, t7
-        vhsub.s16       \r1, \d1, \d0
-        vhadd.s16       \d0, \d0, \d1           @ t1, t2, t6, t5
-        vmov.i64        \d1, #0xffff00000000
-        vbit            \r0, \r1, \d1
-        vrev64.16       \r1, \r0                @ t7, t8, t4, t3
-        vtrn.32         \r0, \r1                @ t3, t4, t7, t8
-        vtrn.32         \d0, \r0                @ t1, t2, t3, t4, t6, t5, t8, t7
-        vhsub.s16       \d1, \d0, \r0           @ r2, i2, r3, i1
-        vhadd.s16       \d0, \d0, \r0           @ r0, i0, r1, i3
-.endm
-
-.macro  fft8            d0,  d1,  d2,  d3,  q0,  q1,  c0,  c1,  r0,  r1, w0, w1
-        fft4            \d0, \d1, \r0, \r1
-        vtrn.32         \d0, \d1                @ z0, z2, z1, z3
-        vhadd.s16       \r0, \d2, \d3           @ t1, t2, t3, t4
-        vhsub.s16       \d3, \d2, \d3           @ z5, z7
-        vmov            \d2, \r0
-        transform01     \q0, \q1, \d3, \c0, \c1, \r0, \w0, \w1
-.endm
-
-function fft4_neon
-        vld1.16         {d0-d1},  [r0]
-        fft4            d0,  d1,  d2,  d3
-        vst1.16         {d0-d1},  [r0]
-        bx              lr
-endfunc
-
-function fft8_neon
-        vld1.16         {d0-d3},  [r0,:128]
-        movrel          r1,  coefs
-        vld1.16         {d30},    [r1,:64]
-        vdup.16         d31, d30[0]
-        fft8            d0,  d1,  d2,  d3,  q0,  q1,  d31, d30, d20, d21, q8, q9
-        vtrn.32         d0,  d1
-        vtrn.32         d2,  d3
-        vst1.16         {d0-d3},  [r0,:128]
-        bx              lr
-endfunc
-
-function fft16_neon
-        vld1.16         {d0-d3},  [r0,:128]!
-        vld1.16         {d4-d7},  [r0,:128]
-        movrel          r1,  coefs
-        sub             r0,  r0,  #32
-        vld1.16         {d28-d31},[r1,:128]
-        vdup.16         d31, d28[0]
-        fft8            d0,  d1,  d2,  d3,  q0,  q1,  d31, d28, d20, d21, q8, q9
-        vswp            d5,  d6
-        fft4            q2,  q3,  q8,  q9
-        vswp            d5,  d6
-        vtrn.32         q0,  q1             @ z0, z4, z2, z6, z1, z5, z3, z7
-        vtrn.32         q2,  q3             @ z8, z12,z10,z14,z9, z13,z11,z15
-        vswp            d1,  d2
-        vdup.16         d31, d28[0]
-        transform01     q0,  q2,  d5,  d31, d28, d20, q8, q9
-        vdup.16         d26, d29[0]
-        vdup.16         d27, d30[0]
-        transform2      d2,  d6,  d3,  d7,  q1,  q3,  d26, d30, d27, d29, \
-                        d20, d21, q8,  q9
-        vtrn.32         q0,  q1
-        vtrn.32         q2,  q3
-        vst1.16         {d0-d3},  [r0,:128]!
-        vst1.16         {d4-d7},  [r0,:128]
-        bx              lr
-endfunc
-
-function fft_pass_neon
-        push            {r4,lr}
-        movrel          lr,  coefs + 24
-        vld1.16         {d30},    [lr,:64]
-        lsl             r12, r2,  #3
-        vmov            d31, d30
-        add             r3,  r1,  r2,  lsl #2
-        mov             lr,  #-8
-        sub             r3,  r3,  #2
-        mov             r4,  r0
-        vld1.16         {d27[]},  [r3,:16]
-        sub             r3,  r3,  #6
-        vld1.16         {q0},     [r4,:128], r12
-        vld1.16         {q1},     [r4,:128], r12
-        vld1.16         {q2},     [r4,:128], r12
-        vld1.16         {q3},     [r4,:128], r12
-        vld1.16         {d28},    [r1,:64]!
-        vld1.16         {d29},    [r3,:64], lr
-        vswp            d1,  d2
-        vswp            d5,  d6
-        vtrn.32         d0,  d1
-        vtrn.32         d4,  d5
-        vdup.16         d25, d28[1]
-        vmul.s16        d27, d27, d31
-        transform01     q0,  q2,  d5,  d25, d27, d20, q8,  q9
-        b               2f
-1:
-        mov             r4,  r0
-        vdup.16         d26, d29[0]
-        vld1.16         {q0},     [r4,:128], r12
-        vld1.16         {q1},     [r4,:128], r12
-        vld1.16         {q2},     [r4,:128], r12
-        vld1.16         {q3},     [r4,:128], r12
-        vld1.16         {d28},    [r1,:64]!
-        vld1.16         {d29},    [r3,:64], lr
-        vswp            d1,  d2
-        vswp            d5,  d6
-        vtrn.32         d0,  d1
-        vtrn.32         d4,  d5
-        vdup.16         d24, d28[0]
-        vdup.16         d25, d28[1]
-        vdup.16         d27, d29[3]
-        vmul.s16        q13, q13, q15
-        transform2      d0,  d4,  d1,  d5,  q0,  q2,  d24, d26, d25, d27, \
-                        d16, d17, q9,  q10
-2:
-        vtrn.32         d2,  d3
-        vtrn.32         d6,  d7
-        vdup.16         d24, d28[2]
-        vdup.16         d26, d29[2]
-        vdup.16         d25, d28[3]
-        vdup.16         d27, d29[1]
-        vmul.s16        q13, q13, q15
-        transform2      d2,  d6,  d3,  d7,  q1,  q3,  d24, d26, d25, d27, \
-                        d16, d17, q9,  q10
-        vtrn.32         d0,  d1
-        vtrn.32         d2,  d3
-        vtrn.32         d4,  d5
-        vtrn.32         d6,  d7
-        vswp            d1,  d2
-        vswp            d5,  d6
-        mov             r4,  r0
-        vst1.16         {q0},     [r4,:128], r12
-        vst1.16         {q1},     [r4,:128], r12
-        vst1.16         {q2},     [r4,:128], r12
-        vst1.16         {q3},     [r4,:128], r12
-        add             r0,  r0,  #16
-        subs            r2,  r2,  #2
-        bgt             1b
-        pop             {r4,pc}
-endfunc
-
-#define F_SQRT1_2   23170
-#define F_COS_16_1  30274
-#define F_COS_16_3  12540
-
-const   coefs, align=4
-        .short          F_SQRT1_2, -F_SQRT1_2, -F_SQRT1_2,  F_SQRT1_2
-        .short          F_COS_16_1,-F_COS_16_1,-F_COS_16_1, F_COS_16_1
-        .short          F_COS_16_3,-F_COS_16_3,-F_COS_16_3, F_COS_16_3
-        .short          1,         -1,         -1,          1
-endconst
-
-.macro  def_fft n, n2, n4
-function fft\n\()_neon
-        push            {r4, lr}
-        mov             r4,  r0
-        bl              fft\n2\()_neon
-        add             r0,  r4,  #\n4*2*4
-        bl              fft\n4\()_neon
-        add             r0,  r4,  #\n4*3*4
-        bl              fft\n4\()_neon
-        mov             r0,  r4
-        pop             {r4, lr}
-        movrelx         r1,  X(ff_cos_\n\()_fixed)
-        mov             r2,  #\n4/2
-        b               fft_pass_neon
-endfunc
-.endm
-
-        def_fft    32,    16,     8
-        def_fft    64,    32,    16
-        def_fft   128,    64,    32
-        def_fft   256,   128,    64
-        def_fft   512,   256,   128
-        def_fft  1024,   512,   256
-        def_fft  2048,  1024,   512
-        def_fft  4096,  2048,  1024
-        def_fft  8192,  4096,  2048
-        def_fft 16384,  8192,  4096
-        def_fft 32768, 16384,  8192
-        def_fft 65536, 32768, 16384
-
-function ff_fft_fixed_calc_neon, export=1
-        ldr             r2,  [r0]
-        sub             r2,  r2,  #2
-        movrel          r3,  fft_fixed_tab_neon
-        ldr             r3,  [r3, r2, lsl #2]
-        mov             r0,  r1
-        bx              r3
-endfunc
-
-const   fft_fixed_tab_neon, relocate=1
-        .word fft4_neon
-        .word fft8_neon
-        .word fft16_neon
-        .word fft32_neon
-        .word fft64_neon
-        .word fft128_neon
-        .word fft256_neon
-        .word fft512_neon
-        .word fft1024_neon
-        .word fft2048_neon
-        .word fft4096_neon
-        .word fft8192_neon
-        .word fft16384_neon
-        .word fft32768_neon
-        .word fft65536_neon
-endconst
diff --git a/libavcodec/arm/mdct_fixed_neon.S b/libavcodec/arm/mdct_fixed_neon.S
deleted file mode 100644
index 365c5e7faf..0000000000
--- a/libavcodec/arm/mdct_fixed_neon.S
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * Copyright (c) 2011 Mans Rullgard <m...@mansr.com>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/arm/asm.S"
-
-.macro  prerot          dst, rt
-        lsr             r3,  r6,  #2            @ n4
-        add             \rt, r4,  r6,  lsr #1   @ revtab + n4
-        add             r9,  r3,  r3,  lsl #1   @ n3
-        add             r8,  r7,  r6            @ tcos + n4
-        add             r3,  r2,  r6,  lsr #1   @ in + n4
-        add             r9,  r2,  r9,  lsl #1   @ in + n3
-        sub             r8,  r8,  #16
-        sub             r10, r3,  #16
-        sub             r11, r9,  #16
-        mov             r12, #-16
-1:
-        vld2.16         {d0,d1},  [r9, :128]!
-        vld2.16         {d2,d3},  [r11,:128], r12
-        vld2.16         {d4,d5},  [r3, :128]!
-        vld2.16         {d6,d7},  [r10,:128], r12
-        vld2.16         {d16,d17},[r7, :128]!   @ cos, sin
-        vld2.16         {d18,d19},[r8, :128], r12
-        vrev64.16       q1,  q1
-        vrev64.16       q3,  q3
-        vrev64.16       q9,  q9
-        vneg.s16        d0,  d0
-        vneg.s16        d2,  d2
-        vneg.s16        d16, d16
-        vneg.s16        d18, d18
-        vhsub.s16       d0,  d0,  d3            @ re
-        vhsub.s16       d4,  d7,  d4            @ im
-        vhsub.s16       d6,  d6,  d5
-        vhsub.s16       d2,  d2,  d1
-        vmull.s16       q10, d0,  d16
-        vmlsl.s16       q10, d4,  d17
-        vmull.s16       q11, d0,  d17
-        vmlal.s16       q11, d4,  d16
-        vmull.s16       q12, d6,  d18
-        vmlsl.s16       q12, d2,  d19
-        vmull.s16       q13, d6,  d19
-        vmlal.s16       q13, d2,  d18
-        vshrn.s32       d0,  q10, #15
-        vshrn.s32       d1,  q11, #15
-        vshrn.s32       d2,  q12, #15
-        vshrn.s32       d3,  q13, #15
-        vzip.16         d0,  d1
-        vzip.16         d2,  d3
-        ldrh            lr,  [r4], #2
-        ldrh            r2,  [\rt, #-2]!
-        add             lr,  \dst, lr,  lsl #2
-        add             r2,  \dst, r2,  lsl #2
-        vst1.32         {d0[0]},  [lr,:32]
-        vst1.32         {d2[0]},  [r2,:32]
-        ldrh            lr,  [r4], #2
-        ldrh            r2,  [\rt, #-2]!
-        add             lr,  \dst, lr,  lsl #2
-        add             r2,  \dst, r2,  lsl #2
-        vst1.32         {d0[1]},  [lr,:32]
-        vst1.32         {d2[1]},  [r2,:32]
-        ldrh            lr,  [r4], #2
-        ldrh            r2,  [\rt, #-2]!
-        add             lr,  \dst, lr,  lsl #2
-        add             r2,  \dst, r2,  lsl #2
-        vst1.32         {d1[0]},  [lr,:32]
-        vst1.32         {d3[0]},  [r2,:32]
-        ldrh            lr,  [r4], #2
-        ldrh            r2,  [\rt, #-2]!
-        add             lr,  \dst, lr,  lsl #2
-        add             r2,  \dst, r2,  lsl #2
-        vst1.32         {d1[1]},  [lr,:32]
-        vst1.32         {d3[1]},  [r2,:32]
-        subs            r6,  r6,  #32
-        bgt             1b
-.endm
-
-function ff_mdct_fixed_calc_neon, export=1
-        push            {r1,r4-r11,lr}
-
-        ldr             r4,  [r0, #8]           @ revtab
-        ldr             r6,  [r0, #16]          @ mdct_size; n
-        ldr             r7,  [r0, #24]          @ tcos
-
-        prerot          r1,  r5
-
-        mov             r4,  r0
-        bl              X(ff_fft_fixed_calc_neon)
-
-        pop             {r5}
-        mov             r12, #-16
-        ldr             r6,  [r4, #16]          @ mdct_size; n
-        ldr             r7,  [r4, #24]          @ tcos
-        add             r5,  r5,  r6,  lsr #1
-        add             r7,  r7,  r6,  lsr #1
-        sub             r1,  r5,  #16
-        sub             r2,  r7,  #16
-1:
-        vld2.16         {d4,d5},  [r7,:128]!
-        vld2.16         {d6,d7},  [r2,:128], r12
-        vld2.16         {d0,d1},  [r5,:128]
-        vld2.16         {d2,d3},  [r1,:128]
-        vrev64.16       q3,  q3
-        vrev64.16       q1,  q1
-        vneg.s16        q3,  q3
-        vneg.s16        q2,  q2
-        vmull.s16       q11, d2,  d6
-        vmlal.s16       q11, d3,  d7
-        vmull.s16       q8,  d0,  d5
-        vmlsl.s16       q8,  d1,  d4
-        vmull.s16       q9,  d0,  d4
-        vmlal.s16       q9,  d1,  d5
-        vmull.s16       q10, d2,  d7
-        vmlsl.s16       q10, d3,  d6
-        vshrn.s32       d0,  q11, #15
-        vshrn.s32       d1,  q8,  #15
-        vshrn.s32       d2,  q9,  #15
-        vshrn.s32       d3,  q10, #15
-        vrev64.16       q0,  q0
-        vst2.16         {d2,d3},  [r5,:128]!
-        vst2.16         {d0,d1},  [r1,:128], r12
-        subs            r6,  r6,  #32
-        bgt             1b
-
-        pop             {r4-r11,pc}
-endfunc
-
-function ff_mdct_fixed_calcw_neon, export=1
-        push            {r1,r4-r11,lr}
-
-        ldrd            r4,  r5,  [r0, #8]      @ revtab, tmp_buf
-        ldr             r6,  [r0, #16]          @ mdct_size; n
-        ldr             r7,  [r0, #24]          @ tcos
-
-        prerot          r5,  r1
-
-        mov             r4,  r0
-        mov             r1,  r5
-        bl              X(ff_fft_fixed_calc_neon)
-
-        pop             {r7}
-        mov             r12, #-16
-        ldr             r6,  [r4, #16]          @ mdct_size; n
-        ldr             r9,  [r4, #24]          @ tcos
-        add             r5,  r5,  r6,  lsr #1
-        add             r7,  r7,  r6
-        add             r9,  r9,  r6,  lsr #1
-        sub             r3,  r5,  #16
-        sub             r1,  r7,  #16
-        sub             r2,  r9,  #16
-1:
-        vld2.16         {d4,d5},  [r9,:128]!
-        vld2.16         {d6,d7},  [r2,:128], r12
-        vld2.16         {d0,d1},  [r5,:128]!
-        vld2.16         {d2,d3},  [r3,:128], r12
-        vrev64.16       q3,  q3
-        vrev64.16       q1,  q1
-        vneg.s16        q3,  q3
-        vneg.s16        q2,  q2
-        vmull.s16       q8,  d2,  d6
-        vmlal.s16       q8,  d3,  d7
-        vmull.s16       q9,  d0,  d5
-        vmlsl.s16       q9,  d1,  d4
-        vmull.s16       q10, d0,  d4
-        vmlal.s16       q10, d1,  d5
-        vmull.s16       q11, d2,  d7
-        vmlsl.s16       q11, d3,  d6
-        vrev64.32       q8,  q8
-        vrev64.32       q9,  q9
-        vst2.32         {q10,q11},[r7,:128]!
-        vst2.32         {d16,d18},[r1,:128], r12
-        vst2.32         {d17,d19},[r1,:128], r12
-        subs            r6,  r6,  #32
-        bgt             1b
-
-        pop             {r4-r11,pc}
-endfunc
diff --git a/libavcodec/fft-internal.h b/libavcodec/fft-internal.h
index 0a8f7d05cf..3bd5a1123d 100644
--- a/libavcodec/fft-internal.h
+++ b/libavcodec/fft-internal.h
@@ -34,7 +34,7 @@
         (dim) = (are) * (bim) + (aim) * (bre);  \
     } while (0)
 
-#else
+#else /* FFT_FLOAT */
 
 #define SCALE_FLOAT(a, bits) lrint((a) * (double)(1 << (bits)))
 
@@ -52,33 +52,6 @@
 
 #define FIX15(a) av_clip(SCALE_FLOAT(a, 31), -2147483647, 2147483647)
 
-#else /* FFT_FIXED_32 */
-
-#include "fft.h"
-#include "mathops.h"
-
-void ff_mdct_calcw_c(FFTContext *s, FFTDouble *output, const FFTSample *input);
-
-#define FIX15(a) av_clip(SCALE_FLOAT(a, 15), -32767, 32767)
-
-#define sqrthalf ((int16_t)((1<<15)*M_SQRT1_2))
-
-#define BF(x, y, a, b) do {                     \
-        x = (a - b) >> 1;                       \
-        y = (a + b) >> 1;                       \
-    } while (0)
-
-#define CMULS(dre, dim, are, aim, bre, bim, sh) do {            \
-        (dre) = (MUL16(are, bre) - MUL16(aim, bim)) >> sh;      \
-        (dim) = (MUL16(are, bim) + MUL16(aim, bre)) >> sh;      \
-    } while (0)
-
-#define CMUL(dre, dim, are, aim, bre, bim)      \
-    CMULS(dre, dim, are, aim, bre, bim, 15)
-
-#define CMULL(dre, dim, are, aim, bre, bim)     \
-    CMULS(dre, dim, are, aim, bre, bim, 0)
-
 #endif /* FFT_FIXED_32 */
 
 #endif /* FFT_FLOAT */
diff --git a/libavcodec/fft.h b/libavcodec/fft.h
index 5f67b61f06..5ca2d18432 100644
--- a/libavcodec/fft.h
+++ b/libavcodec/fft.h
@@ -52,12 +52,6 @@ typedef float FFTDouble;
 
 typedef int32_t FFTSample;
 
-#else /* FFT_FIXED_32 */
-
-#define FFT_NAME(x) x ## _fixed
-
-typedef int16_t FFTSample;
-
 #endif /* FFT_FIXED_32 */
 
 typedef struct FFTComplex {
@@ -108,7 +102,6 @@ struct FFTContext {
     void (*imdct_calc)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
     void (*imdct_half)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
     void (*mdct_calc)(struct FFTContext *s, FFTSample *output, const FFTSample *input);
-    void (*mdct_calcw)(struct FFTContext *s, FFTDouble *output, const FFTSample *input);
     enum fft_permutation_type fft_permutation;
     enum mdct_permutation_type mdct_permutation;
     uint32_t *revtab32;
@@ -163,8 +156,6 @@ void ff_fft_init_arm(FFTContext *s);
 void ff_fft_init_mips(FFTContext *s);
 void ff_fft_init_ppc(FFTContext *s);
 
-void ff_fft_fixed_init_arm(FFTContext *s);
-
 void ff_fft_end(FFTContext *s);
 
 #define ff_mdct_init FFT_NAME(ff_mdct_init)
diff --git a/libavcodec/fft_fixed.c b/libavcodec/fft_fixed.c
deleted file mode 100644
index 3d3bd2fca6..0000000000
--- a/libavcodec/fft_fixed.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define FFT_FLOAT 0
-#define FFT_FIXED_32 0
-#include "fft_template.c"
diff --git a/libavcodec/fft_template.c b/libavcodec/fft_template.c
index e807f4b255..2d05990ca9 100644
--- a/libavcodec/fft_template.c
+++ b/libavcodec/fft_template.c
@@ -236,11 +236,7 @@ av_cold int ff_fft_init(FFTContext *s, int nbits, int inverse)
     if (ARCH_ARM)     ff_fft_init_arm(s);
     if (ARCH_PPC)     ff_fft_init_ppc(s);
     if (ARCH_X86)     ff_fft_init_x86(s);
-    if (CONFIG_MDCT)  s->mdct_calcw = s->mdct_calc;
     if (HAVE_MIPSFPU) ff_fft_init_mips(s);
-#else
-    if (CONFIG_MDCT)  s->mdct_calcw = ff_mdct_calcw_c;
-    if (ARCH_ARM)     ff_fft_fixed_init_arm(s);
 #endif
     for(j=4; j<=nbits; j++) {
         ff_init_ff_cos_tabs(j);
diff --git a/libavcodec/mdct_fixed.c b/libavcodec/mdct_fixed.c
deleted file mode 100644
index aabf0c88f8..0000000000
--- a/libavcodec/mdct_fixed.c
+++ /dev/null
@@ -1,65 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define FFT_FLOAT 0
-#define FFT_FIXED_32 0
-#include "mdct_template.c"
-
-/* same as ff_mdct_calcw_c with double-width unscaled output */
-void ff_mdct_calcw_c(FFTContext *s, FFTDouble *out, const FFTSample *input)
-{
-    int i, j, n, n8, n4, n2, n3;
-    FFTDouble re, im;
-    const uint16_t *revtab = s->revtab;
-    const FFTSample *tcos = s->tcos;
-    const FFTSample *tsin = s->tsin;
-    FFTComplex *x = s->tmp_buf;
-    FFTDComplex *o = (FFTDComplex *)out;
-
-    n = 1 << s->mdct_bits;
-    n2 = n >> 1;
-    n4 = n >> 2;
-    n8 = n >> 3;
-    n3 = 3 * n4;
-
-    /* pre rotation */
-    for(i=0;i<n8;i++) {
-        re = RSCALE(-input[2*i+n3], - input[n3-1-2*i]);
-        im = RSCALE(-input[n4+2*i], + input[n4-1-2*i]);
-        j = revtab[i];
-        CMUL(x[j].re, x[j].im, re, im, -tcos[i], tsin[i]);
-
-        re = RSCALE( input[2*i]   , - input[n2-1-2*i]);
-        im = RSCALE(-input[n2+2*i], - input[ n-1-2*i]);
-        j = revtab[n8 + i];
-        CMUL(x[j].re, x[j].im, re, im, -tcos[n8 + i], tsin[n8 + i]);
-    }
-
-    s->fft_calc(s, x);
-
-    /* post rotation */
-    for(i=0;i<n8;i++) {
-        FFTDouble r0, i0, r1, i1;
-        CMULL(i1, r0, x[n8-i-1].re, x[n8-i-1].im, -tsin[n8-i-1], -tcos[n8-i-1]);
-        CMULL(i0, r1, x[n8+i  ].re, x[n8+i  ].im, -tsin[n8+i  ], -tcos[n8+i  ]);
-        o[n8-i-1].re = r0;
-        o[n8-i-1].im = i0;
-        o[n8+i  ].re = r1;
-        o[n8+i  ].im = i1;
-    }
-}
diff --git a/libavcodec/tests/.gitignore b/libavcodec/tests/.gitignore
index a01a700e2d..dcefc5914c 100644
--- a/libavcodec/tests/.gitignore
+++ b/libavcodec/tests/.gitignore
@@ -5,7 +5,6 @@
 /codec_desc
 /dct
 /fft
-/fft-fixed
 /fft-fixed32
 /golomb
 /h264_levels
diff --git a/libavcodec/tests/fft-fixed.c b/libavcodec/tests/fft-fixed.c
deleted file mode 100644
index 3c50bf1dc1..0000000000
--- a/libavcodec/tests/fft-fixed.c
+++ /dev/null
@@ -1,21 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#define FFT_FLOAT 0
-#define AVFFT 0
-#include "fft.c"
diff --git a/tests/fate/fft.mak b/tests/fate/fft.mak
index 5da6e687ec..76701dcce6 100644
--- a/tests/fate/fft.mak
+++ b/tests/fate/fft.mak
@@ -26,27 +26,7 @@ FATE_FFT_ALL = $(FATE_DCT-yes) $(FATE_FFT-yes) $(FATE_MDCT-yes) $(FATE_RDFT-yes)
 $(FATE_FFT_ALL): libavcodec/tests/fft$(EXESUF)
 $(FATE_FFT_ALL): CMD = run libavcodec/tests/fft$(EXESUF) $(CPUFLAGS:%=-c%) $(ARGS)
 
-define DEF_FFT_FIXED
-FATE_FFT_FIXED-$(CONFIG_FFT)   += fate-fft-fixed-$(1)  fate-ifft-fixed-$(1)
-FATE_MDCT_FIXED-$(CONFIG_MDCT) += fate-mdct-fixed-$(1) fate-imdct-fixed-$(1)
-
-fate-fft-fixed-$(1):   ARGS = -n$(1)
-fate-ifft-fixed-$(1):  ARGS = -n$(1) -i
-fate-mdct-fixed-$(1):  ARGS = -n$(1) -m
-fate-imdct-fixed-$(1): ARGS = -n$(1) -m -i
-endef
-
-$(foreach N, 4 5 6 7 8 9 10 11 12, $(eval $(call DEF_FFT_FIXED,$(N))))
-
-fate-fft-fixed: $(FATE_FFT_FIXED-yes)
-fate-mdct-fixed: $(FATE_MDCT_FIXED-yes)
-
-FATE_FFT_FIXED_ALL = $(FATE_FFT_FIXED-yes) $(FATE_MDCT_FIXED-yes)
-
-$(FATE_FFT_FIXED_ALL): libavcodec/tests/fft-fixed$(EXESUF)
-$(FATE_FFT_FIXED_ALL): CMD = run libavcodec/tests/fft-fixed$(EXESUF) $(CPUFLAGS:%=-c%) $(ARGS)
-
-$(FATE_FFT_ALL) $(FATE_FFT_FIXED_ALL): CMP = null
+$(FATE_FFT_ALL): CMP = null
 
 define DEF_FFT_FIXED32
 FATE_FFT_FIXED32 += fate-fft-fixed32-$(1)   fate-ifft-fixed32-$(1)  \
@@ -95,9 +75,9 @@ $(FATE_AV_FFT_ALL): CMD = run libavcodec/tests/avfft$(EXESUF) $(CPUFLAGS:%=-c%)
 $(FATE_AV_FFT_ALL): CMP = null
 
 fate-dct: fate-dct-float
-fate-fft: fate-fft-float fate-fft-fixed fate-fft-fixed32
-fate-mdct: fate-mdct-float fate-mdct-fixed
+fate-fft: fate-fft-float fate-fft-fixed32
+fate-mdct: fate-mdct-float
 fate-rdft: fate-rdft-float
 
-FATE-$(call ALLYES, AVCODEC FFT MDCT) += $(FATE_FFT_ALL) $(FATE_FFT_FIXED_ALL) $(FATE_FFT_FIXED32) $(FATE_AV_FFT_ALL)
-fate-fft-all: $(FATE_FFT_ALL) $(FATE_FFT_FIXED_ALL) $(FATE_FFT_FIXED32) $(FATE_AV_FFT_ALL)
+FATE-$(call ALLYES, AVCODEC FFT MDCT) += $(FATE_FFT_ALL) $(FATE_FFT_FIXED32) $(FATE_AV_FFT_ALL)
+fate-fft-all: $(FATE_FFT_ALL) $(FATE_FFT_FIXED32) $(FATE_AV_FFT_ALL)
-- 
2.30.0.rc2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
