Feb 21, 2021, 00:43 by d...@lynne.ee:

> Feb 10, 2021, 21:31 by d...@lynne.ee:
>
>> Feb 10, 2021, 18:15 by d...@lynne.ee:
>>
>>> This commit adds support for in-place FFT transforms. Since our
>>> internal transforms were all in-place anyway, this only changes
>>> the permutation on the input.
>>>
>>> Unfortunately, research papers were of no help here. All focused
>>> on dry hardware implementations, where permutes are free, or on
>>> software implementations where binary bloat is of no concern so
>>> storing dozen times the transforms for each permutation and version
>>> is not considered bad practice.
>>> Still, for a pure C implementation, it's only around 28% slower
>>> than the multi-megabyte FFTW3 in unaligned mode.
>>>
>>> Unlike a closed permutation like with PFA, split-radix FFT bit-reversals
>>> contain multiple NOPs, multiple simple swaps, and a few chained swaps,
>>> so regular single-loop single-state permute loops were not possible.
>>> Instead, we filter out parts of the input indices which are redundant.
>>> This allows for a single branch, and with some clever AVX512 asm,
>>> could possibly be SIMD'd without refactoring.
>>>
>>> The inplace_idx array is guaranteed to never be larger than the
>>> revtab array, and in practice only requires around log2(len) entries.
>>>
>>> The power-of-two MDCTs can be done in-place as well. And it's
>>> possible to eliminate a copy in the compound MDCTs too, however
>>> it'll be slower than doing them out of place, and we'd need to dirty
>>> the input array.
>>>
>>> Patch attached.
>>>
>>
>> Locally added APIchanges and lavu minor bump.
>> And got rid of the unused set temporary variables when permuting.
>>
>
> Will push this tomorrow if there are no objections.
>
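For anyone reading the quoted description without the patch at hand, here is
a minimal sketch of the general idea of permuting in place by entering each
cycle of the permutation exactly once. It is not the code from the patch. The
names find_cycle_leaders and permute_inplace are hypothetical, `map` only
loosely plays the role of revtab, and `leaders` the role of inplace_idx. The
sketch assumes `map` is a valid permutation and that the output convention is
out[i] = in[map[i]].

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: record one entry index per non-trivial cycle of the
 * permutation `map`. Fixed points (the "NOPs") are filtered out, and each
 * swap or chained swap contributes exactly one entry.
 * `leaders` must have room for len / 2 entries.
 * Returns the number of entries written. */
static size_t find_cycle_leaders(const int *map, size_t len, int *leaders)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        assert(map[i] >= 0 && (size_t)map[i] < len);

        /* Follow the cycle until it reaches i again or an index below i. */
        size_t j = (size_t)map[i];
        while (j > i)
            j = (size_t)map[j];

        /* j == i means i is the smallest index in its cycle.
         * j <  i means the cycle was already entered from a lower index. */
        if (j == i && (size_t)map[i] != i)
            leaders[n++] = (int)i;
    }

    return n;
}

/* Hypothetical helper: apply data[i] = old_data[map[i]] in place by walking
 * each recorded cycle once, holding a single element in a temporary. */
static void permute_inplace(int *data, const int *map,
                            const int *leaders, size_t n_leaders)
{
    for (size_t k = 0; k < n_leaders; k++) {
        const int start = leaders[k];
        const int saved = data[start];
        int cur = start;

        for (;;) {
            const int next = map[cur];
            if (next == start) {
                data[cur] = saved; /* close the cycle */
                break;
            }
            data[cur] = data[next];
            cur = next;
        }
    }
}
```

With a plain 8-point bit-reversal table {0, 4, 2, 6, 1, 5, 3, 7}, only the
indices 1 and 3 are recorded: the other entries are either fixed points or
the second half of a swap that has already been entered.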
Pushed.