On 5/6/19, Lynne <d...@lynne.ee> wrote:
> May 5, 2019, 1:52 PM by d...@lynne.ee:
>
>> May 4, 2019, 10:00 PM by d...@lynne.ee:
>>
>>> May 4, 2019, 8:10 PM by mich...@niedermayer.cc:
>>>
>>>> On Fri, May 03, 2019 at 09:08:57PM +0200, Lynne wrote:
>>>>
>>>>> This commit adds a new API to libavutil to allow for arbitrary
>>>>> transformations on various types of data.
>>>>>
>>>> breaks build on mips
>>>>
>>>> CC libavutil/fft.o
>>>> src/libavutil/fft.c:47: error: redefinition of typedef ‘AVFFTContext’
>>>> src/libavutil/fft.h:25: note: previous declaration of ‘AVFFTContext’ was here
>>>> make: *** [libavutil/fft.o] Error 1
>>>>
>>>> [...]
>>>>
>>>
>>> Fixed, v2 attached. Changes:
>>> -Stride really is in bytes now.
>>> -Corrected some comments (stride supported by all (i)mdcts, not just
>>> compound ones, some clarifications regarding the scale).
>>>
>>> Also that 28-point FFT comparison was a typo, its 128.
>>>
>>
>> Managed to further optimize the 15-point transform by rewriting it as an
>> exptab-less compound 3x5 transform and embedding its input map into the
>> parent transform's map.
>> Updated comparisons to libfftw3f:
>> 120:
>> 22353 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 21836 decicycles in compound_fft_15x8, 1024 runs, 0 skips
>>
>> 480:
>> 103998 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 102747 decicycles in compound_fft_15x32, 1024 runs, 0 skips
>> 960:
>> 186210 decicycles in fftwf_execute, 1024 runs, 0 skips
>> 215256 decicycles in compound_fft_15x64, 1024 runs, 0 skips
>>
>
> Attached a v4 of the patch which adjusts transform direction by reordering
> the coefficients like the power of two transforms do. This allowed for the
> exptabs to be computed just once on startup and stored in a global array.
> Didn't even consider it was possible to do so for odd-sized transforms and
> especially for compound 5x3 transforms but after some experimentation I
> found the key was to perform the permutation before the second permutation
> to embed the 5x3's input map in.
>
> I don't think there are any more feasible ways to improve the code, short
> of having 15 different versions for all power of two transforms by
> hardcoding the output reindexing, so I'd like to get some feedback on the
> API.
> The old SIMD from lavc is unusable, especially the power of two part,
> so it would be nice to get started on rewriting that soon.
>
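
For anyone following the thread, here is a rough sketch of the two ideas
quoted above: a twiddle-free ("exptab-less") 15-point transform built from
3-point and 5-point DFTs, and a direction change done purely by reordering.
It is not taken from the patch. It assumes a Good-Thomas prime-factor
mapping, and the names dft, pfa15 and pfa15_inverse are made up for
illustration. The patch may well organise its maps differently.

```python
import cmath


def dft(x):
    """Naive forward DFT, used as the small 3/5-point kernel and as a reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def pfa15(x):
    """Forward 15-point DFT as a compound 3x5 transform (assumed Good-Thomas map).

    Because 3 and 5 are coprime, no twiddle factors are needed between the
    two stages; all the work is moved into the input and output index maps.
    """
    # Input map: element (n1, n2) of the 3x5 grid is x[(5*n1 + 3*n2) mod 15].
    grid = [[x[(5 * n1 + 3 * n2) % 15] for n2 in range(5)] for n1 in range(3)]

    # Stage 1: a 5-point DFT along each of the 3 rows.
    rows = [dft(row) for row in grid]

    # Stage 2: a 3-point DFT along each of the 5 columns.
    cols = [dft([rows[n1][k2] for n1 in range(3)]) for k2 in range(5)]

    # Output map (CRT): k = (10*k1 + 6*k2) mod 15, so k mod 3 == k1 and
    # k mod 5 == k2.
    out = [0j] * 15
    for k1 in range(3):
        for k2 in range(5):
            out[(10 * k1 + 6 * k2) % 15] = cols[k2][k1]
    return out


def pfa15_inverse(x):
    """Unnormalized inverse: same forward kernel, input indices negated mod 15."""
    return pfa15([x[(-i) % 15] for i in range(15)])
```

In a real implementation the reordering in pfa15_inverse would presumably be
folded into the precomputed input map rather than applied as a separate
pass, which is how I read the "embed the 5x3's input map" remark. Dividing
the result of pfa15_inverse(pfa15(x)) by 15 gives back x.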
API looks fine to me.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".