On 5/6/19, Lynne <d...@lynne.ee> wrote:
> May 5, 2019, 1:52 PM by d...@lynne.ee:
>
>> May 4, 2019, 10:00 PM by d...@lynne.ee:
>>
>>> May 4, 2019, 8:10 PM by mich...@niedermayer.cc:
>>>
>>>> On Fri, May 03, 2019 at 09:08:57PM +0200, Lynne wrote:
>>>>
>>>>> This commit adds a new API to libavutil to allow for arbitrary
>>>>> transformations on various types of data.
>>>>>
>>>> breaks build on mips
>>>>
>>>> CC libavutil/fft.o
>>>> src/libavutil/fft.c:47: error: redefinition of typedef ‘AVFFTContext’
>>>> src/libavutil/fft.h:25: note: previous declaration of ‘AVFFTContext’ was here
>>>> make: *** [libavutil/fft.o] Error 1
>>>>
>>>> [...]
>>>>
>>>
>>> Fixed, v2 attached. Changes:
>>> -Stride really is in bytes now.
>>> -Corrected some comments (stride is supported by all (i)mdcts, not just
>>>  compound ones; some clarifications regarding the scale).
>>>
>>> Also, that 28-point FFT comparison was a typo; it's 128.
>>>
>>
>> Managed to further optimize the 15-point transform by rewriting it as an
>> exptab-less compound 3x5 transform and embedding its input map into the
>> parent transform's map.
>> Updated comparisons to libfftw3f:
>> 120:
>>   22353 decicycles in     fftwf_execute,     1024 runs,      0 skips
>>   21836 decicycles in compound_fft_15x8,     1024 runs,      0 skips
>>
>> 480:
>>   103998 decicycles in       fftwf_execute,    1024 runs,      0 skips
>>   102747 decicycles in compound_fft_15x32,    1024 runs,      0 skips
>> 960:
>>   186210 decicycles in      fftwf_execute,    1024 runs,      0 skips
>>   215256 decicycles in compound_fft_15x64,    1024 runs,      0 skips
>>
>
> Attached a v4 of the patch which adjusts transform direction by reordering
> the coefficients like the power of two transforms do. This allowed for the
> exptabs to be computed just once on startup and stored in a global array.
> I didn't even consider that it was possible to do so for odd-sized
> transforms, and especially for compound 5x3 transforms, but after some
> experimentation I found the key was to perform that permutation before the
> second permutation, the one that embeds the 5x3's input map.
>
> I don't think there are any more feasible ways to improve the code, short of
> having 15 different versions for all power of two transforms by hardcoding
> the output reindexing, so I'd like to get some feedback on the API.
> The old SIMD from lavc is unusable, especially the power of two part,
> so it would be nice to get started on rewriting that soon.
>

API looks fine to me.
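
For anyone following the thread, here is a minimal sketch of what "embedding
the 3x5 input map into the parent transform's map" could look like. It is
not taken from the patch: the function names (pfa_input_map, embed_map,
is_permutation) are hypothetical, and it assumes the usual Good-Thomas
(prime-factor) input ordering, which needs no twiddle table between the
3-point and 5-point stages because 3 and 5 are coprime.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers for illustration only, not the patch's API. */

static int gcd(int a, int b)
{
    while (b) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Fill in_map[0 .. n1*n2-1] with the Good-Thomas input ordering:
 * slot (i, j) of the n1 x n2 grid reads input sample (i*n2 + j*n1) mod n.
 * n1 and n2 must be coprime for this to be a permutation. */
static void pfa_input_map(int *in_map, int n1, int n2)
{
    const int n = n1 * n2;
    assert(gcd(n1, n2) == 1);
    for (int i = 0; i < n1; i++)
        for (int j = 0; j < n2; j++)
            in_map[i * n2 + j] = (i * n2 + j * n1) % n;
}

/* Compose the child map into each block of the parent map, so that the
 * child transform can read its inputs straight through parent_map without
 * a separate reordering pass. child_len is limited to 64 in this sketch. */
static void embed_map(int *parent_map, const int *child_map,
                      int child_len, int blocks)
{
    int tmp[64];
    assert(child_len <= 64);
    for (int b = 0; b < blocks; b++) {
        int *blk = parent_map + b * child_len;
        memcpy(tmp, blk, child_len * sizeof(*tmp));
        for (int i = 0; i < child_len; i++)
            blk[i] = tmp[child_map[i]];
    }
}

/* Returns 1 if map[0 .. n-1] is a permutation of 0 .. n-1 (n <= 64). */
static int is_permutation(const int *map, int n)
{
    unsigned char seen[64];
    if (n > 64)
        return 0;
    memset(seen, 0, sizeof(seen));
    for (int i = 0; i < n; i++) {
        if (map[i] < 0 || map[i] >= n || seen[map[i]])
            return 0;
        seen[map[i]] = 1;
    }
    return 1;
}
```

Calling pfa_input_map(map, 3, 5) and then embed_map() on a parent map made
of 15-entry blocks folds the two lookups into one table, which is the
general idea the quoted message describes; the actual layout used by the
patch may differ.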
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
