Re: [FFmpeg-devel] [PATCH] swresample/resample: speed up build_filter

Ganesh Ajjanagadde Wed, 04 Nov 2015 07:17:32 -0800

On Wed, Nov 4, 2015 at 9:44 AM, Michael Niedermayer
<[email protected]> wrote:
> On Wed, Nov 04, 2015 at 01:11:45AM -0500, Ganesh Ajjanagadde wrote:
>> This speeds up build_filter by ~ 50%. This gain should be pretty
>> consistent across all architectures and platforms.
>>
>> Essentially, this relies on a observation that the filters have some
>> even/odd symmetry that may be exploited during the construction of the
>> polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] 
>> are
>> easily derived from [0, 0.5] and expensive reevaluation of function
>> points are unnecessary. This requires some rather annoying even/odd
>> bookkeeping as can be seen from the patch.
>>
>> I vaguely recall from signal processing theory more general symmetries 
>> allowing even greater
>> optimization of the construction. At a high level, "even functions"
>> correspond to 2, and one can imagine variations. Nevertheless, for the sake
>> of some generality and because of existing filters, this is all that is
>> being exploited.
>>
>> Currently, this patch relies on phase_count being even or (trivially) 1,
>> though this is not an inherent limitation to the approach. This
>> assumption is safe as phase_count is 1 << phase_bits, and is hence a
>> power of two. There is no way for user API to set it to a nontrivial odd
>> number. This assumption has been placed as an assert in the code.
>>
>> To repeat, this assumes even symmetry of the filters, which is the most 
>> common
>> way to get generalized linear phase anyway and is true of all currently
>> supported filters.
>>
>> As a side note, accuracy should be identical or perhaps slightly better
>> due to this "forcing" filter symmetries leading to a better phase
>> characteristic. As before, I can't test this claim easily, though it may
>> be of interest.
>>
>> Patch tested with FATE.
>>
>> Sample benchmark (x86-64, Haswell, GNU/Linux):
>>
>> test: swr-resample-dblp-44100-2626
>>
>> new:
>> 527376779 decicycles in build_filter(loop 1000),     256 runs,      0 skips
>> 524361765 decicycles in build_filter(loop 1000),     512 runs,      0 skips
>> 516552574 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>>
>> old:
>> 974178658 decicycles in build_filter(loop 1000),     256 runs,      0 skips
>> 972794408 decicycles in build_filter(loop 1000),     512 runs,      0 skips
>> 954350046 decicycles in build_filter(loop 1000),    1024 runs,      0 skips
>>
>> Note that lower level optimizations are entirely possible, I focussed on
>> getting the high level semantics correct. In any case, this should
>> provide a good foundation.
>>
>> Signed-off-by: Ganesh Ajjanagadde <[email protected]>
>> ---
>>  libswresample/resample.c | 56 
>> +++++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 48 insertions(+), 8 deletions(-)
>>
>> diff --git a/libswresample/resample.c b/libswresample/resample.c
>> index 235c9a9..d5fc5e7 100644
>> --- a/libswresample/resample.c
>> +++ b/libswresample/resample.c
>> @@ -144,7 +144,7 @@ static int build_filter(ResampleContext *c, void 
>> *filter, double factor, int tap
>>                          int filter_type, int kaiser_beta){
>>      int ph, i;
>>      double x, y, w;
>> -    double *tab = av_malloc_array(tap_count,  sizeof(*tab));
>> +    double *tab = av_malloc_array(tap_count+1,  sizeof(*tab));
>>      const int center= (tap_count-1)/2;
>>
>>      if (!tab)
>> @@ -154,9 +154,10 @@ static int build_filter(ResampleContext *c, void 
>> *filter, double factor, int tap
>>      if (factor > 1.0)
>>          factor = 1.0;
>>
>> -    for(ph=0;ph<phase_count;ph++) {
>> +    av_assert0(phase_count == 1 || phase_count % 2 == 0);
>> +    for(ph = 0; ph <= phase_count / 2; ph++) {
>>          double norm = 0;
>> -        for(i=0;i<tap_count;i++) {
>> +        for(i=0;i<=tap_count;i++) {
>>              x = M_PI * ((double)(i - center) - (double)ph / phase_count) * 
>> factor;
>>              if (x == 0) y = 1.0;
>>              else        y = sin(x) / x;
>> @@ -180,26 +181,65 @@ static int build_filter(ResampleContext *c, void 
>> *filter, double factor, int tap
>>              }
>>
>>              tab[i] = y;
>> -            norm += y;
>> +            if (i < tap_count)
>> +                norm += y;
>>          }
>>
>>          /* normalize so that an uniform color remains the same */
>>          switch(c->format){
>>          case AV_SAMPLE_FMT_S16P:
>> -            for(i=0;i<tap_count;i++)
>> +            for(i=0;i<tap_count;i++) {
>>                  ((int16_t*)filter)[ph * alloc + i] = av_clip(lrintf(tab[i] 
>> * scale / norm), INT16_MIN, INT16_MAX);
>> +            }
>> +            if (tap_count % 2 == 0) {
>> +                for (i = 0; i < tap_count; i++)
>> +                    ((int16_t*)filter)[(phase_count-ph) * alloc + 
>> tap_count-1-i] = ((int16_t*)filter)[ph * alloc + i];
>> +            }
>> +            else {
>> +                for (i = 1; i <= tap_count; i++)
>> +                    ((int16_t*)filter)[(phase_count-ph) * alloc + 
>> tap_count-i] =
>> +                        av_clip(lrintf(tab[i] * scale / (norm - tab[0] + 
>> tab[tap_count])), INT16_MIN, INT16_MAX);
>> +            }
>>              break;
>>          case AV_SAMPLE_FMT_S32P:
>
>> -            for(i=0;i<tap_count;i++)
>> +            for(i=0;i<tap_count;i++) {
>>                  ((int32_t*)filter)[ph * alloc + i] = 
>> av_clipl_int32(llrint(tab[i] * scale / norm));
>> +            }
>
> this and similar changes look uneeded
>
> otherwise LGTM


Artifacts left over from testing, sorry. Will push later to make sure
there are no further comments on this. I am particularly interested in
comments from API users: does this speed matter to them? Clement
seemed to care about it, but it seems to me quite rare to call this
multiple times on a stream.

>
> [...]
>
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Freedom in capitalist society always remains about the same as it was in
> ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
>
> _______________________________________________
> ffmpeg-devel mailing list
> [email protected]
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] swresample/resample: speed up build_filter

Reply via email to