On Wed, Nov 04, 2015 at 01:11:45AM -0500, Ganesh Ajjanagadde wrote: > This speeds up build_filter by ~ 50%. This gain should be pretty > consistent across all architectures and platforms. > > Essentially, this relies on a observation that the filters have some > even/odd symmetry that may be exploited during the construction of the > polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] > are > easily derived from [0, 0.5] and expensive reevaluation of function > points are unnecessary. This requires some rather annoying even/odd > bookkeeping as can be seen from the patch. > > I vaguely recall from signal processing theory more general symmetries > allowing even greater > optimization of the construction. At a high level, "even functions" > correspond to 2, and one can imagine variations. Nevertheless, for the sake > of some generality and because of existing filters, this is all that is > being exploited. > > Currently, this patch relies on phase_count being even or (trivially) 1, > though this is not an inherent limitation to the approach. This > assumption is safe as phase_count is 1 << phase_bits, and is hence a > power of two. There is no way for user API to set it to a nontrivial odd > number. This assumption has been placed as an assert in the code. > > To repeat, this assumes even symmetry of the filters, which is the most common > way to get generalized linear phase anyway and is true of all currently > supported filters. > > As a side note, accuracy should be identical or perhaps slightly better > due to this "forcing" filter symmetries leading to a better phase > characteristic. As before, I can't test this claim easily, though it may > be of interest. > > Patch tested with FATE. > > Sample benchmark (x86-64, Haswell, GNU/Linux): > > test: swr-resample-dblp-44100-2626 > > new: > 527376779 decicycles in build_filter(loop 1000), 256 runs, 0 skips > 524361765 decicycles in build_filter(loop 1000), 512 runs, 0 skips > 516552574 decicycles in build_filter(loop 1000), 1024 runs, 0 skips > > old: > 974178658 decicycles in build_filter(loop 1000), 256 runs, 0 skips > 972794408 decicycles in build_filter(loop 1000), 512 runs, 0 skips > 954350046 decicycles in build_filter(loop 1000), 1024 runs, 0 skips > > Note that lower level optimizations are entirely possible, I focussed on > getting the high level semantics correct. In any case, this should > provide a good foundation. > > Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> > --- > libswresample/resample.c | 56 > +++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 48 insertions(+), 8 deletions(-) > > diff --git a/libswresample/resample.c b/libswresample/resample.c > index 235c9a9..d5fc5e7 100644 > --- a/libswresample/resample.c > +++ b/libswresample/resample.c > @@ -144,7 +144,7 @@ static int build_filter(ResampleContext *c, void *filter, > double factor, int tap > int filter_type, int kaiser_beta){ > int ph, i; > double x, y, w; > - double *tab = av_malloc_array(tap_count, sizeof(*tab)); > + double *tab = av_malloc_array(tap_count+1, sizeof(*tab)); > const int center= (tap_count-1)/2; > > if (!tab) > @@ -154,9 +154,10 @@ static int build_filter(ResampleContext *c, void > *filter, double factor, int tap > if (factor > 1.0) > factor = 1.0; > > - for(ph=0;ph<phase_count;ph++) { > + av_assert0(phase_count == 1 || phase_count % 2 == 0); > + for(ph = 0; ph <= phase_count / 2; ph++) { > double norm = 0; > - for(i=0;i<tap_count;i++) { > + for(i=0;i<=tap_count;i++) { > x = M_PI * ((double)(i - center) - (double)ph / phase_count) * > factor; > if (x == 0) y = 1.0; > else y = sin(x) / x; > @@ -180,26 +181,65 @@ static int build_filter(ResampleContext *c, void > *filter, double factor, int tap > } > > tab[i] = y; > - norm += y; > + if (i < tap_count) > + norm += y; > } > > /* normalize so that an uniform color remains the same */ > switch(c->format){ > case AV_SAMPLE_FMT_S16P: > - for(i=0;i<tap_count;i++) > + for(i=0;i<tap_count;i++) { > ((int16_t*)filter)[ph * alloc + i] = av_clip(lrintf(tab[i] * > scale / norm), INT16_MIN, INT16_MAX); > + } > + if (tap_count % 2 == 0) { > + for (i = 0; i < tap_count; i++) > + ((int16_t*)filter)[(phase_count-ph) * alloc + > tap_count-1-i] = ((int16_t*)filter)[ph * alloc + i]; > + } > + else { > + for (i = 1; i <= tap_count; i++) > + ((int16_t*)filter)[(phase_count-ph) * alloc + > tap_count-i] = > + av_clip(lrintf(tab[i] * scale / (norm - tab[0] + > tab[tap_count])), INT16_MIN, INT16_MAX); > + } > break; > case AV_SAMPLE_FMT_S32P:
> - for(i=0;i<tap_count;i++) > + for(i=0;i<tap_count;i++) { > ((int32_t*)filter)[ph * alloc + i] = > av_clipl_int32(llrint(tab[i] * scale / norm)); > + } this and similar changes look uneeded otherwise LGTM [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Freedom in capitalist society always remains about the same as it was in ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin
signature.asc
Description: Digital signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel