On Wed, Nov 4, 2015 at 10:07 AM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote: > On Wed, Nov 4, 2015 at 9:44 AM, Michael Niedermayer > <mich...@niedermayer.cc> wrote: >> On Wed, Nov 04, 2015 at 01:11:45AM -0500, Ganesh Ajjanagadde wrote: >>> This speeds up build_filter by ~ 50%. This gain should be pretty >>> consistent across all architectures and platforms. >>> >>> Essentially, this relies on a observation that the filters have some >>> even/odd symmetry that may be exploited during the construction of the >>> polyphase filter bank. In particular, phases (scaled to [0, 1]) in [0.5, 1] >>> are >>> easily derived from [0, 0.5] and expensive reevaluation of function >>> points are unnecessary. This requires some rather annoying even/odd >>> bookkeeping as can be seen from the patch. >>> >>> I vaguely recall from signal processing theory more general symmetries >>> allowing even greater >>> optimization of the construction. At a high level, "even functions" >>> correspond to 2, and one can imagine variations. Nevertheless, for the sake >>> of some generality and because of existing filters, this is all that is >>> being exploited. >>> >>> Currently, this patch relies on phase_count being even or (trivially) 1, >>> though this is not an inherent limitation to the approach. This >>> assumption is safe as phase_count is 1 << phase_bits, and is hence a >>> power of two. There is no way for user API to set it to a nontrivial odd >>> number. This assumption has been placed as an assert in the code. >>> >>> To repeat, this assumes even symmetry of the filters, which is the most >>> common >>> way to get generalized linear phase anyway and is true of all currently >>> supported filters. >>> >>> As a side note, accuracy should be identical or perhaps slightly better >>> due to this "forcing" filter symmetries leading to a better phase >>> characteristic. As before, I can't test this claim easily, though it may >>> be of interest. >>> >>> Patch tested with FATE. >>> >>> Sample benchmark (x86-64, Haswell, GNU/Linux): >>> >>> test: swr-resample-dblp-44100-2626 >>> >>> new: >>> 527376779 decicycles in build_filter(loop 1000), 256 runs, 0 skips >>> 524361765 decicycles in build_filter(loop 1000), 512 runs, 0 skips >>> 516552574 decicycles in build_filter(loop 1000), 1024 runs, 0 skips >>> >>> old: >>> 974178658 decicycles in build_filter(loop 1000), 256 runs, 0 skips >>> 972794408 decicycles in build_filter(loop 1000), 512 runs, 0 skips >>> 954350046 decicycles in build_filter(loop 1000), 1024 runs, 0 skips >>> >>> Note that lower level optimizations are entirely possible, I focussed on >>> getting the high level semantics correct. In any case, this should >>> provide a good foundation. >>> >>> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com> >>> --- >>> libswresample/resample.c | 56 >>> +++++++++++++++++++++++++++++++++++++++++------- >>> 1 file changed, 48 insertions(+), 8 deletions(-) >>> >>> diff --git a/libswresample/resample.c b/libswresample/resample.c >>> index 235c9a9..d5fc5e7 100644 >>> --- a/libswresample/resample.c >>> +++ b/libswresample/resample.c >>> @@ -144,7 +144,7 @@ static int build_filter(ResampleContext *c, void >>> *filter, double factor, int tap >>> int filter_type, int kaiser_beta){ >>> int ph, i; >>> double x, y, w; >>> - double *tab = av_malloc_array(tap_count, sizeof(*tab)); >>> + double *tab = av_malloc_array(tap_count+1, sizeof(*tab)); >>> const int center= (tap_count-1)/2; >>> >>> if (!tab) >>> @@ -154,9 +154,10 @@ static int build_filter(ResampleContext *c, void >>> *filter, double factor, int tap >>> if (factor > 1.0) >>> factor = 1.0; >>> >>> - for(ph=0;ph<phase_count;ph++) { >>> + av_assert0(phase_count == 1 || phase_count % 2 == 0); >>> + for(ph = 0; ph <= phase_count / 2; ph++) { >>> double norm = 0; >>> - for(i=0;i<tap_count;i++) { >>> + for(i=0;i<=tap_count;i++) { >>> x = M_PI * ((double)(i - center) - (double)ph / phase_count) * >>> factor; >>> if (x == 0) y = 1.0; >>> else y = sin(x) / x; >>> @@ -180,26 +181,65 @@ static int build_filter(ResampleContext *c, void >>> *filter, double factor, int tap >>> } >>> >>> tab[i] = y; >>> - norm += y; >>> + if (i < tap_count) >>> + norm += y; >>> } >>> >>> /* normalize so that an uniform color remains the same */ >>> switch(c->format){ >>> case AV_SAMPLE_FMT_S16P: >>> - for(i=0;i<tap_count;i++) >>> + for(i=0;i<tap_count;i++) { >>> ((int16_t*)filter)[ph * alloc + i] = av_clip(lrintf(tab[i] >>> * scale / norm), INT16_MIN, INT16_MAX); >>> + } >>> + if (tap_count % 2 == 0) { >>> + for (i = 0; i < tap_count; i++) >>> + ((int16_t*)filter)[(phase_count-ph) * alloc + >>> tap_count-1-i] = ((int16_t*)filter)[ph * alloc + i]; >>> + } >>> + else { >>> + for (i = 1; i <= tap_count; i++) >>> + ((int16_t*)filter)[(phase_count-ph) * alloc + >>> tap_count-i] = >>> + av_clip(lrintf(tab[i] * scale / (norm - tab[0] + >>> tab[tap_count])), INT16_MIN, INT16_MAX); >>> + } >>> break; >>> case AV_SAMPLE_FMT_S32P: >> >>> - for(i=0;i<tap_count;i++) >>> + for(i=0;i<tap_count;i++) { >>> ((int32_t*)filter)[ph * alloc + i] = >>> av_clipl_int32(llrint(tab[i] * scale / norm)); >>> + } >> >> this and similar changes look uneeded >> >> otherwise LGTM > > Artifacts left over from testing, sorry. Will push later to make sure > there are no further comments on this. I am particularly interested in > comments from API users: does this speed matter to them? Clement > seemed to care about it, but it seems to me quite rare to call this > multiple times on a stream.
Put the "by 50%" into the top line of the commit message as well for easy reference. Fixed extraneous braces, and pushed. Thanks. > >> >> [...] >> >> -- >> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB >> >> Freedom in capitalist society always remains about the same as it was in >> ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin >> >> _______________________________________________ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel