On Mon, Jan 4, 2016 at 2:45 AM, Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Sun, Jan 03, 2016 at 09:11:28PM -0800, Ganesh Ajjanagadde wrote:
>> On Sun, Jan 3, 2016 at 7:32 PM, Michael Niedermayer
>> <mich...@niedermayer.cc> wrote:
>> > On Mon, Jan 04, 2016 at 04:04:02AM +0100, Michael Niedermayer wrote:
>> >> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> >> > This gets rid of some branches to speed up table generation slightly
>> >> > (impact higher on mulaw than alaw). Tables are identical to before,
>> >> > tested with FATE.
>> >> >
>> >> > Sample benchmark (Haswell, GNU/Linux+gcc):
>> >> > old:
>> >> > 313494 decicycles in build_alaw_table, 4094 runs, 2 skips
>> >> > 315959 decicycles in build_alaw_table, 8190 runs, 2 skips
>> >> >
>> >> > 323599 decicycles in build_ulaw_table, 4095 runs, 1 skips
>> >> > 318849 decicycles in build_ulaw_table, 8188 runs, 4 skips
>> >> >
>> >> > new:
>> >> > 261902 decicycles in build_alaw_table, 4096 runs, 0 skips
>> >> > 266519 decicycles in build_alaw_table, 8192 runs, 0 skips
>> >> >
>> >> > 209657 decicycles in build_ulaw_table, 4096 runs, 0 skips
>> >> > 232656 decicycles in build_ulaw_table, 8192 runs, 0 skips
>> >> >
>> >> > Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
>> >> > ---
>> >> >  libavcodec/pcm_tablegen.h | 24 ++++++++++++------------
>> >> >  1 file changed, 12 insertions(+), 12 deletions(-)
>> >> >
>> >> > diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> >> > index 1387210..7269977 100644
>> >> > --- a/libavcodec/pcm_tablegen.h
>> >> > +++ b/libavcodec/pcm_tablegen.h
>> >> > @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t
>> >> > *linear_to_xlaw,
>> >> >  {
>> >> >      int i, j, v, v1, v2;
>> >> >
>> >> > -    j = 0;
>> >> > -    for(i=0;i<128;i++) {
>> >> > -        if (i != 127) {
>> >> > -            v1 = xlaw2linear(i ^ mask);
>> >> > -            v2 = xlaw2linear((i + 1) ^ mask);
>> >> > -            v = (v1 + v2 + 4) >> 3;
>> >> > -        } else {
>> >> > -            v = 8192;
>> >> > -        }
>> >> > -        for(;j<v;j++) {
>> >> > +    j = 1;
>> >> > +    linear_to_xlaw[8192] = mask;
>> >> > +    for(i=0;i<127;i++) {
>> >> > +        v1 = xlaw2linear(i ^ mask);
>> >> > +        v2 = xlaw2linear((i + 1) ^ mask);
>> >> > +        v = (v1 + v2 + 4) >> 3;
>> >> > +        for(;j<v;j+=1) {
>> >> > +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >> >              linear_to_xlaw[8192 + j] = (i ^ mask);
>> >> > -            if (j > 0)
>> >> > -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>> >> >          }
>> >> >      }
>> >> > +    for(;j<8192;j++) {
>> >> > +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> >> > +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> >> > +    }
>> >> >      linear_to_xlaw[0] = linear_to_xlaw[1];
>> >>
>> >> i think you can make the tables 8 times smaller
>> >
>> > forget this, i should have checked the whole table or looked when i
>> > am awake ...
>>
>> ha ha. By the way, both changes are needed to get this level of
>> speedup, with only the j change which you acked, the speedup is much
>> smaller. But then also note that the other parts of the patch also
>> increase the binary size more.
>
> hmm, ok if its needed to get the speedup then LGTM
>
> thanks

pushed, thanks

> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Concerning the gods, I have no means of knowing whether they exist or not
> or of what sort they may be, because of the obscurity of the subject, and
> the brevity of human life -- Protagoras
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
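
For anyone reading this thread later: the claim that the tables are identical to before can be checked outside the tree. What follows is a minimal standalone sketch, not the committed code. The two decoders are stand-ins assumed to behave like the alaw2linear/ulaw2linear helpers in pcm_tablegen.h, the masks 0xd5 (alaw) and 0xff (ulaw) are likewise assumptions, and build_old, build_new, tables_match and xlaw_tables_match are hypothetical names made up for this example. It builds the table with both loop shapes from the quoted diff and compares the results byte for byte.

```c
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 16384

/* Stand-in G.711 A-law decoder (assumption: mirrors pcm_tablegen.h). */
static int alaw2linear(unsigned char a_val)
{
    int t, seg;

    a_val ^= 0x55;
    t   = a_val & 0x0f;
    seg = (a_val & 0x70) >> 4;
    if (seg)
        t = (t + t + 1 + 32) << (seg + 2);
    else
        t = (t + t + 1) << 3;
    return (a_val & 0x80) ? t : -t;
}

/* Stand-in G.711 mu-law decoder (assumption: mirrors pcm_tablegen.h). */
static int ulaw2linear(unsigned char u_val)
{
    int t;

    u_val = (unsigned char)~u_val;
    t = ((u_val & 0x0f) << 3) + 0x84;
    t <<= (u_val & 0x70) >> 4;
    return (u_val & 0x80) ? (0x84 - t) : (t - 0x84);
}

/* Loop shape before the patch: i == 127 and j > 0 are tested inside the loops. */
static void build_old(uint8_t *tab, int (*xlaw2linear)(unsigned char), int mask)
{
    int i, j = 0, v, v1, v2;

    for (i = 0; i < 128; i++) {
        if (i != 127) {
            v1 = xlaw2linear(i ^ mask);
            v2 = xlaw2linear((i + 1) ^ mask);
            v  = (v1 + v2 + 4) >> 3;
        } else {
            v = 8192;
        }
        for (; j < v; j++) {
            tab[8192 + j] = (uint8_t)(i ^ mask);
            if (j > 0)
                tab[8192 - j] = (uint8_t)(i ^ (mask ^ 0x80));
        }
    }
    tab[0] = tab[1];
}

/* Loop shape after the patch: the j == 0 entry and the i == 127 tail are peeled out. */
static void build_new(uint8_t *tab, int (*xlaw2linear)(unsigned char), int mask)
{
    int i, j = 1, v, v1, v2;

    tab[8192] = (uint8_t)mask;
    for (i = 0; i < 127; i++) {
        v1 = xlaw2linear(i ^ mask);
        v2 = xlaw2linear((i + 1) ^ mask);
        v  = (v1 + v2 + 4) >> 3;
        for (; j < v; j++) {
            tab[8192 - j] = (uint8_t)(i ^ (mask ^ 0x80));
            tab[8192 + j] = (uint8_t)(i ^ mask);
        }
    }
    for (; j < 8192; j++) {
        tab[8192 - j] = (uint8_t)(127 ^ (mask ^ 0x80));
        tab[8192 + j] = (uint8_t)(127 ^ mask);
    }
    tab[0] = tab[1];
}

/* Returns 1 if both loop shapes produce the same table for this decoder/mask. */
int tables_match(int (*xlaw2linear)(unsigned char), int mask)
{
    static uint8_t old_tab[TABLE_SIZE], new_tab[TABLE_SIZE];

    memset(old_tab, 0, sizeof(old_tab));
    memset(new_tab, 0, sizeof(new_tab));
    build_old(old_tab, xlaw2linear, mask);
    build_new(new_tab, xlaw2linear, mask);
    return memcmp(old_tab, new_tab, TABLE_SIZE) == 0;
}

/* Returns 1 if the alaw and ulaw tables both match (masks are assumptions). */
int xlaw_tables_match(void)
{
    return tables_match(alaw2linear, 0xd5) && tables_match(ulaw2linear, 0xff);
}
```

Calling xlaw_tables_match() from a small main() and checking for a nonzero return is enough to exercise it. Note that the peeled store to index 8192 only equals the old j == 0 store when the first midpoint v is positive, which holds for both stand-in decoders here.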