On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
> This gets rid of some branches to speed up table generation slightly
> (impact higher on mulaw than alaw). Tables are identical to before,
> tested with FATE.
>
> Sample benchmark (Haswell, GNU/Linux+gcc):
> old:
> 313494 decicycles in build_alaw_table, 4094 runs, 2 skips
> 315959 decicycles in build_alaw_table, 8190 runs, 2 skips
>
> 323599 decicycles in build_ulaw_table, 4095 runs, 1 skips
> 318849 decicycles in build_ulaw_table, 8188 runs, 4 skips
>
> new:
> 261902 decicycles in build_alaw_table, 4096 runs, 0 skips
> 266519 decicycles in build_alaw_table, 8192 runs, 0 skips
>
> 209657 decicycles in build_ulaw_table, 4096 runs, 0 skips
> 232656 decicycles in build_ulaw_table, 8192 runs, 0 skips
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanaga...@gmail.com>
> ---
>  libavcodec/pcm_tablegen.h | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
> index 1387210..7269977 100644
> --- a/libavcodec/pcm_tablegen.h
> +++ b/libavcodec/pcm_tablegen.h
> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
>  {
>      int i, j, v, v1, v2;
>
> -    j = 0;
> -    for(i=0;i<128;i++) {
> -        if (i != 127) {
> -            v1 = xlaw2linear(i ^ mask);
> -            v2 = xlaw2linear((i + 1) ^ mask);
> -            v = (v1 + v2 + 4) >> 3;
> -        } else {
> -            v = 8192;
> -        }
> -        for(;j<v;j++) {
> +    j = 1;
> +    linear_to_xlaw[8192] = mask;
> +    for(i=0;i<127;i++) {
> +        v1 = xlaw2linear(i ^ mask);
> +        v2 = xlaw2linear((i + 1) ^ mask);
> +        v = (v1 + v2 + 4) >> 3;
> +        for(;j<v;j+=1) {
> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>              linear_to_xlaw[8192 + j] = (i ^ mask);
> -            if (j > 0)
> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>          }
>      }
> +    for(;j<8192;j++) {
> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
> +    }
>      linear_to_xlaw[0] = linear_to_xlaw[1];
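The "tables are identical to before" claim in the quoted patch can also be checked outside FATE with a small standalone harness. The following is only a sketch: the two decoders are written in the usual G.711 style, and the masks 0xd5 (A-law) and 0xff (mu-law) are assumptions, since neither appears in the quoted hunk. The names build_old(), build_new() and tables_match() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 16384

/* Assumed G.711-style decoders; they are not part of the quoted hunk. */
static int alaw2linear(unsigned char a_val)
{
    int t, seg;

    a_val ^= 0x55;
    t   = a_val & 0x0f;
    seg = (a_val & 0x70) >> 4;
    if (seg)
        t = (t + t + 1 + 32) << (seg + 2);
    else
        t = (t + t + 1) << 3;
    return (a_val & 0x80) ? t : -t;
}

static int ulaw2linear(unsigned char u_val)
{
    int t;

    u_val = ~u_val;
    t = ((u_val & 0x0f) << 3) + 0x84;
    t <<= (u_val & 0x70) >> 4;
    return (u_val & 0x80) ? (0x84 - t) : (t - 0x84);
}

/* Loop as it stood before the patch (the "-" lines of the hunk). */
static void build_old(uint8_t *tab, int (*xlaw2linear)(unsigned char), int mask)
{
    int i, j = 0, v, v1, v2;

    for (i = 0; i < 128; i++) {
        if (i != 127) {
            v1 = xlaw2linear(i ^ mask);
            v2 = xlaw2linear((i + 1) ^ mask);
            v  = (v1 + v2 + 4) >> 3;
        } else {
            v = 8192;
        }
        for (; j < v; j++) {
            tab[8192 + j] = i ^ mask;
            if (j > 0)
                tab[8192 - j] = i ^ (mask ^ 0x80);
        }
    }
    tab[0] = tab[1];
}

/* Loop as rewritten by the patch (the "+" lines of the hunk). */
static void build_new(uint8_t *tab, int (*xlaw2linear)(unsigned char), int mask)
{
    int i, j = 1, v, v1, v2;

    tab[8192] = mask;
    for (i = 0; i < 127; i++) {
        v1 = xlaw2linear(i ^ mask);
        v2 = xlaw2linear((i + 1) ^ mask);
        v  = (v1 + v2 + 4) >> 3;
        assert(v <= 8192); /* keeps 8192 + j inside the table */
        for (; j < v; j++) {
            tab[8192 - j] = i ^ (mask ^ 0x80);
            tab[8192 + j] = i ^ mask;
        }
    }
    for (; j < 8192; j++) {
        tab[8192 - j] = 127 ^ (mask ^ 0x80);
        tab[8192 + j] = 127 ^ mask;
    }
    tab[0] = tab[1];
}

/* Returns 1 when both loops fill the whole table identically. */
static int tables_match(int (*xlaw2linear)(unsigned char), int mask)
{
    static uint8_t old_tab[TABLE_SIZE], new_tab[TABLE_SIZE];

    memset(old_tab, 0, sizeof(old_tab));
    memset(new_tab, 0, sizeof(new_tab));
    build_old(old_tab, xlaw2linear, mask);
    build_new(new_tab, xlaw2linear, mask);
    return memcmp(old_tab, new_tab, TABLE_SIZE) == 0;
}
```

Calling tables_match(alaw2linear, 0xd5) and tables_match(ulaw2linear, 0xff) from a test program compares the two variants byte for byte. This only covers the decoders as assumed here and is not a substitute for the FATE run mentioned in the patch.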
I think you can make the tables 8 times smaller. The points in the table
where the values transition seemed to always be a multiple of 8 apart, so
just adjusting the offset in pcm_encode_frame() would allow changing the
>> 2 to >> 5.

If that works out, it would make table generation 8 times faster, reduce
the memory needed, and speed up the code at runtime due to lower pressure
on the L1/L2 caches.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

During times of universal deceit, telling the truth becomes a
revolutionary act. -- George Orwell
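Whether the 8-times-smaller table suggested above reproduces the current output exactly can be tested before touching pcm_encode_frame(). The sketch below builds the table with the patched loop and asks, for a given shift and offset, whether every bucket of (1 << shift) consecutive entries holds a single value. A shift of 3 corresponds to going from >> 2 to >> 5. The decoders and the masks 0xd5 and 0xff are assumptions written in the usual G.711 style, and shrink_is_exact() and count_exact_offsets() are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 16384

/* Assumed G.711-style decoders; they are not part of the quoted hunk. */
static int alaw2linear(unsigned char a_val)
{
    int t, seg;

    a_val ^= 0x55;
    t   = a_val & 0x0f;
    seg = (a_val & 0x70) >> 4;
    if (seg)
        t = (t + t + 1 + 32) << (seg + 2);
    else
        t = (t + t + 1) << 3;
    return (a_val & 0x80) ? t : -t;
}

static int ulaw2linear(unsigned char u_val)
{
    int t;

    u_val = ~u_val;
    t = ((u_val & 0x0f) << 3) + 0x84;
    t <<= (u_val & 0x70) >> 4;
    return (u_val & 0x80) ? (0x84 - t) : (t - 0x84);
}

/* Table generation as in the "+" lines of the quoted patch. */
static void build_xlaw_table(uint8_t *linear_to_xlaw,
                             int (*xlaw2linear)(unsigned char), int mask)
{
    int i, j = 1, v, v1, v2;

    linear_to_xlaw[8192] = mask;
    for (i = 0; i < 127; i++) {
        v1 = xlaw2linear(i ^ mask);
        v2 = xlaw2linear((i + 1) ^ mask);
        v  = (v1 + v2 + 4) >> 3;
        for (; j < v; j++) {
            linear_to_xlaw[8192 - j] = i ^ (mask ^ 0x80);
            linear_to_xlaw[8192 + j] = i ^ mask;
        }
    }
    for (; j < 8192; j++) {
        linear_to_xlaw[8192 - j] = 127 ^ (mask ^ 0x80);
        linear_to_xlaw[8192 + j] = 127 ^ mask;
    }
    linear_to_xlaw[0] = linear_to_xlaw[1];
}

/*
 * Returns 1 if indexing with ((k + offset) >> shift) loses nothing, i.e.
 * all entries that land in the same bucket already hold the same value.
 */
static int shrink_is_exact(const uint8_t *tab, int shift, int offset)
{
    int k;

    assert(shift >= 0 && offset >= 0 && offset < (1 << shift));
    for (k = 1; k < TABLE_SIZE; k++) {
        /* Neighbours k-1 and k share a bucket unless k starts a new one. */
        int same_bucket = ((k + offset) >> shift) == ((k - 1 + offset) >> shift);

        if (same_bucket && tab[k] != tab[k - 1])
            return 0;
    }
    return 1;
}

/* Counts the offsets in [0, 1 << shift) for which the shrink is exact. */
static int count_exact_offsets(int (*xlaw2linear)(unsigned char),
                               int mask, int shift)
{
    static uint8_t tab[TABLE_SIZE];
    int offset, n = 0;

    build_xlaw_table(tab, xlaw2linear, mask);
    for (offset = 0; offset < (1 << shift); offset++)
        n += shrink_is_exact(tab, shift, offset);
    return n;
}
```

A nonzero result from count_exact_offsets(alaw2linear, 0xd5, 3) or count_exact_offsets(ulaw2linear, 0xff, 3) would mean that some single offset allows an exact 8-fold reduction. Zero would mean the smaller table changes at least one output under these assumptions, in which case the remaining question is whether the difference is acceptable.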
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel