On Sat, Jan 2, 2016 at 1:10 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Hi,
>
> On Sat, Jan 2, 2016 at 4:08 PM, Ganesh Ajjanagadde <gajja...@mit.edu> wrote:
>>
>> On Sat, Jan 2, 2016 at 1:02 PM, Ronald S. Bultje <rsbul...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > On Sat, Jan 2, 2016 at 3:23 PM, Ganesh Ajjanagadde <gajja...@mit.edu>
>> > wrote:
>> >>
>> >> On Fri, Jan 1, 2016 at 8:07 AM, Ganesh Ajjanagadde <gajja...@mit.edu>
>> >> wrote:
>> >> > Hi all,
>> >> >
>> >> > Motivated by a remark by Ronald:
>> >> > https://ffmpeg.org/pipermail/ffmpeg-devel/2016-January/186200.html,
>> >> > this is a request for comment on disabling compile time tablegen for
>> >> > cbrt if the total cycle count < 200000. Note that cbrt tables are
>> >> > only used in aacdec.
>> >>
>> >> To start some effort towards a more principled understanding of the
>> >> costs of runtime table initialization, I did some benchmarks.
>> >> Note: I am not familiar with avcodec, so I don't know if this reflects
>> >> correctly the static vs dynamic cost.
>> >>
>> >> file: ~/samples/aac/al04_44.mp4
>> >> stream_loop: 100
>> >> number of calls of avcodec_decode_audio4: 35956
>> >> cost per call (avcodec_decode_audio4):
>> >> 834030 decicycles in decode_audio4, 1 runs, 0 skips
>> >> 556200 decicycles in decode_audio4, 2 runs, 0 skips
>> >> [...]
>> >> 177365 decicycles in decode_audio4, 16384 runs, 0 skips
>> >> 177059 decicycles in decode_audio4, 32768 runs, 0 skips
>> >>
>> >> decoding cost: 17706*35956 = 636,636,936 cycles
>> >> duration: 832.55 seconds
>> >> cost per second of audio: 764,683 cycles
>> >> cost of table init: 200,000 cycles
>> >> fraction: 0.26
>> >>
>> >> So in a clip of n seconds duration, the relative overhead of dynamic
>> >> initialization of these cbrt tables is 0.26/n. For a more concrete
>> >> number, say a clip is of 180 seconds duration, then the overhead is
>> >> 0.26/180 = 0.15%.
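For reference, the arithmetic in the benchmark quoted above can be reproduced with a short sketch. The numbers are copied from that benchmark; the constant and function names are made up for illustration, and the 200,000-cycle figure is the threshold from the RFC, used here as an assumed upper bound on the table init cost rather than a measurement.

```python
# Figures copied from the benchmark quoted above.
CYCLES_PER_CALL = 17706       # 177059 decicycles per call, rounded to cycles
NUM_CALLS = 35956             # calls of avcodec_decode_audio4
DURATION_S = 832.55           # seconds of audio decoded
TABLE_INIT_CYCLES = 200_000   # assumed upper bound on cbrt table init cost


def decode_cycles_per_second():
    """Average decoding cost per second of audio, in cycles."""
    return CYCLES_PER_CALL * NUM_CALLS / DURATION_S


def init_overhead(clip_seconds):
    """Table init cost as a fraction of the decoding cost of one clip."""
    return TABLE_INIT_CYCLES / (decode_cycles_per_second() * clip_seconds)


if __name__ == "__main__":
    # The overhead decays as c/n with clip duration n (in seconds).
    for n in (3, 30, 180):
        print(f"{n:4d} s clip: {100 * init_overhead(n):.2f}% overhead")
```

This only models decoding cost against the one-time table init; it says nothing about process startup or other initialization.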
>> >
>> >
>> > What if I only want to play the first 3 second of 1000 clips by calling
>> > ffmpeg.exe in a shell script? E.g. for fingerprinting. The number of use
>> > cases you cover needs to be more than just playback, ffmpeg can do much
>> > more than just that.
>>
>> Two remarks:
>> 1. As I said, this was only a start of the discussion; and the general
>> c/t decay holds; constant c should be close to what I obtained. So
>> yes, if you have such a thing, it will be slower.
>> 2. I thought ffmpeg had the ability to handle multiple input files in
>> a single invocation? Thus, someone doing such a thing is IMHO doing it
>> incorrectly.
>
>
> ffmpeg has a lot of abilities. But most of our users are not harvard (or MIT
> :-) ) PhDs, so they're unlikely to do it in the most optimal way, and very
> likely to do it in the easiest way.

I don't deny that, but isn't it also true that most non-power users will
not discover or use --enable-hardcoded-tables? Now that I have added
performance build notes to the wiki
(https://trac.ffmpeg.org/wiki/CompilationGuide), some readers may try
it; I can't say.

There is also the program invocation overhead itself: OS-level process
startup plus FFmpeg's own internal initialization, both of which are
separate from this particular table and from the other table generation
steps. The 200,000 cycles may turn out to be a small fraction of the
total startup cost. I deliberately avoided benchmarking this to keep the
focus narrow, but if you think it would add useful perspective to this
thread, I can measure it.

>
> Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel