On Tue, 4 Aug 2015 01:46:24 +0200
Michael Niedermayer <mich...@niedermayer.cc> wrote:

> On Tue, Aug 04, 2015 at 12:57:38AM +0200, wm4 wrote:
> > On Tue, 4 Aug 2015 00:51:17 +0200
> > Michael Niedermayer <mich...@niedermayer.cc> wrote:
> > 
> > > On Tue, Aug 04, 2015 at 12:12:24AM +0200, wm4 wrote:
> > > > > As well as benchmark results in some commits like
> > > > > 4302a92835313000d4bf8030baae32c2b130d8a1
> > > > 
> > > > avcodec/pcm: Better min_size for ff_alloc_packet2()
> > > > 
> > > > 33318 -> 30601 decicycles
> > > > 
> > > > That doesn't look convincing at all. So you save ~3000 decicycles
> > > > for a task that isn't CPU bound, but which might block entire
> > > > milliseconds on I/O.
> > > 
> > > There was a bug report of an overall (aka "time ffmpeg") speed loss
> > > in the % values with high-res rawvideo
> > 
> > Was it fixed?
> 
> for rawvideo, and for several others too, yes

I see no proof, though. The numbers you posted so far are not really
convincing. They're micro-benchmarks, too, and might not even matter in
real-world workloads.

> i am pretty sure there are other encoders that suffer from a speed loss
> due to this too, though

What is "this"?

> > > > > i do intend to fine tune this for more codecs, but i spend more
> > > > > time in such discussions than working on the code, and having
> > > > > people tell me to do it differently each time i take a step
> > > > > kind of slows it down and turns it into a mess
> > > > 
> > > > We also must take care not to turn the codebase into a mess.
> > > 
> > > I want to fix the mess
> > 
> > I don't mind, but...
> > 
> > > and i would suggest that we first test if any / which encoders
> > > benefit from the static buffer. There were quite a few long ago,
> > > but it seems from a few quick tests that this is not so anymore.
> > > Can someone confirm? If it has become useless, then I could drop
> > > quite a bit of code
> > 
> > Do you have to do your experiments on the live codebase? If you just
> > want to try something, you could do that in a branch. If it gives
> > good results, merge; if not, it's a failed experiment.
> 
> noone did "experiments on the live codebase"

Then what's that "FIXME" in the heuristic, and the new API parameter
that by now uses only two possible values, which makes this "heuristic"
completely pointless?

What bugs me is that you insist on this (IMHO) bogus min_size
parameter, and all complaints are supposed to be silenced with the holy
"speedloss" justification. Except that so far the speed loss seems to
be rather minor, and I bet it shows up only in unrealistic benchmarks.
But it serves as justification anyway, because if it's just a little
bit faster, it wins, no discussion.

Now I don't claim I can do better (actually I do, so feel free to call
me an idiot), but here are my own (untested) conclusions. From what I
see there are two cases:

- Fixed allocations, usually the case with raw encoders. Currently,
  the packet is always a full malloc (which probably ends up as a mmap
  syscall if the size is big), but an additional copy of the packet at
  a later point is avoided when ref-ing or dup-ing the packet.

- Variable allocations, where the encoder has a (usually rather high)
  worst-case estimate and wants a buffer of that size. Later, the
  buffer is trimmed to avoid wasting memory. Currently, this is done by
  encoding into a "static" buffer, and then copying the data to a new,
  correctly sized buffer when the packet is referenced or dup-ed.

I suppose the min_size parameter is supposed to select between these
two cases dynamically and use the "best" method.
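To make the two cases concrete, here is a minimal stand-alone sketch of
the two allocation patterns described above. All names (alloc_fixed,
alloc_trimmed, static_buf) are made up for illustration; this is not
the actual libavcodec code.

```c
/* Illustration only: the two packet-allocation patterns described
 * above, with hypothetical names; not the real libavcodec code. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { STATIC_BUF_SIZE = 1 << 16 };
static uint8_t static_buf[STATIC_BUF_SIZE];  /* the "static" scratch buffer */

/* Case 1: fixed allocation -- a raw encoder knows the exact packet
 * size, so the packet is malloc'd at full size up front; ref-ing or
 * dup-ing it later needs no extra copy. */
static uint8_t *alloc_fixed(size_t exact_size)
{
    return malloc(exact_size);
}

/* Case 2: variable allocation -- the encoder writes into the static
 * buffer using a worst-case size estimate, and only the bytes actually
 * produced are copied into a correctly sized buffer afterwards. */
static uint8_t *alloc_trimmed(size_t worst_case, size_t used)
{
    assert(worst_case <= STATIC_BUF_SIZE && used <= worst_case);
    uint8_t *pkt = malloc(used);        /* trimmed, exactly sized buffer */
    if (pkt)
        memcpy(pkt, static_buf, used);  /* the extra copy on ref/dup */
    return pkt;
}
```

Case 1 trades a potentially large up-front allocation for zero later
copies; case 2 trades one memcpy for never over-allocating the final
packet. min_size is, as far as I can tell, the knob meant to pick
between them.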
I suggest doing the following instead:

- have a buffer pool in the codec context
- if a packet is allocated with a size larger than the current pool
  size, realloc the buffer pool to that larger size
- likewise, if the requested packet is much smaller than the current
  pool size, either resize the pool, or just allocate the packet
  normally (apply clever heuristic 1 here)
- the encoder gets a refcounted packet from the buffer pool
- after encoding, check how much memory is "wasted"
- if too much is wasted (apply clever heuristic 2 here), create a new
  copy (this could even use another buffer pool, which might help if
  the encoded packet sizes are usually similar, but maybe it's not
  worth the trouble)

This actually has some potential to be faster (for example, when the
encoder does fixed allocations, as with raw video or audio, no large
allocations are done after a while), and it is simpler, at least from
the encoder's point of view. There can be a single ff_alloc_packet
function which takes only a size. (But I'd rather avoid the need for
too many heuristics, and instead have a separate function for raw
encoders, which would allocate from a buffer pool.)

Also, why is this all not applied to demuxing? In fact, even when
encoding, a lot of data is copied in avpacket.c, and those were packets
from the demuxer. Doesn't that matter?

> also i strongly suggest that this discussion ends now and that we
> return to doing useful work.

I don't agree that it's better to avoid the search for better
solutions.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
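P.S.: for the record, the proposal above could be sketched roughly as
follows, outside of FFmpeg. All names (pkt_pool, pool_get, pool_trim,
WASTE_LIMIT) and the waste threshold are made up for illustration; a
real implementation would use AVBufferPool/AVPacket refcounting rather
than raw malloc/realloc.

```c
/* Sketch of the proposed scheme: a grow-only buffer pool per "codec
 * context", plus a waste heuristic deciding whether to copy the encoded
 * bytes into an exactly sized buffer. Hypothetical names throughout. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct pkt_pool {
    uint8_t *buf;      /* reusable backing buffer */
    size_t   capacity; /* current pool size; only ever grows here */
} pkt_pool;

/* "clever heuristic 1" (simplified): grow the pool on larger requests,
 * never shrink it. Returns the pool buffer, or NULL on failure. */
static uint8_t *pool_get(pkt_pool *p, size_t size)
{
    if (size > p->capacity) {
        uint8_t *nbuf = realloc(p->buf, size);
        if (!nbuf)
            return NULL;
        p->buf      = nbuf;
        p->capacity = size;
    }
    return p->buf;
}

/* "clever heuristic 2": after encoding `used` bytes, copy them out into
 * an exactly sized buffer if too much of the pool would be wasted.
 * *copied tells the caller whether it owns the returned buffer. */
#define WASTE_LIMIT 2  /* copy if less than 1/2 of the pool is used */
static uint8_t *pool_trim(pkt_pool *p, size_t used, int *copied)
{
    if (used * WASTE_LIMIT < p->capacity) {
        uint8_t *out = malloc(used);
        if (out)
            memcpy(out, p->buf, used);
        *copied = 1;
        return out;
    }
    *copied = 0;
    return p->buf;  /* good fit: hand out the pooled buffer as-is */
}
```

With a fixed-size encoder, pool_get stops allocating after the first
packet; with a variable-size encoder, the worst-case request grows the
pool once, and pool_trim pays a copy only when the packet would waste
too much of it.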