When assembling slices in cbs_mpeg2, cbs_h264 and cbs_h265, a combination of a bitreader and bitwriter is used to copy the data (after the slice header has been assembled). They copy in blocks of 16 bits.
This is inefficient. For example, the bitreader can be eliminated by first copying bits until the input is byte-aligned. If the bitwriter is then byte-aligned as well, one can use memcpy to improve performance (I got a more than 20x speed increase for copying the slice data if the slices are big enough and properly aligned). If it is not byte-aligned, one has nevertheless eliminated the shifting done in the bitreader. Shifting 32 bits at a time instead of 16 also proved advantageous.

The aligned case is very common: For MPEG2, the slice header doesn't contain many interesting fields to modify (e.g. the extra_information_slice is reserved), so there is not really a point in changing the slice at all. (One could actually speed up the mpeg2_metadata filter further by not decomposing slices at all.) For H.264 CABAC mode and H.265, the slice header is always byte-aligned, so one would have to intentionally produce misaligned data to end up in the unaligned case.

My patch aims to produce output identical to that of the current version, with one exception: Currently, cbs_h264 and cbs_h265 assert that the last few bits of input aren't zero, as they are supposed to contain the rbsp_stop_one_bit. This is probably done because the behaviour of ff_ctz is undefined when its argument is zero. My version doesn't check for this in the aligned mode, as ff_ctz isn't required there at all. In the unaligned mode, my version only checks the last 8 bits, whereas the current version checks between 8 and 23 bits.

This is my first contribution on this mailing list and I tried my best to follow your patch submission checklist. However, I could not run FATE, although I tested my patch with several files and they produced the same output as the current version.
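
To illustrate the copying strategy described above, here is a minimal, self-contained sketch. It is not the code from the patches: BitWriter, bw_put and copy_slice_data are hypothetical names made up for this example (FFmpeg's own bitwriter is not used), the output buffer is assumed to be zero-initialised and large enough, and the handling of the trailing bits (rbsp_stop_one_bit) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal MSB-first bit writer (not FFmpeg's PutBitContext). */
typedef struct BitWriter {
    uint8_t *buf;     /* output buffer, assumed zeroed and large enough */
    size_t   bit_pos; /* number of bits written so far */
} BitWriter;

/* Append the low n bits of value (1 <= n <= 32), most significant bit first. */
static void bw_put(BitWriter *bw, int n, uint32_t value)
{
    assert(n >= 1 && n <= 32);
    for (int i = n - 1; i >= 0; i--) {
        if ((value >> i) & 1)
            bw->buf[bw->bit_pos >> 3] |= 0x80 >> (bw->bit_pos & 7);
        bw->bit_pos++;
    }
}

/*
 * Copy the slice data that starts bit_offset bits into src (src_size bytes
 * in total) to the writer, without using a bit reader.
 */
static void copy_slice_data(BitWriter *bw, const uint8_t *src,
                            size_t src_size, size_t bit_offset)
{
    size_t pos = bit_offset >> 3;
    int    rem = (int)((8 - (bit_offset & 7)) & 7); /* bits until input is aligned */

    assert(bit_offset <= 8 * src_size);

    /* Step 1: copy single bits until the input is byte-aligned. */
    if (rem) {
        bw_put(bw, rem, src[pos] & ((1u << rem) - 1));
        pos++;
    }

    if ((bw->bit_pos & 7) == 0) {
        /* Step 2a: input and output are both byte-aligned, so memcpy works. */
        memcpy(bw->buf + (bw->bit_pos >> 3), src + pos, src_size - pos);
        bw->bit_pos += 8 * (src_size - pos);
    } else {
        /* Step 2b: output is unaligned; read whole bytes, write 32 bits at once. */
        for (; pos + 4 <= src_size; pos += 4)
            bw_put(bw, 32, (uint32_t)src[pos]     << 24 |
                           (uint32_t)src[pos + 1] << 16 |
                           (uint32_t)src[pos + 2] <<  8 |
                           (uint32_t)src[pos + 3]);
        for (; pos < src_size; pos++)
            bw_put(bw, 8, src[pos]);
    }
}
```

In this sketch the caller first writes the (possibly modified) slice header through bw_put and then calls copy_slice_data with the bit position at which the slice data begins in the original unit. When the header length is unchanged modulo 8, the memcpy branch is taken.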

Andreas Rheinhardt (3):
  cbs_mpeg2: Improve performance of writing slices
  cbs_h264: Improve performance of writing slices
  cbs_h265: Improve performance of writing slices

 libavcodec/cbs_h2645.c | 139 ++++++++++++++++++++++++++++-------------
 libavcodec/cbs_mpeg2.c |  39 ++++++++----
 2 files changed, 122 insertions(+), 56 deletions(-)

-- 
2.19.0