The aim of this patchset is to avoid the memcpy that currently happens in cbs_h2645_fragment_add_nals during the reading stage of cbs_h2645. This is done by taking advantage of the way ff_h2645_packet_split (and, internally, ff_h2645_extract_rbsp) works: if a NAL unit doesn't contain any 0x03 escape bytes, no copying is performed and the data and raw_data pointers of the returned NAL unit coincide; otherwise, the data is part of the H2645Packet's rbsp_buffer. (A few trailing zeros between NAL units can also make ff_h2645_extract_rbsp copy; this happens with some transport streams.)
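The aliasing property described above can be illustrated with a simplified, self-contained unescaping routine. This is a hypothetical sketch: the names extract_rbsp_sketch and can_reference_fragment are made up for illustration, and it is not ff_h2645_extract_rbsp itself (which also deals with trailing zeros and padding). It only shows the property cbs_h2645 relies on: when no 0x00 0x00 0x03 sequence is present, the returned pointer is the input pointer, so comparing data with raw_data tells whether a copy happened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not FFmpeg's ff_h2645_extract_rbsp.
 * Removes emulation prevention bytes (the 0x03 in 0x00 0x00 0x03).
 * If the payload contains no such byte, nothing is copied and the
 * returned pointer aliases src ("data == raw_data"). Otherwise the
 * unescaped payload is written to rbsp_buf (which must hold at least
 * len bytes) and rbsp_buf is returned. */
static const uint8_t *extract_rbsp_sketch(const uint8_t *src, size_t len,
                                          uint8_t *rbsp_buf, size_t *out_len)
{
    size_t i;

    /* Scan for the first escape; the common case ends here without a copy. */
    for (i = 2; i < len; i++)
        if (src[i] == 0x03 && src[i - 1] == 0x00 && src[i - 2] == 0x00)
            break;
    if (i >= len) {
        *out_len = len;
        return src;
    }

    assert(rbsp_buf != NULL);

    /* Escapes present: copy into rbsp_buf, dropping every 0x03 that
     * follows two (or more) zero bytes. */
    size_t di = 0;
    int zeros = 0;
    for (size_t si = 0; si < len; si++) {
        if (zeros >= 2 && src[si] == 0x03) {
            zeros = 0;                  /* skip the escape byte */
            continue;
        }
        rbsp_buf[di++] = src[si];
        zeros = (src[si] == 0x00) ? zeros + 1 : 0;
    }
    *out_len = di;
    return rbsp_buf;
}

/* In essence, the check the first patch makes: if the unescaped data
 * aliases the raw data, it lies inside the fragment's buffer, so that
 * buffer can be referenced instead of copied. */
static int can_reference_fragment(const uint8_t *data, const uint8_t *raw_data)
{
    return data == raw_data;
}
```

For example, the input bytes 0x65 0x00 0x00 0x03 0x01 come back in rbsp_buf as 0x65 0x00 0x00 0x01, whereas an input without the 0x00 0x00 0x03 pattern is returned as the very same pointer, so can_reference_fragment() is true only in the latter case.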
So my first patch simply tests whether a NAL unit's data and raw_data pointers agree; if so, the data is part of the fragment's data, and therefore we can use the fragment's data_ref. Given that 0x03 escape bytes are rare, this alone already gives a noticeable speed boost.

Notice that NAL->data is a pointer to const uint8_t, whereas a CBS unit's data is not const-qualified. This doesn't pose any problems: even if one wanted to modify the data (if I am not mistaken, this doesn't happen right now, because all modifications are done on the content), one would have to ensure that the data is writable before doing so.

In order to avoid copying the other NAL units, too, I had to write an analogue of av_fast_malloc for buffers: avpriv_buffer_fast_alloc. I chose avpriv because it is easier to add a function to the public API than to remove one. And of course, I also had to modify ff_h2645_packet_split slightly to work with it. Notice that the cbs filters all uninitialize the fragment when they are done processing a packet, so the rbsp_buffer will be writable again when the next packet is decomposed; hence the number of reallocations of rbsp_buffer is not higher than it is now. (The only exception to this is H.264/HEVC content in mp4/Matroska where the SPS (or the VPS in case of HEVC) in the extradata contains 0x03 escapes. This often happens with typical PAL framerates if they are written in the VUI.)

For lots of content, the gain from this last change is negligible, but there is one kind of material that really benefits from it: content with hardcoded black bars; see the benchmarks below.

In both cases (referencing the fragment's data or the rbsp_buffer), there is padding at the end of the new unit's data, but this padding isn't zeroed. I don't see a problem with this; anyway, this is the same as in cbs_mpeg2, where the padding at the end of a unit is actually the beginning of the next unit (except for the last unit of a packet, of course).
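The policy behind an av_fast_malloc analogue for reference-counted buffers can be sketched without libavutil. Everything below is hypothetical: RefBuf is a toy stand-in for AVBufferRef/AVBuffer, and refbuf_fast_alloc does not reproduce the signature or implementation of avpriv_buffer_fast_alloc from this patchset. It shows the idea described above: an existing buffer is reused if the caller holds the only reference and it is large enough; if a unit still references it, the caller drops its reference and gets a fresh buffer, so the unit's data stays valid. The over-allocation by min_size / 16 + 32 bytes is an assumption modeled loosely on av_fast_malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy reference-counted buffer (not FFmpeg's AVBufferRef). */
typedef struct RefBuf {
    uint8_t *data;
    size_t   size;
    int      refcount;   /* number of holders sharing this buffer */
} RefBuf;

static RefBuf *refbuf_alloc(size_t size)
{
    RefBuf *buf = malloc(sizeof(*buf));
    if (!buf)
        return NULL;
    buf->data = malloc(size ? size : 1);
    if (!buf->data) {
        free(buf);
        return NULL;
    }
    buf->size     = size;
    buf->refcount = 1;
    return buf;
}

/* Add a holder, e.g. a CBS unit pointing into the rbsp buffer. */
static RefBuf *refbuf_ref(RefBuf *buf)
{
    buf->refcount++;
    return buf;
}

/* Drop the caller's reference; free the buffer when the last one goes. */
static void refbuf_unref(RefBuf **pbuf)
{
    RefBuf *buf = *pbuf;
    *pbuf = NULL;
    if (buf && --buf->refcount == 0) {
        free(buf->data);
        free(buf);
    }
}

/* Hypothetical av_fast_malloc-style helper for RefBuf.
 * Returns 0 on success, -1 on allocation failure (*pbuf is then NULL).
 * As with av_fast_malloc, old contents are not preserved on reallocation. */
static int refbuf_fast_alloc(RefBuf **pbuf, size_t min_size)
{
    assert(pbuf != NULL);
    RefBuf *buf = *pbuf;

    /* Sole owner (writable) and large enough: reuse as is. */
    if (buf && buf->refcount == 1 && buf->size >= min_size)
        return 0;

    /* Drop our reference; other holders keep the old data alive. */
    refbuf_unref(pbuf);

    /* Over-allocate a little so slowly growing sizes don't reallocate
     * every time; fall back to min_size if the addition overflows. */
    size_t alloc = min_size + min_size / 16 + 32;
    if (alloc < min_size)
        alloc = min_size;
    *pbuf = refbuf_alloc(alloc);
    return *pbuf ? 0 : -1;
}
```

In the terms used above: as long as the fragment is uninitialized (dropping the units' references) before the next packet is split, the rbsp buffer is back to a single reference and gets reused, so no additional reallocations occur; only while a unit still holds a reference is a new buffer allocated.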
I have also modified the documentation of ff_h2645_packet_split to document the behaviour that cbs_h2645 now relies upon.

Here are benchmarks where the timer includes both the call to ff_h2645_packet_split and the call to cbs_h2645_fragment_add_nals in cbs_h2645_split_fragment. Because of the change to ff_h2645_packet_split, this is the only admissible way of comparing once all patches are applied.

A 5.1 Mb/s file with 50p, no hardcoded black bars, one slice per frame; 8 runs of 262144 iterations each:
  Current version:     107737 decicycles
  First patch applied:  76169 decicycles
  All patches applied:  75837 decicycles

A 7.8 Mb/s file with 50p, hardcoded black bars, one slice per frame; 8 runs of 131072 iterations each:
  Current version:     379114 decicycles
  First patch applied: 369410 decicycles
  All patches applied: 327677 decicycles

If one only measures the call to cbs_h2645_fragment_add_nals, the difference is, of course, bigger. Because of the modifications to ff_h2645_packet_split, such benchmarks are not given for the whole patchset.
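To put the combined ff_h2645_packet_split + cbs_h2645_fragment_add_nals figures into relative terms, the reduction is 100 * (before - after) / before. A trivial helper for this (the name reduction_pct is hypothetical, for illustration only):

```c
#include <assert.h>

/* Relative reduction of a timing, in percent of the original value. */
static double reduction_pct(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

Applied to the figures above, this gives roughly 29.3% (first patch) and 29.6% (all patches) for the first file, and roughly 2.6% and 13.6% for the second file.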
First file again (timing only cbs_h2645_fragment_add_nals):
  Current version:     36940 decicycles
  First patch applied:  6364 decicycles

Second file again (timing only cbs_h2645_fragment_add_nals):
  Current version:     60532 decicycles
  First patch applied: 48801 decicycles

Andreas Rheinhardt (4):
  cbs_h2645: Avoid memcpy when splitting fragment
  avutil/buffer: Add av_fast_malloc equivalent
  h2645_parse: Make ff_h2645_packet_split reference-compatible
  cbs_h2645: Avoid memcpy when splitting fragment #2

 libavcodec/cbs_h2645.c             | 45 +++++++++++++++---------------
 libavcodec/cbs_h2645.h             |  2 ++
 libavcodec/extract_extradata_bsf.c |  4 +--
 libavcodec/h2645_parse.c           | 28 +++++++++++++++----
 libavcodec/h2645_parse.h           | 14 ++++++++--
 libavcodec/h264_parse.c            |  4 +--
 libavcodec/h264dec.c               |  6 ++--
 libavcodec/hevc_parse.c            |  5 ++--
 libavcodec/hevc_parser.c           |  4 +--
 libavcodec/hevcdec.c               |  4 +--
 libavutil/buffer.c                 | 37 ++++++++++++++++++++++++
 libavutil/buffer.h                 | 19 +++++++++++++
 12 files changed, 128 insertions(+), 44 deletions(-)