On Sun, Jun 02, 2019 at 12:47:18AM +0200, Andreas Rheinhardt wrote: > Until now the function ff_startcode_find_candidate_c did not really > search for startcodes (the startcode 0x00 0x00 0x01 (used in > MPEG-1/2/4, VC-1 and H.264/5) is the only startcode meant here). Instead > it searched for zero bytes and returned the earliest position of a zero > byte. This of course led to lots of false positives - millions per GB > of video. > This has been changed: The first position of the buffer that > may be part of a four-byte startcode is now returned. This excludes zero > bytes that are known not to belong to a startcode, but zero bytes at the > end of a buffer that might be part of a startcode whose second part is in > the next buffer are also returned. This is in accordance with the > expectations of the current callers of ff_startcode_find_candidate_c, > namely the H.264 parser and the VC-1 parser. This is also the reason why > "find_candidate" in the name of the function has been kept. > > Getting rid of lots of function calls with their accompanying overhead > of course brings a speedup with it: Here are some benchmarks for > a 30.2 Mb/s transport stream A (10 iterations of 8192 runs each) and > another 7.4 Mb/s transport stream B (10 iterations of 131072 runs each) > on an x64 Haswell: > > | vanilla | aligned | current: aligned, > | (unaligned) | reads | no false positives > --|-------------|---------|------------------- > A | 411578 | 424503 | 323355 > B | 55476 | 58326 | 44147 > > vanilla refers to the state before the switch to aligned reads; > "aligned reads" refers to the state after the switch to aligned reads. > > (Calls to h264_find_frame_end have been timed; given that the amount > of calls to ff_startcode_find_candidate_c has been reduced considerably, > timing them directly would be worthless.) > > Signed-off-by: Andreas Rheinhardt <andreas.rheinha...@gmail.com> > --- > libavcodec/h264dsp.h | 7 +-- > libavcodec/startcode.c | 98 +++++++++++++++++++++++++++++++++++++----- > libavcodec/vc1dsp.h | 6 +-- > 3 files changed, 91 insertions(+), 20 deletions(-) > > diff --git a/libavcodec/h264dsp.h b/libavcodec/h264dsp.h > index cbea3173c6..51de7a4e21 100644 > --- a/libavcodec/h264dsp.h > +++ b/libavcodec/h264dsp.h > @@ -108,11 +108,8 @@ typedef struct H264DSPContext { > void (*h264_add_pixels4_clear)(uint8_t *dst, int16_t *block, int stride); > > /** > - * Search buf from the start for up to size bytes. Return the index > - * of a zero byte, or >= size if not found. Ideally, use lookahead > - * to filter out any zero bytes that are known to not be followed by > - * one or more further zero bytes and a one byte. Better still, filter > - * out any bytes that form the trailing_zero_8bits syntax element too. > + * Search buf from the start for up to size bytes. Return the first 0x00 > + * that might be part of a (three or four) byte startcode. > */ > int (*startcode_find_candidate)(const uint8_t *buf, int size); > } H264DSPContext; > diff --git a/libavcodec/startcode.c b/libavcodec/startcode.c > index b027c191c0..f6105289f1 100644 > --- a/libavcodec/startcode.c > +++ b/libavcodec/startcode.c > @@ -22,7 +22,8 @@ > * @file > * Accelerated start code search function for start codes common to > * MPEG-1/2/4 video, VC-1, H.264/5 > - * @author Michael Niedermayer <michae...@gmx.at> > + * @author Michael Niedermayer <michae...@gmx.at>, > + * @author Andreas Rheinhardt <andreas.rheinha...@gmail.com> > */ > > #include "startcode.h" > @@ -31,20 +32,60 @@ > > int ff_startcode_find_candidate_c(const uint8_t *buf, int size) > { > - const uint8_t *start = buf, *end = buf + size; > + const uint8_t *start = buf, *end = buf + size, *temp; > > #define INITIALIZATION(mod) do { \ > - for (; buf < end && (uintptr_t)buf % mod; buf++) \ > - if (!*buf) \ > - return buf - start; \ > + if (end - start <= mod + 1) \ > + goto near_end; \ > + /* From this point on until the end of the MAIN_LOOP, \ > + * buf is the earliest possible position of a 0x00 \ > + * immediately preceding a startcode's 0x01, i.e. \ > + * everything from start to buf (inclusive) is known \ > + * to not contain a startcode's 0x01. */ \ > + buf += 1; \
> + temp = (const uint8_t *)((uintptr_t)(buf + mod - 1) / mod * mod); \ smells like FFALIGN() [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Let us carefully observe those good qualities wherein our enemies excel us and endeavor to excel them, by avoiding what is faulty, and imitating what is excellent in them. -- Plutarch
signature.asc
Description: PGP signature
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".