Michael Niedermayer: > On Sun, Mar 10, 2019 at 11:03:00PM +0000, Andreas Rheinhardt wrote: >> Michael Niedermayer: >>> On Fri, Mar 08, 2019 at 10:25:59AM +0100, Andreas Rheinhardt wrote: >>>> When the new incremental parser was introduced, the old parser was >>>> kept, because the new parser was unable to handle the way SSA packets >>>> are put into Matroska. But since 2014 (since >>>> c7d8dbad14ed5fa3c217a4fc1790021d6c0b6416) this is no longer needed, so >>>> that the old parser can be completely removed. >>>> >>>> Signed-off-by: Andreas Rheinhardt <andreas.rheinha...@googlemail.com> >>>> --- >>>> libavformat/matroskadec.c | 72 ++++++--------------------------------- >>>> 1 file changed, 10 insertions(+), 62 deletions(-) >>> >>> This affects seeking: >>> >>> libavformat/tests/seek >>> issues/388/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv >>> -duration 400 >oldseek >>> >>> The file appears to be available here: >>> https://samples.ffmpeg.org/Matroska/matrix/Matrix.Reloaded.Trailer-640x346-XviD-1.0beta2-HE_AAC_subtitled.mkv >>> >>> --- oldseek 2019-03-08 23:08:21.380042329 +0100 >>> +++ newseek 2019-03-08 23:08:02.048041745 +0100 >>> @@ -8,7 +8,7 @@ >>> ret: 0 st:13 flags:1 dts: 86.750000 pts: 86.750000 pos: -1 >>> size: 50436 >>> ret:-1 st: 1 flags:0 ts: 250.577000 >>> ret: 0 st: 1 flags:1 ts: 13.471000 >>> -ret: 0 st:13 flags:1 dts: 1.776000 pts: 1.776000 pos: -1 size: >>> 50436 >>> +ret: 0 st:13 flags:1 dts: 0.000000 pts: 0.000000 pos: -1 size: >>> 50436 >>> ret:-1 st: 2 flags:0 ts: 176.365000 >>> ret: 0 st: 2 flags:1 ts: 339.259000 >>> ret: 0 st:13 flags:1 dts: 145.080000 pts: 145.080000 pos: -1 >>> size: 50436 >>> >> This is not unexpected. The reason is that the old parser always >> parses a whole cluster at once while the new parser parses clusters >> incrementally, i.e. one block (I use block as a general term for >> SimpleBlock or BlockGroup) at a time. The goal of incremental parsing >> was reduced latency (and with my patch #6 one also achieves a >> reduction of memory used and an increase in performance as well). >> >> With the old parser, the very first cluster gets parsed during >> avformat_find_stream_info() and index entries were created for all the >> keyframes contained therein (the cues only contain entries for the >> main video track, so this only affects the other tracks). The 1.776 >> pts corresponds to the block with the highest timestamp for track #1 >> (or track #2 in Matroska's counting) in the first cluster (with a >> timestamp of 1776ms). >> >> With the incremental parser, only one block of the audio track gets >> parsed during avformat_find_stream_info() (it has a timestamp of 0) >> and so only one index entry gets created for it. >> >> So when the seek based upon the audio track is performed, the used >> index point has a timestamp of either 0ms or 1776ms. Then the >> current_dts is updated based upon this timestamp. >> >> Now this file has an attached picture which gets translated into its >> own track and upon every seek this picture will be put into the >> raw_packet_buffer from which it will be read by ff_read_packet as the >> first packet after each seek. Given that this packet doesn't have >> timestamps, the timestamp will be set equal to current_dts (in >> compute_pkt_fields()). Hence the discrepancy. >> >> Btw: Because of things like this, a fate test had to be changed during >> the initial introduction of incremental parsing (in >> 8336eb6f85e4b94b9c198b16bd0ac4178f4dba86). > > Sounds like undesirable behavior if not a bug > > The seek request is to 13.471000 and the result should not depend on > what has been parsed or not prior to the seek request. > also if a packet can be produced for 1.776000 which adequatly fullfills > the seek reuest that is better than a more distant one as that would > increase latency experienced by the user > If all one cares about is that the returned packet is near to the desired timestamp, then matroska_read_seek succeeds in two cases: a) The currently available index entries for the desired track are so fine-grained that one can use this index to seek accurately. This includes the standard case that one seeks according to a video track for which the file provides cues (for the keyframes). b) The desired timestamp is so big that it is beyond the last index entry or the returned index entry is the last index entry. In this case the file is read from the last index entry onward, so that a very precise timestamp can be found.
If one only seeks wrt the same track and if said track either has a good index or no index, then this works well: It's clear that it works in case there is a good index and if there is no index at all (because in this case the index that is being built up in the background is good for a whole initial portion of the file and said portion won't have "holes"). But if one changes the track wrt to one seeks, problems can arise. Think of the case where one first seeks via the cues according to the video track to positions t_0 and t_1 and reads a few blocks at both times. This includes the creation of index entries for other tracks (at least for continuous tracks like audio tracks). If one now seeks wrt to such an audio track to time t' with t_0<t'<t_1, one might end up in a place pretty far away from t'. This happens for both parsing methods. There is a case where the incremental parser fares better: If the clusters referenced in the cues all begin with video keyframes (standard mkvmerge behaviour, so pretty normal) and one only wants to read a single frame from t_0 and t_1, then these frames will be video keyframes and therefore no index entries for the other tracks will have been created, so that if one seeks with respect to a different track than the video track, the whole file will be read until one reaches the desired timestamp. I therefore view the fact that matroska_read_seek's result depends upon what has already been parsed as a general bug in matroska_read_seek, not as a drawback of incremental parsing compared to classical parsing. Here are the results for what I have just described. Old parsing: ret: 0 st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos: 6653 size: 1792 ret: 0 st:-1 flags:0 ts:-1.000000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 0.016000 pos: 20998 size:134602 ret: 0 st:-1 flags:1 ts: 161.894167 ret: 0 st: 0 flags:1 dts: NOPTS pts: 161.696000 pos:212409707 size:128647 ret: 0 st: 0 flags:0 ts: 324.788000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 324.936000 pos:356642724 size:195297 ret: 0 st: 0 flags:1 ts: 87.683000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 87.456000 pos:122474581 size:251000 ret: 0 st: 1 flags:0 ts: 250.577000 ret: 0 st: 1 flags:1 dts: 324.960000 pts: 324.960000 pos:356873874 size: 1792 ret: 0 st: 1 flags:1 ts: 13.471000 ret: 0 st: 1 flags:1 dts: 0.512000 pts: 0.512000 pos: 450228 size: 1792 Incremental parsing: ret: 0 st: 1 flags:1 dts: 0.000000 pts: 0.000000 pos: 6653 size: 1792 ret: 0 st:-1 flags:0 ts:-1.000000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 0.016000 pos: 20998 size:134602 ret: 0 st:-1 flags:1 ts: 161.894167 ret: 0 st: 0 flags:1 dts: NOPTS pts: 161.696000 pos:212409707 size:128647 ret: 0 st: 0 flags:0 ts: 324.788000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 324.936000 pos:356642724 size:195297 ret: 0 st: 0 flags:1 ts: 87.683000 ret: 0 st: 0 flags:1 dts: NOPTS pts: 87.456000 pos:122474581 size:251000 ret: 0 st: 1 flags:0 ts: 250.577000 ret: 0 st: 1 flags:1 dts: 250.592000 pts: 250.592000 pos:296449118 size: 1792 ret: 0 st: 1 flags:1 ts: 13.471000 ret: 0 st: 2 flags:1 dts: 13.216000 pts: 13.216000 pos:18014477 size: 20 (I see two main avenues to improve matroska_read_seek: 1. Use the index of other tracks if the best index entry for the desired track is too far away. (Problems: What about files with wrong interleavement? What is too far away? If the targeted track has gaps in it, then one might end up parsing a lot unnecessarily.) 2. If the found index is too far away, one could simply parse the file a bit more. 2. would have the disadvantage that even video tracks with proper cues (i.e. an entry for every keyframe), but long GOPs would be parsed. This could be mitigated by restricting it to tracks for which no cue entries were present (i.e. when cue entries are present for a track, then it is presumed that there is a cue entry for every keyframe of said track) and which can be presumed to have no holes in it (i.e. no subtitle tracks).) - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel