On Sun, Nov 02, 2014 at 06:16:58PM -0800, Alex Sukhanov wrote:
> Hi ffmpeg-devel,
>
> I'm sending this mail to encourage some discussion about the ISO BMFF
> specification, to find out what people think about the problems
> described below, and to learn whether anybody else has seen them.
>
> I'm a software engineer and I write an MP4 muxer/demuxer (it's not
> FFmpeg code) which is used to generate DASH streams (fragmented MP4
> container). Some players reported playback issues, so I had to make
> some controversial patches to the MP4 muxer to get these players to
> accept the streams my muxer generates. My current understanding is
> that these issues happen because people read the ISO BMFF
> specification differently, so they expect different behavior.
>
> Below I describe these issues, and I would be glad to hear your
> opinion about them. The latest spec I have on hand is ISO/IEC
> 14496-12:2012, Fourth edition 2012-07-15, Corrected version
> 2012-09-15.
>
> TFDT::BaseMediaDecodeTime
>
> The ISO BMFF standard defines TFDT::BaseMediaDecodeTime as an integer
> equal to the sum of the decode durations of all earlier samples in
> the media:
>
>   8.8.12 Track fragment decode time
>
>   The Track Fragment Base Media Decode Time Box provides the absolute
>   decode time, measured on the media timeline, of the first sample in
>   decode order in the track fragment. This can be useful, for
>   example, when performing random access in a file; it is not
>   necessary to sum the sample durations of all preceding samples in
>   previous fragments to find this value (where the sample durations
>   are the deltas in the Decoding Time to Sample Box and the
>   sample_durations in the preceding track runs).
>   ...
>   baseMediaDecodeTime is an integer equal to the sum of the decode
>   durations of all earlier samples in the media, expressed in the
>   media's timescale.
>   It does not include the samples added in the enclosing track
>   fragment.
>
> The player side claims that TFDT::BaseMediaDecodeTime must be
> strictly equal to the sum of all preceding sample decode durations.
> Most likely the player side is absolutely right here, because the
> specification just says "baseMediaDecodeTime is an integer equal to
> the sum of the decode durations of all earlier samples in the media".
>
> Unfortunately, following the spec strictly makes the
> transcoding/muxing process much more complicated and flaky. Modern
> transcoding engines (for instance, YouTube and Vimeo) split video
> into chunks and transcode them in parallel. Parallel transcoding and
> frame rate conversion cause DTS/PTS fluctuation, so it becomes
> non-trivial to honor TFDT::BaseMediaDecodeTime, and in most cases
> sample duration correction or sample dropping is required to follow
> this rule strictly. The most complicated part is that in order to
> perform this correction for the current fragment (MOOF+MDAT pair), we
> have to know the DTS of the first sample of the next fragment. We
> cannot really get this information because of parallel processing, so
> we try to guess this value. We can do it for constant frame rate, but
> for variable frame rate it's a real issue. Frankly, the current
> solution is flaky.
>
> The current TFDT::BaseMediaDecodeTime requirement also doesn't
> address frame dropping and stream errors which may happen during live
> streaming/transcoding.
>
> What do you think about
> TFDT::BaseMediaDecodeTime?
>
> TRUN::SampleDuration
>
> The Media Source Extensions <http://www.w3.org/TR/media-source/>
> (MSE) specification authors think that TRUN::duration in the ISO BMFF
> spec is "sample duration". In other words, TRUN::duration[n] =
> PTS[n+1] - PTS[n]. I have always thought that TRUN::duration is
> calculated as DTS[n+1] - DTS[n].
>
> In most cases delta DTS is equal to delta PTS, but because of
> timescale conversion rounding, DTS/PTS fluctuations caused by
> parallel processing, and frame rate conversion, that is not true all
> the time. This mismatch causes holes on the MSE playback timeline.
> Holes cause a poor user experience.
>
> I went through the ISO spec and I've seen that it is not clear. It
> explicitly says that STTS entries are decoding deltas:
>
>   8.6.1.1 Time to Sample Boxes
>
>   The composition times (CT) and decoding times (DT) of samples are
>   derived from the Time to Sample Boxes, of which there are two
>   types. The decoding time is defined in the Decoding Time to Sample
>   Box, giving time deltas between successive decoding times.
>
>   8.6.1.2 Decoding Time to Sample Box
>
>   The Decoding Time to Sample Box contains decode time delta's:
>   DT(n+1) = DT(n) + STTS(n) where STTS(n) is the (uncompressed) table
>   entry for sample n.
>
> As you can see, the ISO spec is very clear about regular MP4 files.
> Unfortunately, it's not so clear about fragmented MP4 and the TRUN
> atom:
>
>   8.8.8 Track Fragment Run Box
>   ...
>   The following flags are defined:
>   ...
>   0x000100 sample-duration-present: indicates that each sample has
>   its own duration, otherwise the default is used.
>   ...
>
> I reviewed the FFmpeg MOV muxer code and it calculates
> TRUN::SampleDuration as a DTS delta:
>
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/movenc.c;h=a43752a01173eb8a37fb459f8325d516daf2e74a;hb=HEAD#l860
>
> What do you think about TRUN::SampleDuration?

The field is unsigned in the specification, and PTS differences can be
negative (in the case of B-frames, for example), so I do not see how
one could interpret it as PTS[n+1] - PTS[n], but maybe I am missing
something.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision of what is
before them, glory and danger alike, and yet notwithstanding go out to
meet it. -- Thucydides
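
To make the unsigned-field argument above concrete, here is a minimal Python sketch. It is only an illustration, not FFmpeg code: the four-frame I/P/B/B sequence, its DTS/PTS values, and the helper names (deltas, fits_unsigned_32, base_media_decode_time) are all assumptions invented for this example. It compares the two readings of the TRUN sample duration and shows baseMediaDecodeTime as a plain sum of earlier decode durations.

```python
# Hypothetical frames in decode order, as (type, dts, pts) in timescale ticks.
# These values are made up for illustration only.
frames = [
    ("I", 0, 1),
    ("P", 1, 4),
    ("B", 2, 2),
    ("B", 3, 3),
]


def deltas(values):
    """Difference between each value and the next one, in list order."""
    return [nxt - cur for cur, nxt in zip(values, values[1:])]


def fits_unsigned_32(values):
    """True if every value could be stored in an unsigned 32-bit field."""
    return all(0 <= v < 2**32 for v in values)


def base_media_decode_time(earlier_durations):
    """Sum of the decode durations of all earlier samples (the tfdt reading
    quoted from 8.8.12 above)."""
    return sum(earlier_durations)


dts_deltas = deltas([f[1] for f in frames])  # DTS[n+1] - DTS[n]
pts_deltas = deltas([f[2] for f in frames])  # PTS[n+1] - PTS[n], decode order

print("DTS deltas:", dts_deltas, "unsigned ok:", fits_unsigned_32(dts_deltas))
print("PTS deltas:", pts_deltas, "unsigned ok:", fits_unsigned_32(pts_deltas))
print("tfdt of a following fragment:", base_media_decode_time(dts_deltas))
```

With these made-up timestamps the DTS deltas are all non-negative, while the PTS delta from the P-frame to the first B-frame is negative and so could not be stored in an unsigned field.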