On Sun, Nov 02, 2014 at 06:16:58PM -0800, Alex Sukhanov wrote:
> Hi ffmpeg-devel,
>
> I’m sending this mail to encourage some discussion about the ISO BMFF
> specification, to find out what people think about the problems described
> below, and to learn whether anybody else has seen them.
>
> I’m a software engineer and I write an MP4 muxer/demuxer (it’s not FFmpeg
> code), which is then used to generate DASH streams (fragmented MP4
> container). Some players reported playback issues, so I had to provide some
> controversial patches in the MP4 muxer to make these players accept streams
> generated by my muxer. My current understanding is that these issues happen
> because people read the ISO BMFF specification differently, so they expect
> different behavior.
>
> Below I would like to describe these issues, and I would be glad to hear
> your opinion about them. The latest spec I have on hand is
> ISO/IEC 14496-12:2012, Fourth edition 2012-07-15, Corrected version
> 2012-09-15.
>
> TFDT::BaseMediaDecodeTime
>
> The ISO BMFF standard defines
> TFDT::BaseMediaDecodeTime as an integer equal to the sum of the decode
> durations of all earlier samples in the media:
>
> 8.8.12 Track fragment decode time
>
> The Track Fragment Base Media Decode Time Box provides the absolute
> decode time, measured on the media timeline, of the first sample in decode
> order in the track fragment. This can be useful, for example, when
> performing random access in a file; it is not necessary to sum the sample
> durations of all preceding samples in previous fragments to find this value
> (where the sample durations are the deltas in the Decoding Time to Sample
> Box and the sample_durations in the preceding track runs).
> ...
> baseMediaDecodeTime is an integer equal to the sum of the decode
> durations of all earlier samples in the media, expressed in the media's
> timescale. It does not include the samples added in the enclosing track
> fragment.
>
> The player side claims that TFDT::BaseMediaDecodeTime must be strictly
> equal to the sum of all preceding sample decode durations. Most likely, the
> player side is absolutely right here, because the specification just says
> “baseMediaDecodeTime is an integer equal to the sum of the decode durations
> of all earlier samples in the media”.
>
> Unfortunately, following the spec strictly makes the transcoding/muxing
> process much more complicated and flaky. Modern transcoding engines (for
> instance, YouTube and Vimeo) split video into chunks and transcode them in
> parallel. Parallel transcoding and frame rate conversion cause DTS/PTS
> fluctuation, so it becomes non-trivial to satisfy the
> TFDT::BaseMediaDecodeTime requirement, and in most cases sample duration
> correction or sample dropping is required to follow this rule strictly. The
> most complicated thing is that in order to perform this correction for the
> current fragment (MOOF+MDAT pair), we have to know the DTS of the first
> sample of the next fragment. We cannot really get this information because
> of parallel processing, so we try to guess this value. We can do it for
> constant frame rate, but for variable frame rate it’s a real issue. Frankly,
> the current solution is flaky.
>
> The current TFDT::BaseMediaDecodeTime requirement also doesn’t address
> frame dropping and stream errors, which may happen during live
> streaming/transcoding.
>
> What do you think about TFDT::BaseMediaDecodeTime?
>
> TRUN::SampleDuration
>
> Media Source Extensions
> <http://www.w3.org/TR/media-source/> (MSE) specification authors think that
> TRUN::duration in the ISO BMFF spec is the "sample duration". In other words,
> TRUN::duration[n] = PTS[n+1] - PTS[n]. I have always thought that
> TRUN::duration is calculated as DTS[n+1] - DTS[n].
>
> In most cases delta DTS is equal to delta PTS, but because of timescale
> conversion rounding and the DTS/PTS fluctuations caused by parallel
> processing and frame rate conversion, this is not always true. This mismatch
> causes holes in the MSE playback timeline, and holes cause a poor user
> experience.
>
> I went through the ISO spec and found that it is not clear. It explicitly
> says that STTS entries are decoding deltas:
>
> Time to Sample Boxes
>
> The composition times (CT) and decoding times (DT) of samples are derived
> from the Time to Sample Boxes, of which there are two types. The decoding
> time is defined in the Decoding Time to Sample Box, giving time deltas
> between successive decoding times.
>
> Decoding Time to Sample Box
>
> The Decoding Time to Sample Box contains decode time delta's:
> DT(n+1) = DT(n) + STTS(n) where STTS(n) is the (uncompressed) table entry
> for sample n.
>
> As you can see, the ISO spec is very clear about regular MP4 files.
> Unfortunately, it's not so clear about fragmented MP4 and the TRUN atom:
>
> 8.8.8 Track Fragment Run Box
> ...
> The following flags are defined:
> ...
> 0x000100 sample-duration-present: indicates that each sample has its own
> duration, otherwise the default is used.
> ...
>
> I reviewed the FFmpeg MOV muxer code and it calculates TRUN::SampleDuration
> as a DTS delta:
> http://git.videolan.org/?p=ffmpeg.git;a=blob;f=libavformat/movenc.c;h=a43752a01173eb8a37fb459f8325d516daf2e74a;hb=HEAD#l860
>
> What do you think about TRUN::SampleDuration?

The field is unsigned in the specification, and PTS differences can be
negative (with B-frames, for example), so I do not see how one could
interpret it as PTS[n+1] - PTS[n].
But maybe I am missing something.
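
To make the B-frame point concrete, here is a small sketch with made-up
numbers. The four-frame sequence, the timestamps and the helper names are
assumptions for illustration only, not taken from any real stream or from
FFmpeg. It also assumes that n indexes samples in decode order (the order in
which they are stored in the track run) and that the duration field is a
32-bit unsigned integer.

```python
# Hypothetical GOP: display order I0 B1 B2 P3, decode order I0 P3 B1 B2.
# Timestamps are in ticks of an assumed timescale (one tick per frame).
frames_in_decode_order = [
    # (name, dts, pts)
    ("I0", 0, 1),
    ("P3", 1, 4),
    ("B1", 2, 2),
    ("B2", 3, 3),
]


def deltas(values):
    """Return value[n+1] - value[n] for each pair of consecutive entries."""
    return [b - a for a, b in zip(values, values[1:])]


def fits_unsigned_32(value):
    """True if value can be stored in a 32-bit unsigned field."""
    return 0 <= value <= 0xFFFFFFFF


dts_deltas = deltas([dts for _, dts, _ in frames_in_decode_order])
pts_deltas = deltas([pts for _, _, pts in frames_in_decode_order])

print("DTS deltas:", dts_deltas)  # [1, 1, 1]
print("PTS deltas:", pts_deltas)  # [3, -2, 1]
print("DTS deltas storable:", all(fits_unsigned_32(d) for d in dts_deltas))
print("PTS deltas storable:", all(fits_unsigned_32(d) for d in pts_deltas))
```

In this example the DTS deltas are all non-negative and fit the field, while
the PTS delta between P3 and B1 is -2 and cannot be stored in an unsigned
field. If n were taken in presentation order instead, the PTS deltas would be
positive, so the conclusion depends on that ordering assumption.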

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides


ffmpeg-devel mailing list
