On 10 October 2014 23:08, Mika Raento <mi...@iki.fi> wrote:
> Firstly, thank you for the detailed explanation.
>
> Secondly, how should we proceed?
>
> I am not confident I'm able to implement that correctly, especially
> with no test coverage.
>
> My current implementation improves discontinuous fragmented mp4s
> significantly (from unusable to close-to-perfect) while slightly
> worsening the timestamps for non-discontinuous fragmented mp4s. I
> definitely need it for our streams, and I think it would help other
> people in the same situation. I am quite willing to spend time on
> this, but I fear that I just don't have enough known inputs and
> outputs to verify my implementation.
>
> Normally fragments are supposed to start on key frames, which should
> have pts close to dts, but there are no guarantees.
>
> Some alternatives:
>
> 1. I can leave my implementation behind a flag. That's not very
> friendly to others, but breaks no existing usage.
>
> 2. We can merge my code as-is, and hope somebody more knowledgeable
> can fix it up later.
>
> 3. I can try to implement the algorithm described.
This is the one I picked. I'm submitting a version that produces timestamps identical to master for Michael's test case and fixes the timestamps of my discontinuous ismvs.

Mika

> 4. Somebody helps me with either implementation or by providing test cases.
>
> Opinions?
>
> Mika
>
>
> On 10 October 2014 20:11, Yusuke Nakamura <muken.the.vfrman...@gmail.com> wrote:
>> 2014-10-10 13:38 GMT+09:00 Mika Raento <mi...@iki.fi>:
>>
>>> On 9 October 2014 23:37, Yusuke Nakamura <muken.the.vfrman...@gmail.com> wrote:
>>> > 2014-10-10 4:49 GMT+09:00 Michael Niedermayer <michae...@gmx.at>:
>>> >
>>> >> On Thu, Oct 09, 2014 at 09:44:43PM +0200, Michael Niedermayer wrote:
>>> >> > On Thu, Oct 09, 2014 at 06:57:59PM +0300, Mika Raento wrote:
>>> >> > > If present, an MFRA box and its TFRAs are read for fragment start times.
>>> >> > >
>>> >> > > Without this change, timestamps for discontinuous fragmented mp4 are
>>> >> > > wrong, cause audio/video desync, and are not usable for generating
>>> >> > > HLS.
>>> >> > > ---
>>> >> > >  libavformat/isom.h |  15 ++++++
>>> >> > >  libavformat/mov.c  | 140 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> > >  2 files changed, 155 insertions(+)
>>> >> >
>>> >> > this seems to break some files
>>> >> >
>>> >> > for example a file generated with the following 2 commands:
>>> >> > ffmpeg -i matrixbench_mpeg2.mpg -t 10 in.mp4
>>> >> > l-smash/cli/remuxer -i in.mp4 --fragment 1 -o test.mp4
>>> >> >
>>> >> > I've not investigated why this doesn't work
>>> >>
>>> >> maybe the above was unclear, so to clarify before someone is confused:
>>> >> test.mp4 from above plays with ffplay before the patch but not really
>>> >> afterwards. The 2 commands are just to create such a file.
>>> >>
>>> >> [...]
>>> >>
>>> >> --
>>> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>> >>
>>> >> Good people do not need laws to tell them to act responsibly, while bad
>>> >> people will find a way around the laws.
>>> >> -- Plato
>>> >>
>>> >> _______________________________________________
>>> >> ffmpeg-devel mailing list
>>> >> ffmpeg-devel@ffmpeg.org
>>> >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>> >>
>>> > The 'time' field in the tfra box is defined on the presentation timeline, not
>>> > the composition or decode timeline.
>>> > Therefore, in general, the value of 'time' can't be used as a DTS directly
>>> > as long as one follows 14496-12.
>>> > Maybe some derivatives of the ISO Base Media file format define it differently,
>>> > but the spec of the ISO Base Media file format defines 'time' as the
>>> > presentation time of the sync sample.
>>> > Presentation times are composition times after the application of any edit
>>> > list for the track.
>>> >
>>> > I also have some samples which use 'time' as the DTS of the sync sample.
>>> > Historically, the term 'presentation time' was not defined clearly before
>>> > 14496-12:2012; this fact may have brought about such inconsistency.
>>>
>>> Hm. So my changes aren't correct if there is an edit list? Because
>>> AFAICT, without edit lists, mov.c sets pkt->pts = pkt->dts.
>>>
>>
>> Wrong. PTS == DTS has nothing to do with the edit list. Generally, CTS != DTS
>> occurs only when frame reordering exists.
>> Even if there is no edit list for a track, there is an implicit edit for that
>> track, and in this case PTS == (CTS + alpha)*mvhd.timescale/mdhd.timescale,
>> where the constant alpha depends on the implementation.
>>
>>> Would you mind explaining how edit lists and fragment times are
>>> supposed to work together?
>>>
>> The tfra box is designed so that the player seeks and finds a sync sample on the
>> presentation timeline, i.e. by PTS in units of mdhd.timescale.
>> PTS comes from CTS via the edit list, and CTS comes from DTS. So, basically, you
>> can't get the DTS directly from the 'time' field in the tfra box.
>>
>> Let's say mvhd.timescale=600, mdhd.timescale=24000, and the edit list
>> contains two edits (edit[0] and edit[1]):
>> edit[0] = {segment_duration=600, media_time=-1, media_rate=1}; // empty edit
>> edit[1] = {segment_duration=1200, media_time=2002, media_rate=1};
>> and the track fragment run in the track fragment that you get from an entry
>> in the tfra box, where 'time' in that entry is equal to 48000, is as follows:
>> trun.sample[0].sample_is_non_sync_sample = 1
>> trun.sample[0].sample_duration=1001
>> trun.sample[0].sample_composition_time_offset=1001
>> trun.sample[1].sample_is_non_sync_sample = 0
>> trun.sample[1].sample_duration=1001
>> trun.sample[1].sample_composition_time_offset=1001
>> Then, time/mdhd.timescale*mvhd.timescale=1200; that is, the PTS of the sync
>> sample is equal to 1200 in mvhd.timescale.
>> And the first edit is an empty edit, so the sync sample lies 1200 - 600 = 600
>> (in mvhd.timescale) into the actual media presentation, i.e. 24000 in
>> mdhd.timescale.
>> The CTS of the sync sample, trun.sample[1], is equal to X + 1001, where X is
>> the sum of the durations of all preceding samples (i.e. its DTS).
>> The presentation of the media starts at CTS=2002 because of the media_time of
>> the second edit, so the sync sample's position within edit[1] corresponds to
>> (X + 1001) - 2002 = X - 1001 = 24000 in mdhd.timescale.
>> From this, X is equal to 25001, and the DTS of trun.sample[0] is equal to
>> X - trun.sample[0].sample_duration = 25001 - 1001 = 24000, not the 48000 you
>> would get by reading 'time' as a DTS.
>>
>> |<--edit[0]-->|<---------edit[1]--------->|
>> |-------------|-------------|-------------|----> presentation timeline
>> 0             D             T'
>>               |-------------|------------------> composition timeline
>>               media_time    T
>>         |-----|-------|-----|------------------> decode timeline
>>         0   media_time X    T
>>                       |<--->|
>>                       ct_offset
>>
>> D = edit[0].segment_duration = 600
>> T' = time/mdhd.timescale*mvhd.timescale = 1200
>> media_time = edit[1].media_time = 2002
>> ct_offset = trun.sample[1].sample_composition_time_offset = 1001
>> T = ct_offset + X
>> T - media_time = (T' - D)*mdhd.timescale/mvhd.timescale
>>
>>
>>> Mika