Hi,
Just following up on this - I'm sorry I haven't been able to look at the
proposed patchset myself quite in detail yet.
My prime concern is about the requests to have this merged into the
upcoming 6.1 release; that's way too soon IMO.
These patches do change aspects of how these things behave, that have been
working the same for a very long time, so there are all sorts of potential
subtle breakage, or (incorrect or not) assumptions being broken, across
libavcodec and its users.
On Sat, 4 Nov 2023, Derek Buitenhuis wrote:
Next, a quick breakdown of the AAC situation, in terms of both how this it is
stored,
what we support, and the state of the ecosystem and types of files that exist:
* 'Raw' ADTS streams have no way to store any of this. The best we can do is
guess
the pre-roll. We should not guess priming or end padding, as no matter
what we do,
it'll be wrong, and any value will be a cargo culted hack value.
I share this concern; all the various encoders I've seen have used a
different amount of priming samples, so guessing it will be bound to be
wrong in a lot of the cases.
* MP4 - there are two places to store this metadata - one standard, and one
proprietary
Apple way. There are, separately, two ways to signal priming length when
SBR is present.
* MP4s may contain a user data box with 'iTunSMPB' which contains
priming, pre-roll,
and end padding data. We support reading only priming data from this at
the moment,
and we set skip samples based on this. This is 'iTunes style' metadata.
* The standards compliant (read: non-iTunes) way is to use an edit list
to trim the
priming samples, and, opionally end padding. End padding may also be
trimmed by reducing
the sample duration of the last packet in the stts box. Pre-roll is
store in the sgpb
box with the 'roll', type, which signals the roll distance as a number
of packets;
for example, -1 indicates you should decode an discard the samples of 1
packet before
beginning plaback. Notably, this allows the sgpd box to also be use for
video like
periodic intra refresh H.264. libavformat does not current parse or
export this info,
but even if we did, converting number of packets to audio samples can
get hairy.
* Notably, since in MP4, the edit list represents the exact
presentation-level info,
when no edit list, or an edit list startiing at 0 is present, no
samples, not even
pre-roll should be trimmed - all players in the wild handle this
properly, and it
has been standard practice among streaming services for >10 years
to not output
the AAC frames representing priming samples at all (even if there
is a small hit
quality). This is what the patch at [0] is addressing.
FWIW, MP4 isn't the only container where this might be relevant; AAC is
frequently used in muxes together with video in FLV and MKV and others as
well.
In the case of FLV, I'm not aware of any metadata that signals how much to
trim off, so essentially we can't do it by guessing. On the producing
side, this is handled by shifting the timestamps so the audio track, which
would be starting at -<delay>, ends up starting at 0, and the video track
ends up starting at +<audiodelay> instead.
In that case, if we trim off the priming samples (based on a guess as
that's all we have?), I guess that'd lead us to both tracks starting at
+<delay> (i.e. not affecting sync). As long as it doesn't change sync, I
guess it can be tolerable.
To avoid all these effects, producers of muxed files can work around this
in many ways. For many years, I've been doing the trick of skipping the
first <delay> samples of input to the audio encoder, so that after
accounting for that, I have both audio and video tracks starting at 0.0,
without the decoder needing to do anything - working the same across all
players, good and bad.
If we suddenly start decoding such files with the audio track starting at
+<delay>, I guess it'll be ok for sync, but it's a mildly surprising
change, but hopefully any reasonable player based on libavcodec would
still not freak out by it.
* As noted above, I don't think we should apply any guessed priming to
initial samples (pre-roll,
or 'algorithmic delay, included). No other decoders or players do this, in
the world, to my
knowledge, and violating the principal of least surpise because we think
we're slightly more
correct isn't great. I also think trying to 'fix' raw ADTS is destined to
always be a hack,
an we shouldn't. YMMV. I'd like to hear views from others here. This would
make the patch in
[0] redundant.
Yes, with raw ADTS there's really no good way of getting this right, other
than plain guessing, and there's no single universally correct guess
AFAIK.
(And even if we have a qualified guess for the amount of encoder priming,
we have even less knowledge about how much to trim off at the end, if
we're aiming at proper gapless playback.)
For MP4 there's at least a couple ways of signalling it.
But also, given all of this, I think we need to deeply consider how we
approach this, so we don't end up with something that only covers
certain cases (and I am sure I forgot more cases). To that end, I do not
think rushing to get a patchset that can change sync on all AAC files in
existence into 6.1 is wise. Even when this does go in, it should be able
to sit in master for a good long time before being in a release.
+1. This has the potential to be surprising in many different cases, and
may need a bunch of follow up patches to sort out cases found later. It
definitely should sit in git master for a some time before ending up in a
release - not be slipped into 6.1 the week before the release.
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".