Hi,

I'm going to opine a bit here, and also comment on the mov/MP4 patch[0] that 
accompanies
this set.

This is for both historical purposes, and to distill IRC logs into something 
more
digestible for others on the list to gain context on the issue, so apologies for
re-treading ground.

On 10/30/2023 5:09 AM, Lynne wrote:
> This is a convenience function, which is required to be called by decoders
> needing to skip samples every time.
> It automatically creates and increments side data.
> 
> The idea is to get rid of skip_samples eventually and replace it with this
> function.

So there is a lot to cover here, and  lot of nuance to how things should be 
handled,
that I want to list out, before discussing the specific changes of these two 
sets of
patches. A bit of of a 'state of the world'.

The goal of this patchset seems to be to aid in gapless playback and correct 
seeking
in AAC streams (or later, more generally MDCT-styl audio codecs in general), 
but properly
skipping initial priming samples (which include pre-roll), pre-roll (both 
normal, and extra
incurred due to SBR) when seeking, and and, though not covered in these sets, 
I'll mention,
end padding.

First a note on terminology: 'Algorithmic delay' as it is being used here is 
not quite
correct, for two reasons:
    * Latency and pre-roll (or roll distance) are separate things. Opus, for 
example,
      can have a latency as low as 2.5ms, but pre-roll is always at least 80ms 
- they
      are different things which serve different purposes, and I confirmed this 
with
      people who definitely know more about audio than me[1]. Pre-roll is often 
larger
      than latency, and the values stored in file metadata reflect this.
    * Pre-roll, or roll distance, are the industry standard terms. Making up 
out own
      terms because we disagree is silly and stubborn, and makes it harder on 
API
      users trying to use the API correctly, or understnd our code.

Next, a quick breakdown of the AAC situation, in terms of both how this it is 
stored,
what we support, and the state of the ecosystem and types of files that exist:
    * 'Raw' ADTS streams have no way to store any of this. The best we can do 
is guess
      the pre-roll. We should not guess priming or end padding, as no matter 
what we do,
      it'll be wrong, and any value will be a cargo culted hack value.
    * MP4 - there are two places to store this metadata - one standard, and one 
proprietary
      Apple way. There are, separately, two ways to signal priming length when 
SBR is present.
       * MP4s may contain a user data box with 'iTunSMPB' which contains 
priming, pre-roll,
         and end padding data. We support reading only priming data from this 
at the moment,
         and we set skip samples based on this. This is 'iTunes style' metadata.
       * The standards compliant (read: non-iTunes) way is to use an edit list 
to trim the
         priming samples, and, opionally end padding. End padding may also be 
trimmed by reducing
         the sample duration of the last packet in the stts box. Pre-roll is 
store in the sgpb
         box with the 'roll', type, which signals the roll distance as a number 
of packets;
         for example, -1 indicates you should decode an discard the samples of 
1 packet before
         beginning plaback. Notably, this allows the sgpd box to also be use 
for video like
         periodic intra refresh H.264. libavformat does not current parse or 
export this info,
         but even if we did, converting number of packets to audio samples can 
get hairy.
           * Notably, since in MP4, the edit list represents the exact 
presentation-level info,
             when no edit list, or an edit list startiing at 0 is present, no 
samples, not even
             pre-roll should be trimmed - all players in the wild handle this 
properly, and it
             has been standard practice among streaming services for >10 years 
to not output
             the AAC frames representing priming samples at all (even if there 
is a small hit
             quality). This is what the patch at [0] is addressing.
               * My personal opinion is that since priming samples include any 
inherent delay already,
                 that if we do not know how many priming samples there are, we 
should not trim anything
                 from the start of the file, regardless of format. I am keen on 
hearing others Opinions(TM)
                 here, particularily Anton and Martin (sorry for name dropping 
:)).
       * Further complicating matters is the fact that, again thanks to Apple, 
there are a lot
         of broken files around, since iTunes expects files to *not* include 
addition delay incurred
         by SBR in their edit list / priming info, even though, by spec, you 
are suppose to. This
         leads to the unfortunate case where you have tons of files in the wild 
that both do, and
         do not include SBR delay in their edit lists, and there is no way of 
detecting when this is
         the case. I do not have a good solution to this other than proposing a 
flag somewhere to
         switch between behaviors.

Aside, for Opus, we incorrectly read this info from the codec specific box 
instead of the sgpd box...
but we only ever put it in a write-only field.

Now, on to the patches / implementation (opinions warning):

    * As noted above, I don't think we should apply any guessed priming to 
initial samples (pre-roll,
      or 'algorithmic delay, included). No other decoders or players do this, 
in the world, to my
      knowledge, and violating the principal of least surpise because we think 
we're slightly more
      correct isn't great. I also think trying to 'fix' raw ADTS is destined to 
always be a hack,
      an we shouldn't. YMMV. I'd like to hear views from others here. This 
would make the patch in
      [0] redundant.
    * I don't think hardcoding a value of 1024 (or 960) is wise, or necessarily 
even correct. For example,
      all HE-AAC MP4s I have signal a seek pre-roll of 2048 samples. Ideally we 
would pass through seek
      pre-roll somehow - but I am not sure of a good way in our existing setup 
- the write-only codecpar
      field for seek pre-roll is in samples, which is kind of incompatible with 
the way the sgpd box stores
      pre-roll as number of packets. We could set it based one the duration of 
the first packet(s), assuming
      all audio codecs will have only 1 packet duration (frame size), or we 
could add a new field. Possibly,
      we could leave the hardcoded values in place, only for seeking, inside 
e.g. ADTS.
    * This kind of brings me to: I don't really think using the same side data 
for both priming and pre-roll
      is a great idea. Clearly this set of problems is already confusing enough 
to many, and conflating the
      two in the API is a sure way to only make it worse.
    * If we are going to discard at seek, we need to take this into account in 
the demuxer's seek code, or
      we will always be $preroll number of samples off where the user actually 
wanted to seek and decode.

I am almost certain I missed even more nuance, and hopefully Martin or Anton 
can chime in, or I remember more.

But also, given all of this, I think we need to deeply consider how we approach 
this, so we don't end up with
something that only covers certain cases (and I am sure I forgot more cases). 
To that end, I do not think rushing
to get a patchset that can change sync on all AAC files in existence into 6.1 
is wise. Even when this does go in,
it should be able to sit in master for a good long time before being in a 
release. As I understand it, FATE is
already unhappy, and it shouldn't be treated as being a problem with FATE vs 
the set.

Lastly, some time this weekend/week, I will labour to create a more extensive 
set of AAC files we can use to
test, and use in FATE.

Hope that all made sense and I didn't forget any details in my Covid-induced 
haze.

Cheers,
- Derek

[0] http://ffmpeg.org/pipermail/ffmpeg-devel/2023-October/316389.html
[1] https://gist.github.com/dwbuiten/18745d6cb253a2304f776581c9f68b30
[2] https://github.com/nu774/fdkaac/blob/master/README#L165-L177
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to