Bug#763876: opus-tools: No way to set application

Ron Fri, 03 Oct 2014 17:57:40 -0700

On Fri, Oct 03, 2014 at 10:06:02PM +0200, Jonas Smedegaard wrote:
> Hi Ron,
> 
> Thanks for clarifying...
> 
> Quoting Ron (2014-10-03 18:28:34)
> > On Fri, Oct 03, 2014 at 12:48:18PM +0200, Jonas Smedegaard wrote:
> >> libopus has a tuning option called "application", with values 
> >> "voip", "audio" and "low-delay".
> >
> > Do you actually have some use case where it's really important to 
> > fiddle with that manually?
> >
> > These were manual overrides that were useful for testing in the early 
> > life of the codec, but libopus 1.1 actually has a neural network that 
> > analyses the signal in realtime to dynamically make the best selection 
> > of tuning parameters.  So most people should never need to specify 
> > those, and are likely to get better results if they don't, since most 
> > real world audio doesn't fall cleanly into categories like that as far 
> > as the codec is concerned, even when a user might think the right 
> > choice is "obvious" for what they are encoding.
> 
> Seems it is a documented feature of libopus:
> http://opus-codec.org/docs/html_api-1.0.1/group__opus__encoder.html
> 
> If discouraged and/or obsolete and/or even broken, I guess that should 
> be documented (or at the very least silently removed, but I don't see 
> why not mention such change).


Yes, it's still part of the library API, and it's not so much that it's
"broken" per se as that it very rarely makes sense to override what
opusenc will already do by default unless you're really doing something
very special, and know all the consequences and constraints that come
with that.

The "voip" option still introduces a high pass filter and changes some
thresholds, but those are mostly only useful if you're really doing
VoIP and don't want full band audio, and opusenc isn't really a tool
for VoIP, since it encodes in Ogg not RTP.  For "spoken word recordings"
it's less clear if that's really an advantage over the music/speech
detection in 1.1 (since things like breath noise are usually already
absent from the recording).

The "low-delay" option can be useful, but again only for real-time
streaming and if you are using very small frames and a bitrate where
CELT will be better than SILK (since it will disable the use of SILK,
and small frames will also reduce the quality that is obtained at a
given bitrate).  But opusenc will already automatically select this
for you if you specify a frame size < 10ms (which already precludes
using SILK), and if you aren't using frames that small, it's almost
certainly not what you want anyway.
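
As a rough illustration of that rule, here is a minimal sketch in
Python.  It is not the opusenc source: the function name and the
string labels are made up for the example, and the "audio" fallback
for larger frames is an assumption about the default rather than
something stated above.

```python
# Sketch only: mirrors the rule described above, not opusenc's real code.
SILK_MIN_FRAME_MS = 10.0  # frames shorter than this preclude SILK


def default_application(frame_ms):
    """Return the application implied by the chosen frame size."""
    if frame_ms < SILK_MIN_FRAME_MS:
        # Only CELT can be used, so the low-delay application is selected.
        return "low-delay"
    # Assumed default for everything else (hypothetical label).
    return "audio"


for frame_ms in (2.5, 5, 10, 20, 40, 60):
    print(f"{frame_ms:>4} ms -> {default_application(frame_ms)}")
```

So the application falls out of the frame size option the user
already sees, which is why it does not need a switch of its own.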

A special purpose application using the library directly might have
considerations of its own about when to set these, but for users of
opusenc, the best choice is essentially already a function of the
other options which aren't "hidden".


> > But that said ...
> >
> >> opusenc lack ability to apply this tuning.
> >
> > I don't believe this is strictly true.  You should still be able to 
> > override that using the --set-ctl-int option (along with lots of other 
> > arcane options, that really require you to be quite familiar with the 
> > codec internals to use them in a way that does more good than harm to 
> > the quality of encoding).
> >
> > Not exposing this control more directly was a deliberate choice for 
> > the reasons above aiui.
> 
> Ahh - I did see that mysterious option in the man page.
> 
> Are you saying it is deliberately kept mysterious?

Yes.  Well not exactly "mysterious" since if you're familiar with the
libopus CTLs it is just a direct interface to let you set any of those
to whatever value you please - and there are more of them than just
this one which people doing special things might want to be able to
control - but I believe that Greg is a little gun-shy of exposing
options that innocent people will mostly only hurt themselves with.
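
For anyone who does want to go down that path, a hedged sketch of
how the arguments could be put together follows.  The numeric values
are the ones defined in libopus's opus_defines.h; the request=value
form of the argument, the helper name, and the file names are
assumptions made for illustration, so check them against your own
opusenc before relying on this.

```python
# Numeric values as defined in libopus's opus_defines.h.
OPUS_SET_APPLICATION_REQUEST = 4000
APPLICATIONS = {
    "voip": 2048,       # OPUS_APPLICATION_VOIP
    "audio": 2049,      # OPUS_APPLICATION_AUDIO
    "low-delay": 2051,  # OPUS_APPLICATION_RESTRICTED_LOWDELAY
}


def set_ctl_int_args(application):
    """Build opusenc arguments requesting the given application.

    Assumes --set-ctl-int takes a single "request=value" argument.
    """
    value = APPLICATIONS[application]
    return ["--set-ctl-int", f"{OPUS_SET_APPLICATION_REQUEST}={value}"]


# in.wav and out.opus are placeholder file names.
print(" ".join(["opusenc", *set_ctl_int_args("voip"), "in.wav", "out.opus"]))
```

Having to look up the numbers is part of the point: by the time you
have found them, you have also read what the setting really does.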

Partly through prior experience with things exposing choices that
were easier for casual users to get wrong than right, and partly
because there was a fair bit of early confusion about when to use
which of these choices, with a lot of people guessing wrong (like
thinking "low delay must always be good" and "voip is any speech"
when in reality what they mostly really do is trade away audio
quality for other more specialised considerations).

The emphasis was quite deliberately that opusenc should default to
creating files of the highest quality for the sort of uses that
opusenc is most appropriate for, while still letting expert users
do "expert things" if and when they need to.


> >> The tuning option is accessible e.g. from libav.
> >
> > It's quite possible that libav should also no longer encourage people 
> > to tweak at this too, though I'm not personally familiar with where 
> > and how they allow this.
> 
> I guess either clearly documenting the feature or clearly stating that 
> it is discouraged and obsolete and irrelevant would help not only 
> end-users like me but also library users like libav.

It's not really deprecated as such if you're using the library,
and know full well why you're setting it (and it kind of can't be
without breaking the API since you need to pass this when creating
an encoder).  I guess it's more like the gcc optimiser, where for
almost everyone -O2 is what you want to use, and only some tiny
portion of people will really have the need, and do the detailed
testing, to specify specific optimisation options more directly.


> Here's avconv documentation:
> 
> $ avconv -h full | grep libopus -A 10
> avconv version 11-6:11-1, Copyright (c) 2000-2014 the Libav developers
>   built on Sep 13 2014 19:43:14 with gcc 4.9.1 (Debian 4.9.1-13)
> libopus AVOptions:
> -application       <int>   E..A... Intended application type
>    voip                    E..A... Favor improved speech intelligibility
>    audio                   E..A... Favor faithfulness to the input
>    lowdelay                E..A... Restrict to only the lowest delay modes
> -frame_duration    <float> E..A... Duration of a frame in milliseconds
> -packet_loss       <int>   E..A... Expected packet loss percentage
> -vbr               <int>   E..A... Variable bit rate mode
>    off                     E..A... Use constant bit rate
>    on                      E..A... Use variable bit rate
>    constrained             E..A... Use constrained VBR

Yeah the descriptions of those options are the sort of
oversimplification we were trying to avoid (and even the
description in the library docs could probably be better now).

For some cases "audio" mode could actually give "improved speech
intelligibility", and none of the modes are "faithful to the input".
It's a lossy codec, so by definition it's unfaithful.  It just tries
to be unfaithful in ways you can't hear (in all of the modes), and
it always tries to be as unfaithful as it can get away with, because
that's how you save bits and get compression.

And low-delay doesn't actually restrict you to the lowest delay
modes at all.  It just removes the extra codec lookahead delay
that is required when using SILK or hybrid modes.  You can still
select frame sizes giving the largest possible latency.
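
To put rough numbers on that, here is a small sketch.  The lookahead
figures (2.5 ms that is always present, plus 4 ms extra outside
low-delay) are assumptions taken from the general Opus documentation
as I recall it, not from anything in this thread, and the function
name is made up for the example.

```python
# Assumed lookahead figures, in milliseconds (see the note above).
BASE_LOOKAHEAD_MS = 2.5   # present in every mode
EXTRA_LOOKAHEAD_MS = 4.0  # needed for SILK/hybrid, dropped by low-delay


def algorithmic_delay_ms(frame_ms, low_delay):
    """Frame duration plus codec lookahead, in milliseconds."""
    lookahead = BASE_LOOKAHEAD_MS
    if not low_delay:
        lookahead += EXTRA_LOOKAHEAD_MS
    return frame_ms + lookahead


# Low-delay with the largest frame against the default with a 20 ms frame.
print(algorithmic_delay_ms(60, low_delay=True))
print(algorithmic_delay_ms(20, low_delay=False))
```

Under those assumptions, low-delay with 60 ms frames still has more
than twice the delay of the normal mode with 20 ms frames, so the
frame size matters far more than the application does.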

Or put another way, just about any one line description of these
options is going to be completely misleading to anyone who doesn't
dig a whole lot deeper than that.  So having to do that to figure
out how to set them (or know that they exist) isn't an entirely
terrible state of affairs.  It's not bad that they do exist, but
exposing them gives people the impression they ought to use them
or need to pick one, which in general they shouldn't and don't,
at least in the normal case for opusenc.

Apparently one mitigating factor for libav here is that it can
also be used to stream RTP, so the alternative options might
actually be more relevant to it than they are to opusenc.


> > If you have some really compelling need for this we can run that past 
> > Greg, but I suspect he'll probably say "use the ctl, or most 
> > preferably don't!" unless it's something we've really never thought 
> > about before.
> 
> My usecase is to compare tools.  I have learned that avconv use of 
> libvpx is inferior to using vpxenc directly, and I became curious if 
> that was the case for their use of other libraries too.  That's how I 
> discovered this feature offered via libav but not opusenc.
> 
> Might be that Handbrake and some of the gazillion other transcoding 
> wrappers make use of the feature too.

Ok, so the short answer for your case then is that most of the time
you'd just want the "more obvious" options from opusenc, but when you
do really need to tweak this for a direct comparison with some special
mode from another tool, you'd indeed use the --set-ctl-int option.

I believe that libav/ffmpeg does now have their own reimplementation
of a decoder (I'm not sure if we have that in Debian yet though),
but most of the quality related things are generally encoder side,
so if you do find any notable difference that would definitely be
something worth reporting, since it's likely to be a "simple bug"
rather than some fundamental shortcoming somewhere.

One other thing to be aware of, depending on how you're doing your
comparisons, is that opusdec will dither by default, which improves
the audible quality of low level signals, but does raise the measured
noise floor.  You can turn that off if you need to.

  Cheers,
  Ron

