>> Has anyone experienced odd artifacts while doing hybrid mixing and where
>> sound files stored in lossy formats were converted to wav files?

No, but I only use lossy compression when I have no other choice. Using lossless compression removes one possible source of error.
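A lossless round trip is also easy to verify: decode the compressed file and compare it sample-for-sample against the original. A minimal Python sketch, assuming the numpy and soundfile packages are installed (the two file names are hypothetical):

    # Verify that a lossless codec (e.g., FLAC) is bit-exact: decode the
    # compressed file and compare every sample against the original wav.
    # "original.wav" and "roundtrip.flac" are hypothetical local files.
    import numpy as np
    import soundfile as sf

    orig, fs_orig = sf.read("original.wav", dtype="int16")
    rt, fs_rt = sf.read("roundtrip.flac", dtype="int16")

    assert fs_orig == fs_rt, "sample rates differ"
    assert orig.shape == rt.shape, "lengths or channel counts differ"

    # A truly lossless chain returns the original samples exactly.
    print("bit-exact:", np.array_equal(orig, rt))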
>> Are there file formats that should be avoided as far as psychoacoustic
>> research goes?

See my answer to the question below.

>> Are all lossless formats more-or-less equal in terms of 'purity'?

Did you mean to say "lossy", or lossless? Lossless compression is without error, assuming that the basic assumptions of the coder are not violated. Some of the commercial processes have modes that are only 'mostly' lossless. In a streaming application, such as on a Blu-ray disc, there is some maximum cap on bit rate. If the audio source is highly random, that is to say mostly noise, the compressed data rate can be too large for the medium, and the codec is then forced into a lossy mode. This is highly unlikely to happen except in some high-sample-rate cases where most of the bandwidth above 20 kHz is noise.

Lossy compression is, of course, designed to represent the original audio at some lower data rate. Depending on the codec and the data rate, the decoded copy of the audio contains errors, errors which are intended to be inaudible. At a lower data rate there is more error, and the chance that it will be audible at any particular point in the recording increases. You can fool most of the people most of the time, and as the data rate is increased, the percentage of people who can't hear the artifacts approaches 100%.

The quality of any particular codec also depends on its implementation. Codecs are specified primarily by their decoders; that is, the decoder must be able to decode data streams with the specified syntax and tools. It has typically been the case that, after the introduction of a new codec, the encoders continue to get better, so that the overall performance at a given data rate is better with a later encoder implementation than it was originally.

All of this has been the subject of contentious debate for two decades. There are plenty of people with opinions, but relatively few of them have taken part in any controlled listening tests. Many individuals seem to have listened to a few bad examples and then made a long-term decision as to what works and what doesn't. As an example, MP3 at 96 kbps is likely to have audible artifacts, depending on what is encoded, what the reproduction system is like, and who is listening. MP3 at 192 kbps is very unlikely to have audible artifacts, although certain tricky program material can still cause them.

It thus becomes the case that one might wish to rate codecs in terms of the bit rate necessary to achieve a given quality level. In that case a ranking might be AAC > DD (Dolby Digital) > MP3 > DTS. All of them can achieve good quality, but some of them require a higher data rate to do it.

If the audio is processed after low-bit-rate compression, some of the assumptions about masking made by the designers of the codec may be violated. This can result in artifacts being more audible than they would otherwise be.

Have a listen to the AES demo disc:
http://www.aes.org/publications/technical/DigitalAudio.cfm
I see that this disc is different from the one I worked on several years ago, but it looks as though it should have some good material on it.

One final comment. When I watch digital broadcasts, say the ones over DirecTV, I see continuous video artifacts yet I never hear audio artifacts. With recorded or broadcast media one doesn't have knowledge of what the original program looked like or sounded like. But if I hear a sound that doesn't normally occur in audio, or if I see an artifact like macro-blocking, halos, or the screen-door effect, I know it wasn't in the source originally.
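If you want to judge the bit-rate question for yourself, it is easy to generate comparison stimuli at several rates. A minimal Python sketch, assuming the LAME command-line encoder is on the PATH ("speech.wav" is a hypothetical source file):

    # Encode one source at several constant bit rates, then decode each
    # back to wav so all versions can be auditioned in the same player.
    # Assumes the 'lame' CLI is installed; file names are hypothetical.
    import subprocess

    SOURCE = "speech.wav"
    for kbps in (96, 128, 192, 320):
        mp3 = f"speech_{kbps}.mp3"
        subprocess.run(["lame", "-b", str(kbps), SOURCE, mp3], check=True)
        subprocess.run(["lame", "--decode", mp3, f"speech_{kbps}.wav"], check=True)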
Eric Benjamin (not to be confused with Eric Carmichel!)

----- Original Message ----
From: Eric Carmichel <e...@elcaudio.com>
To: Sursound <sursound@music.vt.edu>
Sent: Fri, April 5, 2013 5:52:36 PM
Subject: [Sursound] Surround formats and lossy compression

Greetings to All:

When it comes to surround sound coding/decoding, I never make a peep because I'm ignorant on the topic. However, a friend who heads the Dept. of Audiology at a children's hospital asked a question regarding MP3s. Although the MP3 format may be nothing more than a distant relative of surround formats, the thought of using "lossy" file types in research studies utilizing surround-sound stimuli does concern me. I answered my friend's question (re MP3s) as best I could, and the answer is shown below, copied and pasted verbatim (sorry for its length). Some of the concerns outlined below may or may not apply to surround sound (?).

Has anyone experienced odd artifacts while doing hybrid mixing (sounds from monaural sources added to actual, or live, Ambisonic recordings) and where sound files stored in lossy formats were converted to wav files? Re surround sound for research: Are there file formats that should be avoided as far as psychoacoustic research goes? Are all lossless formats more-or-less equal in terms of 'purity'? Thanks in advance for any insights.

Eric C.

---original email and response re MP3s and audiology follow---

Hi Eric - I hope you're doing well. I'd like to pick your brain, if you don't mind. What do you think about the use of MP3 or MP4 recordings for speech audiometry? I'm thinking of possible pitfalls in the compression and the bandwidth of the signal compared to, say, FLAC or standard wav files. Of course, audiologists used vinyl LPs and tape recordings for decades without any worry.

Thanks, Bob

Hi Bob,

You ask a good question, and one that should be examined from more than a "fidelity" point of view. But before I dive into this, allow me two disclaimers. First, I'm writing this off the cuff, so I won't give any references to peer-reviewed studies (but then, who needs peer review when the answer comes from Eric Carmichel?). Second, I assume you already know a lot of what I wrote below; if I explain something that is either "obvious" or well known, it's only to help me communicate my thoughts.

Researchers [ref?] have shown that the majority of listeners cannot tell the difference between a 44.1 kHz (or kS/s), 16-bit wav file and an MP3 derived from the same wav file. I don't know what program material was used in the studies, but let's assume music. If we can't tell the difference between music MP3s and CDs, then "surely" we can't hear a difference between speech MP3s and CDs. This might be one argument in favor of using MP3s for speech audiometry.

I believe most MP3s have a 32 kS/s sampling rate, which isn't by itself much of a size reduction from 44.1 kS/s files. The compression scheme used to create MP3s is (or was) proprietary and largely based on psychoacoustic principles. Sounds that can't be heard because of energy masking are "removed" at the times they would otherwise be masked. MP3s, unlike FLAC (Free Lossless Audio Codec), use a "lossy" compression scheme: what is lost isn't brought back; it simply no longer contributes (perceptually) to the sound.
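One way to get a feel for what the encoder has thrown away is to subtract a decoded MP3 from the original wav and listen to the residual. A rough Python sketch, assuming numpy and scipy and two hypothetical, sample-aligned files (MP3 coding delay shifts the decoded audio by a small number of samples, so a careful comparison would first align the files, e.g. by cross-correlation):

    # Write the difference between an original wav and its MP3 round trip;
    # what remains is (roughly) the part the lossy encoder discarded.
    # Both hypothetical files are assumed 16-bit at the same sample rate.
    import numpy as np
    from scipy.io import wavfile

    fs, orig = wavfile.read("original.wav")
    fs2, decoded = wavfile.read("decoded_from_mp3.wav")
    assert fs == fs2, "sample rates must match"

    n = min(len(orig), len(decoded))  # decoded length can differ slightly
    diff = orig[:n].astype(np.int32) - decoded[:n].astype(np.int32)
    wavfile.write("residual.wav", fs, np.clip(diff, -32768, 32767).astype(np.int16))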
I'd guess that both forward and backward masking are taken into account as well. The usual bit rate (not really "bandwidth", though the word gets used that way) for MP3s is 128 kilobits per second (kbps), but some files use a variable bit rate*. An MP3 at 128 kbps is considered "radio" quality, while higher rates are probably indistinguishable from CD-quality wav files, at least in terms of fidelity.

[Side notes: CD-quality refers to a 44.1 kHz sampling rate and 16-bit resolution. Sixteen bits is, well, a mix-n-match of 16 zeroes and ones, yielding 2^16 unique combinations (0 to 65535 represented digitally). MP3s are also typically decoded to 16 bits, although strictly speaking MP3 has no fixed bit depth. A byte is 8 bits -- basic computer nomenclature that goes back to caveman days and ASCII standards -- and by convention a lower-case b means bits while an upper-case B means bytes, so 16 bits (b) is the same as two bytes (B). With a sampling rate of 32,000 samples per second, 2 bytes per sample, and two interleaved channels (L + R), the uncompressed stream runs at 32,000 x 2 channels x 16 bits = 1,024,000 bits per second, or 128,000 bytes per second (128 KB/s). Note that this is NOT where the MP3 figure comes from: the MP3's 128 kbps is only 16,000 bytes per second, so a 128 kbps MP3 of a 32 kHz stereo source represents roughly an 8:1 reduction. That reduction is the codec's job.]

MP3s, like FLAC or wav files, are NOT limited to a fixed sample rate or bit depth. If frequency response were our only concern, the Nyquist frequency (also known as the "foldover" frequency), which is half the sample rate, is the highest frequency reproducible without aliasing. So bit depth (= resolution) and upper frequency limit should NOT be our concerns in deciding against MP3s. So why be against MP3s? Read on...

When it comes to perception, we really don't know what the hearing-impaired, autistic (non-neurotypical), or otherwise "not-so-average" listener hears. Perhaps the "missing" information in lossy compression schemes provides useful or subtle information to those whose perception isn't normal or average. Furthermore, we don't know how adding masking noise (speech or weighted noise) to material reproduced from MP3s might affect an outcome.

Here's an interesting experiment (a rough code sketch follows below): Convert a stereo MP3 to wav (you're not gaining anything... yet), flip the polarity of one channel (i.e., a 180-degree "phase" change, but without moving the time line), and mix the channels to create a 50/50 mono mixdown. In many instances you'll hear odd artifacts that aren't explained by simple phase cancellation. In other words, mixing the original source material (the master tracks, not the MP3) down to mono won't give rise to the artifacts. So there's something about the encoding or decoding that affects files in unpredictable ways when changing back to a "lossless" (e.g., wav) file.

Because there are audiometric test protocols that rely on phase flipping or on combining signal and noise, I'd most certainly avoid lossy compression schemes. If the tests were as simple as speech detection thresholds, I don't foresee any harm in using MP3 files. But for differential diagnoses, research, etc., stick with lossless files, whether analog, digital, wav, or FLAC.

In summary, my reason for not recommending MP3s is that they are already "psychoacoustically tainted" and not equivalent to the actual stimuli, even if normal-hearing listeners perceive them as equivalent. Frequency response isn't the culprit. And with today's technology, there's very little reason to "conserve" memory in order to accommodate small speech (wav) files.
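The null test described above, as a minimal Python sketch (assuming numpy and scipy; the hypothetical input file is a stereo wav decoded from an MP3):

    # Null test: flip the polarity of one channel of a stereo file that was
    # decoded from an MP3, then mix 50/50 to mono and listen for artifacts
    # that simple phase cancellation cannot explain. File name hypothetical.
    import numpy as np
    from scipy.io import wavfile

    fs, stereo = wavfile.read("decoded_from_mp3.wav")  # expects int16, 2 channels
    x = stereo.astype(np.float32)

    mono = 0.5 * (x[:, 0] - x[:, 1])  # polarity flip on one channel, 50/50 mix
    wavfile.write("null_mix.wav", fs, mono.astype(np.int16))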
*Additional notes

MP3 processing may not entirely remove a sound that is otherwise masked; instead, the resolution, or bit depth, can be greatly reduced for "unheard" sounds. Simply re-sampling a file to a lower sampling rate, by contrast, is a linear reduction: re-sampling a wav file from 44.1 kHz to 32 kHz gives a 32/44.1 = 0.726 reduction in file size. Discussions of re-sampling among audio geeks, by the way, get into the ultra-boring topics of dithering (ever heard "dithering down" used by recording engineers?), noise shaping, filter types, blah blah, but only a small percentage of the people who like to toss these words around can do the math.

I can't claim that I've generated a pure tone, saved it as a wav file, converted it to MP3 Pro, and then examined how many bits were actually used to encode the sinusoid (a starting point for that experiment is sketched at the end of this message). Opening the MP3 in a wav editor such as Sound Forge or Audition probably doesn't let one "see" how the MP3 is being operated on in order to edit or play the file (again, MP3 compression is proprietary as well as lossy). Zooming in on the MP3 will probably reveal 16 bits per sample, yet the file reduction is considerable (approximately 10x smaller than the wav file it was derived from).

When it comes to bit depth and sample rate, one of the biggest reasons for not using mega-fidelity files (24-bit, 96 kS/s) isn't memory allocation but battery use. Yep, the processing power needed for super audio files is greater than for lower-fidelity files. Apple (so I'm told) limits sample rate based on power consumption, not memory used. If you really want to open a can of worms, get the audio geeks to argue over 16- versus 24-bit audio files. I put more merit in bit depth than in sampling rate, but mostly for reasons having to do with dynamic range.

Lossless compression codecs require processing power too, but unless you're doing audiometry in the field and power is at a premium, there shouldn't be any problems regarding power. There may be an intrinsic latency when presenting material, but this would be on the order of milliseconds (or microseconds). Latency would only be a problem if other time-sensitive processing were involved (e.g., the use of VST or RTAS plug-ins for research). I really can't think of practical reasons not to use FLAC files. A lot of this gets back to the quality of the master material and to what software was used to convert to FLAC or whatever.

Hope this isn't too confusing.

Best,
Eric C.
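For anyone who wants to try that pure-tone experiment, here is a minimal starting point in Python, assuming numpy, scipy, and the LAME command-line encoder are available (file names hypothetical):

    # Generate a 1 kHz tone, round-trip it through MP3, and compare the
    # magnitude spectra; energy away from 1 kHz in the decoded file is
    # coding error. Assumes the 'lame' CLI is on the PATH.
    import subprocess
    import numpy as np
    from scipy.io import wavfile

    fs = 44100
    t = np.arange(fs * 2) / fs  # two seconds
    tone = (0.5 * np.sin(2 * np.pi * 1000.0 * t) * 32767).astype(np.int16)
    wavfile.write("tone.wav", fs, tone)

    subprocess.run(["lame", "-b", "128", "tone.wav", "tone.mp3"], check=True)
    subprocess.run(["lame", "--decode", "tone.mp3", "tone_rt.wav"], check=True)

    _, rt = wavfile.read("tone_rt.wav")
    n = min(len(tone), len(rt))  # decoded length can differ by a few frames

    orig_spec = np.abs(np.fft.rfft(tone[:n].astype(np.float64)))
    rt_spec = np.abs(np.fft.rfft(rt[:n].astype(np.float64)))
    err_db = 20 * np.log10(np.max(np.abs(rt_spec - orig_spec)) / np.max(orig_spec))
    print(f"peak spectral difference: {err_db:.1f} dB relative to the tone peak")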