>> Has anyone experienced odd artifacts while doing hybrid mixing and where
>> sound files stored in lossy formats were converted to wav files?

No, but I only use lossy compression when I have no other choice. Using lossless compression removes one possible source of error.
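A lossless round trip is also easy to verify: decode the compressed file and compare it sample-for-sample against the original. A minimal Python sketch, assuming the numpy and soundfile packages are installed (the two file names are hypothetical):

    # Verify that a lossless codec (e.g., FLAC) is bit-exact: decode the
    # compressed file and compare every sample against the original wav.
    # "original.wav" and "roundtrip.flac" are hypothetical local files.
    import numpy as np
    import soundfile as sf

    orig, fs_orig = sf.read("original.wav", dtype="int16")
    rt, fs_rt = sf.read("roundtrip.flac", dtype="int16")

    assert fs_orig == fs_rt, "sample rates differ"
    assert orig.shape == rt.shape, "lengths or channel counts differ"

    # A truly lossless chain returns the original samples exactly.
    print("bit-exact:", np.array_equal(orig, rt))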
>> Are there file formats that should be avoided as far as psychoacoustic
>> research goes?

See my answer to the question below.

>> Are all lossless formats more-or-less equal in terms of 'purity'?

Did you mean to say "lossy", or lossless? Lossless compression is without error, assuming that the basic assumptions of the coder are not violated. Some of the commercial processes have modes that are only 'mostly' lossless. In a streaming application, such as on a Blu-ray disc, there is some maximum cap on bit rate. If the audio source is highly random, that is to say mostly noise, the compressed data rate can be too large for the medium, and the codec is then forced into a lossy mode. This is highly unlikely to happen except in some high-sample-rate cases where most of the bandwidth above 20 kHz is noise.

Lossy compression is, of course, designed to represent the original audio at some lower data rate. Depending on the codec and the data rate, the decoded copy of the audio contains errors, errors which are intended to be inaudible. At a lower data rate there is more error, and the chance that it will be audible at any particular point in the recording increases. You can fool most of the people most of the time, and as the data rate is increased, the percentage of people who can't hear the artifacts approaches 100%.

The quality of any particular codec also depends on its implementation. Codecs are specified primarily by their decoders; that is, the decoder must be able to decode data streams with the specified syntax and tools. It has typically been the case that, after the introduction of a new codec, the encoders continue to get better, so that the overall performance at a given data rate is better with a later encoder implementation than it was originally.

All of this has been the subject of contentious debate for two decades. There are plenty of people with opinions, but relatively few of them have taken part in any controlled listening tests. Many individuals seem to have listened to a few bad examples and then made a long-term decision as to what works and what doesn't. As an example, MP3 at 96 kbps is likely to have audible artifacts, depending on what is encoded, what the reproduction system is like, and who is listening. MP3 at 192 kbps is very unlikely to have audible artifacts, although certain tricky program material can still cause them.

It thus becomes the case that one might wish to rate codecs in terms of the bit rate necessary to achieve a given quality level. In that case a ranking might be AAC > DD (Dolby Digital) > MP3 > DTS. All of them can achieve good quality, but some of them require a higher data rate to do it.

If the audio is processed after low-bit-rate compression, some of the assumptions about masking made by the designers of the codec may be violated. This can result in artifacts being more audible than they would otherwise be.

Have a listen to the AES demo disc:
http://www.aes.org/publications/technical/DigitalAudio.cfm
I see that this disc is different from the one I worked on several years ago, but it looks as though it should have some good material on it.

One final comment. When I watch digital broadcasts, say the ones over DirecTV, I see continuous video artifacts yet I never hear audio artifacts. With recorded or broadcast media one doesn't have knowledge of what the original program looked like or sounded like. But if I hear a sound that doesn't normally occur in audio, or if I see an artifact like macro-blocking, halos, or the screen-door effect, I know it wasn't in the source originally.
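If you want to judge the bit-rate question for yourself, it is easy to generate comparison stimuli at several rates. A minimal Python sketch, assuming the LAME command-line encoder is on the PATH ("speech.wav" is a hypothetical source file):

    # Encode one source at several constant bit rates, then decode each
    # back to wav so all versions can be auditioned in the same player.
    # Assumes the 'lame' CLI is installed; file names are hypothetical.
    import subprocess

    SOURCE = "speech.wav"
    for kbps in (96, 128, 192, 320):
        mp3 = f"speech_{kbps}.mp3"
        subprocess.run(["lame", "-b", str(kbps), SOURCE, mp3], check=True)
        subprocess.run(["lame", "--decode", mp3, f"speech_{kbps}.wav"], check=True)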
Eric Benjamin (not to be confused with Eric Carmichel!)

----- Original Message ----
From: Eric Carmichel <e...@elcaudio.com>
To: Sursound <sursound@music.vt.edu>
Sent: Fri, April 5, 2013 5:52:36 PM
Subject: [Sursound] Surround formats and lossy compression

Greetings to All:

When it comes to surround sound coding/decoding, I never make a peep because I'm ignorant on the topic. However, a friend who heads the Dept. of Audiology at a children's hospital asked a question regarding MP3s. Although the MP3 format may be nothing more than a distant relative of surround formats, the thought of using "lossy" file types in research studies utilizing surround-sound stimuli does concern me. I answered my friend's question (re MP3s) as best I could, and the answer is shown below, copied and pasted verbatim (sorry for its length). Some of the concerns outlined below may or may not apply to surround sound (?).

Has anyone experienced odd artifacts while doing hybrid mixing (sounds from monaural sources added to actual, or live, Ambisonic recordings) and where sound files stored in lossy formats were converted to wav files? Re surround sound for research: Are there file formats that should be avoided as far as psychoacoustic research goes? Are all lossless formats more-or-less equal in terms of 'purity'? Thanks in advance for any insights.

Eric C.

---original email and response re MP3s and audiology follow---

Hi Eric - I hope you're doing well. I'd like to pick your brain, if you don't mind. What do you think about the use of MP3 or MP4 recordings for speech audiometry? I'm thinking of possible pitfalls in the compression and the bandwidth of the signal compared to, say, FLAC or standard wav files. Of course, audiologists used vinyl LPs and tape recordings for decades without any worry.

Thanks, Bob

Hi Bob,

You ask a good question, and one that should be examined from more than a "fidelity" point of view. But before I dive into this, allow me two disclaimers. First, I'm writing this off the cuff, so I won't give any references to peer-reviewed studies (but then, who needs peer review when the answer comes from Eric Carmichel?). Second, I assume you already know a lot of what I wrote below; if I explain something that is either "obvious" or well known, it's only to help me communicate my thoughts.

Researchers [ref?] have shown that the majority of listeners cannot tell the difference between a 44.1 kHz (or kS/s), 16-bit wav file and an MP3 derived from the same wav file. I don't know what program material was used in the studies, but let's assume music. If we can't tell the difference between music MP3s and CDs, then "surely" we can't hear a difference between speech MP3s and CDs. This might be one argument in favor of using MP3s for speech audiometry.

I believe most MP3s have a 32 kS/s sampling rate, which isn't by itself much of a size reduction from 44.1 kS/s files. The compression scheme used to create MP3s is (or was) proprietary and largely based on psychoacoustic principles. Sounds that can't be heard because of energy masking are "removed" at the times they would otherwise be masked. MP3s, unlike FLAC (Free Lossless Audio Codec), use a "lossy" compression scheme: what is lost isn't brought back; it simply no longer contributes (perceptually) to the sound.
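One way to get a feel for what the encoder has thrown away is to subtract a decoded MP3 from the original wav and listen to the residual. A rough Python sketch, assuming numpy and scipy and two hypothetical, sample-aligned files (MP3 coding delay shifts the decoded audio by a small number of samples, so a careful comparison would first align the files, e.g. by cross-correlation):

    # Write the difference between an original wav and its MP3 round trip;
    # what remains is (roughly) the part the lossy encoder discarded.
    # Both hypothetical files are assumed 16-bit at the same sample rate.
    import numpy as np
    from scipy.io import wavfile

    fs, orig = wavfile.read("original.wav")
    fs2, decoded = wavfile.read("decoded_from_mp3.wav")
    assert fs == fs2, "sample rates must match"

    n = min(len(orig), len(decoded))  # decoded length can differ slightly
    diff = orig[:n].astype(np.int32) - decoded[:n].astype(np.int32)
    wavfile.write("residual.wav", fs, np.clip(diff, -32768, 32767).astype(np.int16))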
I'd guess that both forward and backward masking are taken into account as well. The usual bit rate (not really "bandwidth", though the word gets used that way) for MP3s is 128 kilobits per second (kbps), but some files use a variable bit rate*. An MP3 at 128 kbps is considered "radio" quality, while higher rates are probably indistinguishable from CD-quality wav files, at least in terms of fidelity.

[Side notes: CD-quality refers to a 44.1 kHz sampling rate and 16-bit resolution. Sixteen bits is, well, a mix-n-match of 16 zeroes and ones, yielding 2^16 unique combinations (0 to 65535 represented digitally). MP3s are also typically decoded to 16 bits, although strictly speaking MP3 has no fixed bit depth. A byte is 8 bits -- basic computer nomenclature that goes back to caveman days and ASCII standards -- and by convention a lower-case b means bits while an upper-case B means bytes, so 16 bits (b) is the same as two bytes (B). With a sampling rate of 32,000 samples per second, 2 bytes per sample, and two interleaved channels (L + R), the uncompressed stream runs at 32,000 x 2 channels x 16 bits = 1,024,000 bits per second, or 128,000 bytes per second (128 KB/s). Note that this is NOT where the MP3 figure comes from: the MP3's 128 kbps is only 16,000 bytes per second, so a 128 kbps MP3 of a 32 kHz stereo source represents roughly an 8:1 reduction. That reduction is the codec's job.]

MP3s, like FLAC or wav files, are NOT limited to a fixed sample rate or bit depth. If frequency response were our only concern, the Nyquist frequency (also known as the "foldover" frequency), which is half the sample rate, is the highest frequency reproducible without aliasing. So bit depth (= resolution) and upper frequency limit should NOT be our concerns in deciding against MP3s. So why be against MP3s? Read on...

When it comes to perception, we really don't know what the hearing-impaired, autistic (non-neurotypical), or otherwise "not-so-average" listener hears. Perhaps the "missing" information in lossy compression schemes provides useful or subtle information to those whose perception isn't normal or average. Furthermore, we don't know how adding masking noise (speech or weighted noise) to material reproduced from MP3s might affect an outcome.

Here's an interesting experiment (a rough code sketch follows below): Convert a stereo MP3 to wav (you're not gaining anything... yet), flip the polarity of one channel (i.e., a 180-degree "phase" change, but without moving the time line), and mix the channels to create a 50/50 mono mixdown. In many instances you'll hear odd artifacts that aren't explained by simple phase cancellation. In other words, mixing the original source material (the master tracks, not the MP3) down to mono won't give rise to the artifacts. So there's something about the encoding or decoding that affects files in unpredictable ways when changing back to a "lossless" (e.g., wav) file.

Because there are audiometric test protocols that rely on phase flipping or on combining signal and noise, I'd most certainly avoid lossy compression schemes. If the tests were as simple as speech detection thresholds, I don't foresee any harm in using MP3 files. But for differential diagnoses, research, etc., stick with lossless files, whether analog, digital, wav, or FLAC.

In summary, my reason for not recommending MP3s is that they are already "psychoacoustically tainted" and not equivalent to the actual stimuli, even if normal-hearing listeners perceive them as equivalent. Frequency response isn't the culprit. And with today's technology, there's very little reason to "conserve" memory in order to accommodate small speech (wav) files.
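The null test described above, as a minimal Python sketch (assuming numpy and scipy; the hypothetical input file is a stereo wav decoded from an MP3):

    # Null test: flip the polarity of one channel of a stereo file that was
    # decoded from an MP3, then mix 50/50 to mono and listen for artifacts
    # that simple phase cancellation cannot explain. File name hypothetical.
    import numpy as np
    from scipy.io import wavfile

    fs, stereo = wavfile.read("decoded_from_mp3.wav")  # expects int16, 2 channels
    x = stereo.astype(np.float32)

    mono = 0.5 * (x[:, 0] - x[:, 1])  # polarity flip on one channel, 50/50 mix
    wavfile.write("null_mix.wav", fs, mono.astype(np.int16))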
*Additional notes

MP3 processing may not entirely remove a sound that is otherwise masked; instead, the resolution, or bit depth, can be greatly reduced for "unheard" sounds. Simply re-sampling a file to a lower sampling rate, by contrast, is a linear reduction: re-sampling a wav file from 44.1 kHz to 32 kHz gives a 32/44.1 = 0.726 reduction in file size. Discussions of re-sampling among audio geeks, by the way, get into the ultra-boring topics of dithering (ever heard "dithering down" used by recording engineers?), noise shaping, filter types, blah blah, but only a small percentage of the people who like to toss these words around can do the math.

I can't claim that I've generated a pure tone, saved it as a wav file, converted it to MP3 Pro, and then examined how many bits were actually used to encode the sinusoid (a starting point for that experiment is sketched at the end of this message). Opening the MP3 in a wav editor such as Sound Forge or Audition probably doesn't let one "see" how the MP3 is being operated on in order to edit or play the file (again, MP3 compression is proprietary as well as lossy). Zooming in on the MP3 will probably reveal 16 bits per sample, yet the file reduction is considerable (approximately 10x smaller than the wav file it was derived from).

When it comes to bit depth and sample rate, one of the biggest reasons for not using mega-fidelity files (24-bit, 96 kS/s) isn't memory allocation but battery use. Yep, the processing power needed for super audio files is greater than for lower-fidelity files. Apple (so I'm told) limits sample rate based on power consumption, not memory used. If you really want to open a can of worms, get the audio geeks to argue over 16- versus 24-bit audio files. I put more merit in bit depth than in sampling rate, but mostly for reasons having to do with dynamic range.

Lossless compression codecs require processing power too, but unless you're doing audiometry in the field and power is at a premium, there shouldn't be any problems regarding power. There may be an intrinsic latency when presenting material, but this would be on the order of milliseconds (or microseconds). Latency would only be a problem if other time-sensitive processing were involved (e.g., the use of VST or RTAS plug-ins for research). I really can't think of practical reasons not to use FLAC files. A lot of this gets back to the quality of the master material and to what software was used to convert to FLAC or whatever.

Hope this isn't too confusing.

Best,
Eric C.
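For anyone who wants to try that pure-tone experiment, here is a minimal starting point in Python, assuming numpy, scipy, and the LAME command-line encoder are available (file names hypothetical):

    # Generate a 1 kHz tone, round-trip it through MP3, and compare the
    # magnitude spectra; energy away from 1 kHz in the decoded file is
    # coding error. Assumes the 'lame' CLI is on the PATH.
    import subprocess
    import numpy as np
    from scipy.io import wavfile

    fs = 44100
    t = np.arange(fs * 2) / fs  # two seconds
    tone = (0.5 * np.sin(2 * np.pi * 1000.0 * t) * 32767).astype(np.int16)
    wavfile.write("tone.wav", fs, tone)

    subprocess.run(["lame", "-b", "128", "tone.wav", "tone.mp3"], check=True)
    subprocess.run(["lame", "--decode", "tone.mp3", "tone_rt.wav"], check=True)

    _, rt = wavfile.read("tone_rt.wav")
    n = min(len(tone), len(rt))  # decoded length can differ by a few frames

    orig_spec = np.abs(np.fft.rfft(tone[:n].astype(np.float64)))
    rt_spec = np.abs(np.fft.rfft(rt[:n].astype(np.float64)))
    err_db = 20 * np.log10(np.max(np.abs(rt_spec - orig_spec)) / np.max(orig_spec))
    print(f"peak spectral difference: {err_db:.1f} dB relative to the tone peak")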