Hello Sampo, I always appreciate your suggestions, insight, and (occasional) provocative comments. Here’s a tiny bit of info that may shed light on why I do things in ‘squirrelly’ fashion (and my Ambisonic recording of a chattering squirrel ain’t nothin’ compared to this diatribe). My doc studies began at what was considered to be the number one cochlear implant (CI) research lab in the world. (Trust me: I wasn’t the one doing the bragging or lab ranking.) Upon my arrival, I was surprised to observe that the age-old listening condition of speech from one speaker and noise from another was still the standard way of studying speech in noise. A simple two-speaker arrangement was adequate for earlier CI studies, and is probably adequate for studies involving unilateral implantation. But for studies involving bilateral implantation (localization, anyone?) and electro-acoustic hearing (hybrid devices), it seemed that a more ‘realistic’ listening environment should be standardized across laboratories.
The surround system chosen by several research facilities is a system known as R-Space (see revitronix.com for info). The R-Space system was designed to fit in a standard audiometric test booth. They certainly managed this, and the R-Space has its merits. But the radius of its 8-speaker circular array is a mere 2 feet (0.61 m), so even slight head movements can change the relative sound levels at a listener’s ears. The R-Space’s background-noise stimulus was recorded using eight equally spaced Sennheiser gradient (or shotgun) mics. At least one photo of the recording setup shows a KEMAR centered in the 8-mic array (all mics on a horizontal plane), but that photo was for show: the KEMAR was neither needed nor used during the recording session. I’m not knocking the R-Space: it was designed for a specific application, and it uniquely fits in a tight space. Its main limitation (aside from the weenie speaker radius) is the number of available recordings. At the time the system was installed at one facility, there was only the 8-channel (discrete) recording of Lou Malnati’s Pizzeria. It was my belief that more diverse stimuli could be generated, and that a system less sensitive to head movement would be of value. (Note to proponents of binaural recordings, head tracking, and HRTFs: headphones are out of the question because they don’t fit over hearing aids and cochlear implant processors.) One study utilizing the R-Space provided ‘scientific proof’ that the background noise, as played through the R-Space, could be used to demonstrate real-world differences between a hearing aid’s omni and directional mic settings (or something like this--Dr. Compton-Conley’s doctoral dissertation can be found on the Revitronix website). The R-Space has since been used in other studies, but I believe the original noise stimulus has been ‘bastardized’ in ways that make the external validity of some studies questionable. For example, studies have reported such-and-such speech comprehension scores at a +15 dB SNR. What you have to read between the lines is that the background noise was a recording of a pizzeria, yet it was presented at 60 dBA (a rather quiet pizzeria for Chicago!). In a different study, the background noise was presented at its recommended SPL (70 dBA), but the speech stimuli were presented at 85 dBA (an unrealistically loud talker!).

My goal was to find a ‘better’ way to present the speech and noise, and to create a larger library of purposeful background-noise scenarios. Noise environments would include a quiet coffee house and a noisy airport terminal. This is where my journey into auralization and, subsequently, Ambisonics began. Believe me, I make no bones about being a novice at auralization and Ambisonics. The R-Space is more than a set of JBL speakers and an 8-channel recording: it includes a MOTU FireWire interface, a Mac computer with an external Glyph hard drive, a compact 8-channel power amp (QSC, I recall), and MOTU’s Digital Performer DAW (sort of overkill if its sole purpose is to play 8 tracks of pre-recorded audio!). The 8 channels came pre-assigned to their respective tracks, and the session wasn’t really meant to be modified (perhaps the reason R-Space used SF2 files in lieu of WAV files). The university I attended had the idea that a few lines of code, or a program written in Java or Python (because they’re free), could be used in conjunction with Digital Performer.
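Two of the numbers above deserve a quick sanity check: the 2-foot speaker radius and the +15 dB SNR conditions. Here’s a minimal back-of-the-envelope sketch in Python (my own illustration, not anything from the R-Space documentation), assuming simple free-field, inverse-square behavior and ignoring room reflections and head shadow:

import math

def level_shift_db(radius_m, head_offset_m):
    """Approximate change in the near/far speaker level balance (dB) when a
    listener's head moves off-center toward one speaker of a circular array.
    Free-field, point-source assumption; real rooms and head shadow differ."""
    near = radius_m - head_offset_m   # distance to the speaker leaned toward
    far = radius_m + head_offset_m    # distance to the diametrically opposite speaker
    return 20.0 * math.log10(far / near)

print(round(level_shift_db(0.61, 0.10), 1))   # ~2.9 dB for a 10 cm lean in an R-Space-sized array
print(round(level_shift_db(1.5, 0.10), 1))    # ~1.2 dB for a hypothetical 1.5 m radius array

def snr_db(speech_dba, noise_dba):
    """Nominal SNR as quoted in the studies: speech level minus noise level."""
    return speech_dba - noise_dba

print(snr_db(75.0, 60.0))   # 15.0 -- the 'quiet pizzeria' condition
print(snr_db(85.0, 70.0))   # 15.0 -- same SNR on paper, but an unrealistically loud talker

Even this crude model shows why a small lean in a 0.61 m array matters when the whole point of the test is a fixed SNR.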
Back to the university’s idea: it was to present monaural speech stimuli in the same way and with the same interface they were accustomed to using. But commercial DAW software is ‘bulletproof’ (or idiot-proof), and for good reason--to prevent novices and hackers from crashing computers (or to keep them from attempting to reverse-engineer proprietary software?). Furthermore, MIDI-based software doesn’t seem to communicate well with other MIDI applications (too easy to create positive feedback loops?), at least not without the help of ReWire. However, MIDI-based DAWs communicate perfectly well with MIDI hardware; hence my initial impetus to use a DAW controller. Interestingly, the university I attended spent a good deal of energy pooh-poohing their newly purchased DAW simply because it didn’t do what they magically wanted it to do. Nobody was interested in learning how to use Digital Performer or any other DAW. I was offered 50 Canadian bucks toward travel expenses to a conference in return for several hundred hours of writing code. The code or stand-alone program was to make the R-Space system do things exactly as it did before: present one sentence at a time against a background of restaurant noise. Well, I don’t work for free (especially when a project is likely to fail), and such a ridiculous notion compelled me to use the ‘f’ word--something I rarely do. Fortunately, only a small minority of the people I encountered in academia expected others to be their personal slaves. There are researchers, students, and professors whom I’m willing to bend over backwards for because they’re sincere and nice people. I also know that many of these individuals are not DAW literate, nor do they have experience using MIDI hardware or software. They’re not media production experts. The mention of ReWire would simply confuse them, and setting up multi-track busses, aux sends, effect returns, VSTs, etc. is not in their vocabulary. They’re comfortable with MATLAB, but not with Pro Tools. As a person with an electronics background (and a recording of a squirrel), I have to see things from their perspective.

This brings me to the topic of DAWs, chips (Burr-Brown analog versus Cirrus Logic digital), and the like. To make a user-friendly system that presents stimuli, records responses, automatically adjusts settings, blah blah, I have to look at the end user. That often means designing a system with an ON/OFF switch, a simple boot sequence (and no ReWire), a drop-down menu of standardized sentence lists to be used as the speech stimuli (IEEE sentences, CNC word lists, etc.), and a Start/Stop button. The complexity and sophistication of the system has to be invisible to the user. Any system that requires the user to make changes on the fly is asking for trouble. Sometimes the presentation levels have to be displayed not in units standard in audio production (e.g., dBu) but in units common in audiology (e.g., dB HL). More importantly, the presentation levels as displayed on the computer screen have to match the actual levels at the listening position--this, fortunately, is the easy part, unless someone tampers with the hardware or software. Hardware devices are less likely to be tampered with, and unlike the purely analog days of old, calibration settings for mixed-signal devices are less likely to drift with age. My main caveat with hardware is cost, but my home-brew hardware devices integrate nicely with MIDI software.
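As an aside on that calibration point, here’s a minimal sketch of the idea (hypothetical names and example numbers--not the actual R-Space or lab code): play a reference signal at a known digital level, measure the SPL at the listening position once, store the offset, and derive the playback level from the requested presentation level thereafter.

# Hypothetical calibration sketch: one measured offset ties the on-screen
# presentation level to the actual level at the listening position.
REF_DIGITAL_DBFS = -20.0       # digital level of the calibration noise
MEASURED_SPL_AT_REF = 80.0     # dB SPL measured once at the listening position (example value)
CAL_OFFSET = MEASURED_SPL_AT_REF - REF_DIGITAL_DBFS   # 100.0 in this example

def digital_level_for(target_spl_db):
    """Digital level (dBFS) needed to produce the requested SPL at the
    listening position, given the stored calibration offset."""
    return target_spl_db - CAL_OFFSET

print(digital_level_for(70.0))   # -30.0 dBFS for a 70 dB SPL presentation

Displaying dB HL instead would need transducer- and test-material-specific corrections on top of this offset--exactly the sort of detail the end user should never have to see.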
As for the home-brew hardware: USB and FireWire replace the traditional MIDI ports (the conversion from one data type to another is invisible to the user), so setup is simple for the end user. It’s also fair to state that I’m quite comfortable with the hardware implementation of digital and analog signal processing. I use hardware a lot because that’s what I grew up with. I use microcontrollers and PIC chips, too, but I can still do a lot with TTL gates, Karnaugh maps, and hardware description languages. I suppose I’m old school in many ways, but always willing to learn and apply new technologies. I read a lot and build modern gadgets from kits. Help and suggestions are always welcome--I also learn from making mistakes, and I’m not afraid to admit that I’m wrong when shown a better way of doing things. This concludes the first part of my saga. In the next chapter I’ll address typical speech-test background noise, and why I chose Ambisonics over a multi-talker surround of cocktail speech derived from monaural sources (as well as other types of surround noise). Because I did the mastering for a number of widely used speech-in-noise tests, I have a good idea of what’s being used by CI researchers. Til next time, E