Hello Sampo,
I always appreciate your suggestions, insight, and (occasional) provocative 
comments. Here’s a tiny bit of info that may shed light on why I do things in
‘squirrelly’ fashion (and my Ambisonic recording of a chattering squirrel ain’t 
nothin’ compared to this diatribe).
My doc studies began at what was considered to be the number one cochlear 
implant (CI) research lab in the world. (Trust me: I wasn’t the one doing the 
bragging or lab ranking.) Upon my arrival, I was surprised to observe that the age-old listening condition of speech from one loudspeaker and noise from another was still the standard way of studying speech in noise. A simple
two-speaker arrangement was adequate for earlier CI studies, and is probably 
adequate for studies involving unilateral implantation. But for studies 
involving bilateral implantation (localization, anyone?) and electro-acoustic
hearing (hybrid devices), it seemed that a more ‘realistic’ listening 
environment should be standardized across laboratories.

The surround system chosen by several research facilities is a system known as 
R-Space (see revitronix.com for info). The R-Space system was designed to fit 
in a standard audiometric test booth. They certainly managed this, and the 
R-Space has its merits. But the radius of its 8-speaker circular array is a 
mere 2 feet (0.61 m), so even slight head movements could change the relative 
sound levels at a listener’s ears. The R-Space’s background noise stimulus was 
recorded using eight equally-spaced Sennheiser gradient (or shotgun) mics. At 
least one photo of the recording setup shows a KEMAR centered in the 8-mic 
array (all mics on a horizontal plane), but this photo was for show. The KEMAR 
was neither needed nor used during the recording session.
I’m not knocking the R-Space: It was designed for a specific application, and 
it uniquely fits in a tight space. Its main limitation (aside from the weenie 
speaker radius) is the number of available recordings. At the time the system 
was installed at one facility, there was only the 8 (discrete) channel 
recording of Lou Malnati’s Pizzeria. It was my belief that more diverse stimuli 
could be generated, and that a system less sensitive to head movement would be 
of value. (Note to proponents of binaural recordings, head tracking, and HRTFs: 
Headphones are out of the question because they don’t fit over hearing aids and 
cochlear implant processors.)
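As an aside, a crude free-field estimate shows why that 2-foot radius worries me. Treating each loudspeaker as a point source and ignoring booth reflections and head shadow (big assumptions, I know), a hypothetical 10 cm lean toward one speaker changes the levels roughly like this:

import math

r = 0.61   # array radius in metres (2 ft)
d = 0.10   # hypothetical 10 cm lean toward one loudspeaker

toward = 20 * math.log10(r / (r - d))   # level change at the nearer speaker
away   = 20 * math.log10(r / (r + d))   # level change at the opposite speaker

print(round(toward, 1), round(away, 1))   # about +1.6 dB and -1.3 dB

So the relative level between opposing speakers shifts by roughly 3 dB from one lean, which isn't trivial if the effects you're chasing are only a couple of dB.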

One study utilizing the R-Space provided ‘scientific proof’ that the background 
noise, as played through the R-Space, could be used to demonstrate real-world 
differences between a hearing aid’s omni and directional mic settings (or 
something like this--Dr. Compton-Conley’s doctoral dissertation can be found on 
the Revitronix website). The R-Space has since been used in other studies, but 
I believe the original noise stimulus has been ‘bastardized’ in such a way as 
to make the external validity of some studies questionable. For example, 
studies have shown such-and-such speech comprehension scores using a +15 dB 
SNR. What you have to read between the lines is that the background noise was a 
recording of a pizzeria, but the noise was being presented at 60 dBA (a rather 
quiet pizzeria for Chicago!). In a different study, the background noise was 
presented at its recommended SPL (70 dBA), but the speech stimuli were presented at 85 dBA (an unrealistically loud talker!). My goal was to find a ‘better’ way to present
the speech and noise, and to create a larger library of purposeful 
background-noise scenarios. Noise environments would include a quiet coffee 
house and a noisy airport terminal. This is where my journey into auralization 
and, subsequently, Ambisonics, began. Believe me, I make no bones about being a 
novice at auralization and Ambisonics.
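To put numbers on the two SNR examples above (my own back-of-the-envelope sketch, not anything from the R-Space documentation):

def speech_level(noise_dba, snr_db):
    # Speech presentation level needed to achieve a given SNR over the noise
    return noise_dba + snr_db

print(speech_level(60, 15))   # 75 dBA speech, but the 'pizzeria' is only 60 dBA
print(speech_level(70, 15))   # noise at the recommended 70 dBA pushes speech to 85 dBA

Either the restaurant gets unrealistically quiet or the talker gets unrealistically loud; the +15 dB SNR looks fine on paper either way.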

The R-Space is more than a set of JBL speakers and an 8-channel recording: It 
includes a MOTU FireWire interface, a Mac computer with external Glyph hard 
drive, a compact 8-channel power amp (QSC, as I recall), and MOTU’s Digital Performer DAW (sort of overkill if its sole purpose was to play 8 tracks of pre-recorded audio!). The 8 channels came pre-assigned to their respective tracks, and the session wasn’t really meant to be modified (perhaps the reason R-Space used SF2 files in lieu of wav files). The university I attended had the idea that a few lines of code, or a program written in Java or Python (because
it’s free), could be used in conjunction with Digital Performer. The idea here 
was to present monaural speech stimuli in the same way and with the same 
interface they were accustomed to using. But commercial DAW software is ‘bullet 
proof’ (or idiot proof) and for good reason--to prevent novices and hackers 
from crashing computers (or keeping them from attempting to reverse-engineer proprietary software?).
Furthermore, MIDI-based software doesn’t seem to communicate well with other 
MIDI applications (too easy to create positive feedback loops?), at least not 
without the help of ReWire. However, MIDI-based DAWs communicate perfectly well 
with MIDI hardware; hence my initial impetus to use a DAW controller. 
Interestingly, the university I attended spent a good deal of energy pooh-poohing
their newly purchased DAW simply because it didn’t do what they magically 
wanted it to do. Nobody was interested in learning how to use Digital Performer 
or any other DAW. I was offered 50 Canadian bucks towards travel expenses to a 
conference in return for several hundred hours of writing code. The code or 
stand-alone program was to make the R-Space system do things exactly as it did 
before: Present one sentence at a time, but in the background of restaurant 
noise. Well, I don’t work for free (especially when a project is likely to fail), and such a ridiculous notion
compelled me to use the ‘f’ word--something I rarely do. Fortunately, only a 
small minority of persons I encountered in academia expected others to be their 
personal slaves.

There are researchers, students, and professors who I’m willing to bend over 
backwards for because they’re sincere and nice people. I also know that many of 
these individuals are not DAW literate, nor do they have experience using MIDI 
hardware devices or software. They’re not media production experts. The mention 
of ReWire would simply confuse them, and setting up multi-track busses, aux 
sends, effect returns, VSTs, etc. is not in their vocabulary. They’re 
comfortable with MATLAB, but not with Pro Tools. As a person with an 
electronics background (and a recording of a squirrel), I have to see things 
from their perspective. This brings me to the topic of DAWs, chips (Burr-Brown 
analog versus Cirrus Logic digital), and the like...

To make a user-friendly system that presents stimuli, records responses, 
automatically adjusts settings, blah blah, I have to look at the end-user. That
often means designing a system with an ON/OFF switch, a simple boot sequence 
(and no ReWire), and a drop-down menu of standardized sentence lists to be used 
as the speech stimuli (choice of IEEE sentences, CNC sentences, etc.) and a 
Start/Stop button. The complexity and sophistication of the system have to be invisible to the user. Any system that requires the user to make changes on the fly is asking for trouble. Sometimes the presentation levels have to be displayed not in the units standard in audio production (e.g., dBu) but in units common in audiology (e.g., dB HL). More importantly, the presentation levels as
displayed on the computer screen have to match the actual levels at the 
listening position--this, fortunately, is the easy part unless someone tampers 
with the hardware or software. Hardware devices are less likely to be tampered with, and unlike the purely analog days
of old, calibration settings for mixed-signal devices are less likely to drift 
with age. My main caveat with hardware is cost, but my home-brew hardware 
devices integrate nicely with MIDI software. USB and FireWire replace the 
traditional MIDI ports (conversion from one data type to another is invisible 
to the user), so setup is simple for the end-user.
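In case it helps, here's the sort of one-time calibration I have in mind, sketched in Python; the measured numbers and the function name are hypothetical, not from any existing R-Space or lab software:

# One-time measurement with a sound level meter at the listening position:
MEASURED_SPL  = 92.0    # hypothetical SPL measured for a calibration noise
MEASURED_DBFS = -20.0   # digital level of that calibration noise

def dbfs_for_target(target_spl):
    # Digital level needed so the on-screen value matches the SPL at the listener
    return MEASURED_DBFS + (target_spl - MEASURED_SPL)

print(dbfs_for_target(70.0))   # -42 dBFS to present the noise at 70 dB SPL

As long as nobody touches the interface gains or the power amp, that offset holds; for dB HL you'd simply fold the appropriate reference levels in on top of it.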

It’s also fair to state that I’m quite comfortable with the hardware 
implementation of digital and analog signal processing. I use hardware a lot 
because that’s what I grew up with. I use microcontrollers and PIC chips, too, 
but I can still do a lot with TTL gates, Karnaugh maps, and Hardware 
Description Language. I suppose I’m old school in many ways, but always willing 
to learn and apply new technologies. I read a lot and build modern gadgets from 
kits. Help and suggestions are always welcome--I also learn from making mistakes, and I'm not afraid to admit that I'm wrong when shown a better way of
doing things.

This concludes the first part of my Saga. In the next chapter I’ll address 
typical speech-test background noise, and why I chose Ambisonics over a
multi-talker surround of cocktail speech derived from monaural sources (as well 
as other types of surround noise). Because I did the mastering for a number of 
widely-used speech-in-noise tests, I have a good idea of what’s being used by 
CI researchers.
Til next time,
E