Hello Sampo, I always appreciate your suggestions, insight, and (occasional) provocative comments. Here’s a tiny bit of info that may shed light on why I do things in ‘squirrelly’ fashion (and my Ambisonic recording of a chattering squirrel ain’t nothin’ compared to this diatribe). My doc studies began at what was considered to be the number one cochlear implant (CI) research lab in the world. (Trust me: I wasn’t the one doing the bragging or lab ranking.) Upon my arrival, I was surprised to observe that the age-old listening condition of speech from one speaker and noise from another was still the standard way of studying speech in noise. A simple two-speaker arrangement was adequate for earlier CI studies, and is probably adequate for studies involving unilateral implantation. But for studies involving bilateral implantation (localization, anyone?) and electro-acoustic hearing (hybrid devices), it seemed that a more ‘realistic’ listening environment should be standardized across laboratories.
The surround system chosen by several research facilities is a system known as R-Space (see revitronix.com for info). The R-Space system was designed to fit in a standard audiometric test booth. They certainly managed this, and the R-Space has its merits. But the radius of its 8-speaker circular array is a mere 2 feet (0.61 m), so even slight head movements can change the relative sound levels at a listener’s ears. The R-Space’s background-noise stimulus was recorded using eight equally spaced Sennheiser gradient (or shotgun) mics. At least one photo of the recording setup shows a KEMAR centered in the 8-mic array (all mics on a horizontal plane), but that photo was for show: the KEMAR was neither needed nor used during the recording session. I’m not knocking the R-Space: it was designed for a specific application, and it uniquely fits in a tight space. Its main limitation (aside from the weenie speaker radius) is the number of available recordings. At the time the system was installed at one facility, there was only the 8-channel (discrete) recording of Lou Malnati’s Pizzeria. It was my belief that more diverse stimuli could be generated, and that a system less sensitive to head movement would be of value. (Note to proponents of binaural recordings, head tracking, and HRTFs: headphones are out of the question because they don’t fit over hearing aids and cochlear implant processors.) One study utilizing the R-Space provided ‘scientific proof’ that the background noise, as played through the R-Space, could be used to demonstrate real-world differences between a hearing aid’s omni and directional mic settings (or something like this--Dr. Compton-Conley’s doctoral dissertation can be found on the Revitronix website). The R-Space has since been used in other studies, but I believe the original noise stimulus has been ‘bastardized’ in ways that make the external validity of some studies questionable. For example, studies have reported such-and-such speech comprehension scores at a +15 dB SNR. What you have to read between the lines is that the background noise was a recording of a pizzeria, yet it was presented at 60 dBA (a rather quiet pizzeria for Chicago!). In a different study, the background noise was presented at its recommended SPL (70 dBA), but the speech stimuli were presented at 85 dBA (an unrealistically loud talker!).

My goal was to find a ‘better’ way to present the speech and noise, and to create a larger library of purposeful background-noise scenarios. Noise environments would include a quiet coffee house and a noisy airport terminal. This is where my journey into auralization and, subsequently, Ambisonics began. Believe me, I make no bones about being a novice at auralization and Ambisonics. The R-Space is more than a set of JBL speakers and an 8-channel recording: it includes a MOTU FireWire interface, a Mac computer with an external Glyph hard drive, a compact 8-channel power amp (QSC, I recall), and MOTU’s Digital Performer DAW (sort of overkill if its sole purpose is to play 8 tracks of pre-recorded audio!). The 8 channels came pre-assigned to their respective tracks, and the session wasn’t really meant to be modified (perhaps the reason R-Space used SF2 files in lieu of WAV files). The university I attended had the idea that a few lines of code, or a program written in Java or Python (because they’re free), could be used in conjunction with Digital Performer.
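Two of the numbers above deserve a quick sanity check: the 2-foot speaker radius and the +15 dB SNR conditions. Here’s a minimal back-of-the-envelope sketch in Python (my own illustration, not anything from the R-Space documentation), assuming simple free-field, inverse-square behavior and ignoring room reflections and head shadow:

import math

def level_shift_db(radius_m, head_offset_m):
    """Approximate change in the near/far speaker level balance (dB) when a
    listener's head moves off-center toward one speaker of a circular array.
    Free-field, point-source assumption; real rooms and head shadow differ."""
    near = radius_m - head_offset_m   # distance to the speaker leaned toward
    far = radius_m + head_offset_m    # distance to the diametrically opposite speaker
    return 20.0 * math.log10(far / near)

print(round(level_shift_db(0.61, 0.10), 1))   # ~2.9 dB for a 10 cm lean in an R-Space-sized array
print(round(level_shift_db(1.5, 0.10), 1))    # ~1.2 dB for a hypothetical 1.5 m radius array

def snr_db(speech_dba, noise_dba):
    """Nominal SNR as quoted in the studies: speech level minus noise level."""
    return speech_dba - noise_dba

print(snr_db(75.0, 60.0))   # 15.0 -- the 'quiet pizzeria' condition
print(snr_db(85.0, 70.0))   # 15.0 -- same SNR on paper, but an unrealistically loud talker

Even this crude model shows why a small lean in a 0.61 m array matters when the whole point of the test is a fixed SNR.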
Back to the university’s idea: it was to present monaural speech stimuli in the same way and with the same interface they were accustomed to using. But commercial DAW software is ‘bulletproof’ (or idiot-proof), and for good reason--to prevent novices and hackers from crashing computers (or to keep them from attempting to reverse-engineer proprietary software?). Furthermore, MIDI-based software doesn’t seem to communicate well with other MIDI applications (too easy to create positive feedback loops?), at least not without the help of ReWire. However, MIDI-based DAWs communicate perfectly well with MIDI hardware; hence my initial impetus to use a DAW controller. Interestingly, the university I attended spent a good deal of energy pooh-poohing their newly purchased DAW simply because it didn’t do what they magically wanted it to do. Nobody was interested in learning how to use Digital Performer or any other DAW. I was offered 50 Canadian bucks toward travel expenses to a conference in return for several hundred hours of writing code. The code or stand-alone program was to make the R-Space system do things exactly as it did before: present one sentence at a time against a background of restaurant noise. Well, I don’t work for free (especially when a project is likely to fail), and such a ridiculous notion compelled me to use the ‘f’ word--something I rarely do. Fortunately, only a small minority of the people I encountered in academia expected others to be their personal slaves. There are researchers, students, and professors whom I’m willing to bend over backwards for because they’re sincere and nice people. I also know that many of these individuals are not DAW literate, nor do they have experience using MIDI hardware or software. They’re not media production experts. The mention of ReWire would simply confuse them, and setting up multi-track busses, aux sends, effect returns, VSTs, etc. is not in their vocabulary. They’re comfortable with MATLAB, but not with Pro Tools. As a person with an electronics background (and a recording of a squirrel), I have to see things from their perspective.

This brings me to the topic of DAWs, chips (Burr-Brown analog versus Cirrus Logic digital), and the like. To make a user-friendly system that presents stimuli, records responses, automatically adjusts settings, blah blah, I have to look at the end user. That often means designing a system with an ON/OFF switch, a simple boot sequence (and no ReWire), a drop-down menu of standardized sentence lists to be used as the speech stimuli (IEEE sentences, CNC word lists, etc.), and a Start/Stop button. The complexity and sophistication of the system has to be invisible to the user. Any system that requires the user to make changes on the fly is asking for trouble. Sometimes the presentation levels have to be displayed not in units standard in audio production (e.g., dBu) but in units common in audiology (e.g., dB HL). More importantly, the presentation levels as displayed on the computer screen have to match the actual levels at the listening position--this, fortunately, is the easy part, unless someone tampers with the hardware or software. Hardware devices are less likely to be tampered with, and unlike the purely analog days of old, calibration settings for mixed-signal devices are less likely to drift with age. My main caveat with hardware is cost, but my home-brew hardware devices integrate nicely with MIDI software.
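As an aside on that calibration point, here’s a minimal sketch of the idea (hypothetical names and example numbers--not the actual R-Space or lab code): play a reference signal at a known digital level, measure the SPL at the listening position once, store the offset, and derive the playback level from the requested presentation level thereafter.

# Hypothetical calibration sketch: one measured offset ties the on-screen
# presentation level to the actual level at the listening position.
REF_DIGITAL_DBFS = -20.0       # digital level of the calibration noise
MEASURED_SPL_AT_REF = 80.0     # dB SPL measured once at the listening position (example value)
CAL_OFFSET = MEASURED_SPL_AT_REF - REF_DIGITAL_DBFS   # 100.0 in this example

def digital_level_for(target_spl_db):
    """Digital level (dBFS) needed to produce the requested SPL at the
    listening position, given the stored calibration offset."""
    return target_spl_db - CAL_OFFSET

print(digital_level_for(70.0))   # -30.0 dBFS for a 70 dB SPL presentation

Displaying dB HL instead would need transducer- and test-material-specific corrections on top of this offset--exactly the sort of detail the end user should never have to see.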
As for the home-brew hardware: USB and FireWire replace the traditional MIDI ports (the conversion from one data type to another is invisible to the user), so setup is simple for the end user. It’s also fair to state that I’m quite comfortable with the hardware implementation of digital and analog signal processing. I use hardware a lot because that’s what I grew up with. I use microcontrollers and PIC chips, too, but I can still do a lot with TTL gates, Karnaugh maps, and hardware description languages. I suppose I’m old school in many ways, but always willing to learn and apply new technologies. I read a lot and build modern gadgets from kits. Help and suggestions are always welcome--I also learn from making mistakes, and I’m not afraid to admit that I’m wrong when shown a better way of doing things. This concludes the first part of my saga. In the next chapter I’ll address typical speech-test background noise, and why I chose Ambisonics over a multi-talker surround of cocktail speech derived from monaural sources (as well as other types of surround noise). Because I did the mastering for a number of widely used speech-in-noise tests, I have a good idea of what’s being used by CI researchers. Til next time, E