[Sursound] Ghost in Machine

Eric Carmichel Fri, 14 Dec 2012 11:11:48 -0800

Greetings to All,

I've been working on listening samples to help explain my "ideas" regarding 
hearing aid and cochlear implant research to others. For starters, I'm using 
IRs obtained with a Soundfield mic to auralize dry speech. Unfortunately, more 
questions than sounds surround my weary head.

I discovered an artifact that will need to be addressed, and the answer may be
obvious to the experts out there. I have uploaded files so that everyone can
hear the artifacts. The files can be downloaded from www.elcaudio.com/demos/

The dry recording I had initially planned as a demo is titled
janice_sample_condensed.wav. The recording was made in a semi-anechoic room
with a Rode NT1-A mic. No big deal here. I took a 6-word sample of the longer
word list and cut out time between words. Zero-crossing detection was used to
eliminate pops as I deleted sections of silence between words. The resulting
file is labeled janice_sample_condensed.wav.

Next, the monaural wav file (speech) was "auralized" using the four
B-formatted, 96 kHz, 24-bit IR files obtained via a Soundfield Ambisonic mic.
The four IRs (w, x, y, and z) were applied to the monaural dry recording
(janice_sample_condensed). Finally, I used a popular VST to convert the
resulting four B-formatted files to a stereo/binaural file (KEMAR or similar
HRTF). The stereo file is titled janice_60x00y.wav. The 60x00y comes from the
position of the mic relative to the loudspeaker.
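
For anyone wanting to reproduce the convolution step outside a DAW, here is a
minimal Python/NumPy sketch. It is an illustration only, not the tool chain used
for the demo files: the function names (fft_convolve, auralize_b_format) are
made up for this example, and the signal and IRs are synthetic stand-ins rather
than the actual recordings. The B-format-to-binaural VST stage is not covered.

```python
import numpy as np

def fft_convolve(signal, ir):
    """Full linear convolution via FFT; output length is len(signal) + len(ir) - 1."""
    n = len(signal) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n, avoids circular wrap
    spectrum = np.fft.rfft(signal, nfft) * np.fft.rfft(ir, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

def auralize_b_format(dry, irs):
    """Convolve one mono signal with the W, X, Y and Z IRs (100 percent wet)."""
    return {ch: fft_convolve(dry, irs[ch]) for ch in ("w", "x", "y", "z")}

# Synthetic stand-ins (assumption: NOT the real speech file or Soundfield IRs).
fs = 96_000
rng = np.random.default_rng(0)
dry = rng.standard_normal(fs // 10)                  # 100 ms of noise as "speech"
decay = np.exp(-np.arange(fs // 5) / (0.03 * fs))    # exponentially decaying envelope
irs = {ch: rng.standard_normal(fs // 5) * decay for ch in "wxyz"}

b_format = auralize_b_format(dry, irs)
```

Each output channel is longer than the dry input by the IR length minus one
sample, so the reverberant tail of every word extends into whatever follows it.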

Now for the weird stuff: When you listen to the janice_60x00y.wav file under
headphones (it's a binaural recording), it's fairly clear that the talker is to
the right of the listener. This would be expected based on the mic/speaker
orientation. The first word is the easiest to localize, and one could argue the
precedence/Haas effect helps localize the first sound in the reverberant room.
As the sentence progresses, the localization is more blurred (at least to me).
So, to investigate whether other words could be well localized by starting at
each word's onset, I moved the wav file editor's cursor to begin playback at
around 4 seconds. What I noticed was a distinct impulsive/gunshot sound--it
isn't remotely subtle. This "burst" has nothing to do with pops from cutting at
non-zero-crossing points or with the abrupt start/stop of a waveform that has no
fade in/out. It occurs at any number of locations, but is particularly
noticeable around 3.8 seconds. When you listen to the wav file from start to
finish, however, no such sound exists. I also trimmed off the wav file's first
four seconds and applied a 50 ms fade-in. The impulse is still clearly audible.
Yet it goes completely unnoticed when listening to the full-length file from
its beginning.
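
The trim-and-fade step can be sketched as follows. This is a hypothetical
helper (crop_with_fade_in is not from any editor or library), and it assumes a
linear ramp; the fade shape a given wav editor applies may differ.

```python
import numpy as np

def crop_with_fade_in(x, fs, start_s, fade_ms=50.0):
    """Drop everything before start_s seconds, then apply a linear fade-in."""
    start = int(round(start_s * fs))
    out = np.array(x[start:], dtype=float)            # copy, so the input is untouched
    n_fade = min(int(round(fade_ms * 1e-3 * fs)), len(out))
    ramp = np.linspace(0.0, 1.0, n_fade, endpoint=False)
    if out.ndim == 1:
        out[:n_fade] *= ramp                          # mono
    else:
        out[:n_fade] *= ramp[:, None]                 # (samples, channels), e.g. stereo
    return out

# Example with a synthetic 5 s stereo signal (assumption: placeholder data only).
fs = 96_000
stereo = np.ones((5 * fs, 2))
cropped = crop_with_fade_in(stereo, fs, start_s=4.0)
```

Note that the fade only shapes the first 50 ms after the cut; whatever the file
contains at that point, including any decaying reverberation from earlier
material, is still present after the ramp.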

Because the four IR files are 2 s duration, I thought there might be a "ripple"
that occurs every two seconds. So, to test this, I created a 600 ms noise burst
from ANSI speech-weighted noise (600 ms is approximately the time taken to say
Tom). I added a 10 s tail of silence to the noise burst. Next I proceeded to
apply the IR files using the same settings (e.g. 100 percent wet) as I did with
the dry-speech recording. There are no "ripples" of impulse noise in the silent
region. I then cropped off a small initial portion of the noise burst and
applied a fade-in. The impulsive sound is very evident in the cropped file, but
is not heard when listening to the file from its beginning (i.e., the original,
full-length file). The speech noise files are speech_noise_600ms.wav ("dry" noise);
speech_noise_hrtf_1.wav (same processing as dry speech stereo); and
speech_noise_hrtf_cropped.wav (fade-in added to the trimmed file).
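
A test signal of this shape can be generated roughly as below. This is a sketch
under stated assumptions: plain white noise stands in for the ANSI
speech-weighted noise (the weighting filter is not reproduced here), and
burst_with_silent_tail is a hypothetical name chosen for this example.

```python
import numpy as np

def burst_with_silent_tail(fs, burst_s=0.6, tail_s=10.0, seed=0):
    """Noise burst followed by digital silence.

    Assumption: white Gaussian noise is used as a placeholder for
    ANSI speech-weighted noise.
    """
    rng = np.random.default_rng(seed)
    n_burst = int(round(burst_s * fs))
    n_tail = int(round(tail_s * fs))
    burst = rng.standard_normal(n_burst)
    burst /= np.max(np.abs(burst))                    # normalise peak to full scale
    return np.concatenate([burst, np.zeros(n_tail)])

fs = 96_000
test_signal = burst_with_silent_tail(fs)              # 0.6 s burst + 10 s silence
```

Because the tail is exact zeros, anything found there after the IRs are applied
must come from the processing itself, which is what makes the silent region a
useful check for periodic "ripples".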

Artifacts such as this make me question a lot of what's going on
research-wise. I don't know how hearing-impaired persons hear or deal with echo
suppression and artifacts, so these "ghosts" could present a very real problem.
Although we might not hear the artifact in one condition (i.e., playing from
the beginning), there's still something going on behind the scenes.

This kind of reminded me of "Ghosties and Ghoulies" found on the Harvard Tapes
psychoacoustic demos (briefly, this demo shows how the brain suppresses echoes:
when the hammer blow is played backwards, the decay is quite audible; when
played forward, it's a brief sound).

Please listen to the files yourself. Your insight is most welcome.

Back to work (and a lot of coffee).
Best always,
Eric