Greetings to All,

I've been working on listening samples to help explain my "ideas" regarding 
hearing aid and cochlear implant research to others. For starters, I'm using 
IRs obtained with a Soundfield mic to auralize dry speech. Unfortunately, more 
questions than sounds surround my weary head.

I discovered an artifact that will need to be addressed, and the answer may be 
obvious to the experts out there. I have uploaded files so that everyone can 
hear the artifacts. The files can be downloaded from www.elcaudio.com/demos/


The dry recording I had initially planned as a demo is titled 
janice_sample_condensed.wav. The recording was made in a semi-anechoic room 
with a Rode NT1-A mic. No big deal here. I took a 6-word sample of the longer 
word list and cut out time between words. Zero-crossing detection was used to 
eliminate pops as I deleted sections of silence between words. The resulting 
file is labeled janice_sample_condensed.wav.

Next, the monaural wav file (speech) was "auralized" using the four 
B-formatted, 96 kHz, 24-bit IR files obtained via a Soundfield Ambisonic mic. 
The four IRs (w, x, y, and z) were applied to the monaural dry recording 
(janice_sample_condensed). Finally, I used a popular VST to convert the 
resulting four B-formatted files to a stereo/binaural file (KEMAR or similar 
HRTF). The stereo file is titled janice_60x00y.wav. The 60x00y comes from the 
position of the mic relative to the loudspeaker.
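
For readers who want to reproduce the convolution step, here is a minimal 
Python/NumPy sketch. It is not the tool chain used above: the function names 
(fft_convolve, auralize_b_format) and the dictionary of IRs keyed by "w", "x", 
"y", "z" are illustrative assumptions, and the B-format-to-binaural decode 
done by the VST is not covered.

```python
import numpy as np

def fft_convolve(dry, ir):
    """Full linear convolution of two 1-D signals via zero-padded FFTs."""
    n = len(dry) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spectrum = np.fft.rfft(dry, nfft) * np.fft.rfft(ir, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

def auralize_b_format(dry, irs):
    """Convolve one mono signal with each B-format IR.

    `irs` is assumed to be a dict with keys "w", "x", "y", "z", each a 1-D
    array of the same length. Returns an array of shape (4, n_out).
    """
    return np.stack([fft_convolve(dry, irs[ch]) for ch in ("w", "x", "y", "z")])

# Tiny synthetic example (made-up numbers, not the real IRs).
dry = np.array([1.0, 0.5, -0.25])
irs = {
    "w": np.array([1.0, 0.0, 0.5]),
    "x": np.array([0.7, 0.1, 0.0]),
    "y": np.array([0.0, 0.0, 0.0]),
    "z": np.array([0.2, -0.2, 0.1]),
}
b_format = auralize_b_format(dry, irs)
print(b_format.shape)
```

Zero-padding to at least len(dry) + len(ir) - 1 samples keeps the FFT product 
a linear (not circular) convolution, so the reverberant tail does not wrap 
around to the start of the file.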

Now for the weird stuff: When you listen to the janice_60x00y.wav file under 
headphones (it's a binaural recording), it's fairly clear that the talker is to 
the right of the listener. This would be expected based on the mic/speaker 
orientation. The first word is the easiest to localize, and one could argue the 
precedence/Haas effect helps localize the first sound in the reverberant room. 
As the sentence progresses, the localization is more blurred (at least to me). 
So, to investigate whether other words could be well localized by starting at 
each word's onset, I moved the wav file editor's cursor to begin at around 4 
seconds. What I noticed was a distinct impulsive/gunshot sound--it isn't 
remotely subtle. This "burst" has nothing to do with pops from cutting at 
non-zero-crossing points or with starting/stopping a waveform abruptly without 
a fade. It occurs at any number of locations, but is particularly noticeable 
around 3.8 seconds. But when you listen to the wav file from start to finish, 
no such sound exists. I also trimmed off the wav file's first four seconds and 
applied a 50 ms fade-in. The impulse is still clearly audible. Yet it goes 
completely unnoticed when listening to the full-length file from its beginning.
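
The trim-and-fade step can be sketched in Python/NumPy as follows. This is 
only an illustration: the function name crop_with_fade_in and the linear ramp 
are assumptions (the fade shape applied by the wav editor is not stated), and 
the signal is treated as a single channel.

```python
import numpy as np

def crop_with_fade_in(x, fs, start_s, fade_s=0.05):
    """Drop everything before start_s seconds, then apply a linear fade-in.

    x       : 1-D array of samples (one channel)
    fs      : sample rate in Hz
    start_s : crop point in seconds
    fade_s  : fade-in length in seconds (50 ms by default)
    """
    start = int(round(start_s * fs))
    out = np.array(x[start:], dtype=float)  # copy so the input is untouched
    n_fade = min(int(round(fade_s * fs)), len(out))
    # Ramp from 0 up to (but not including) 1 over the fade region.
    out[:n_fade] *= np.linspace(0.0, 1.0, n_fade, endpoint=False)
    return out

# Example on a constant test signal: 1 s at 1 kHz, cropped at 0.1 s.
fs = 1000
cropped = crop_with_fade_in(np.ones(fs), fs, start_s=0.1)
print(len(cropped), cropped[0], cropped[-1])
```

For a stereo file the same function would be applied to each channel 
separately.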

Because the four IR files are 2 s duration, I thought there might be a "ripple" 
that occurs every two seconds. So, to test this, I created a 600 ms noise burst 
from ANSI speech-weighted noise (600 ms is approximately the time taken to say 
Tom). I added a 10 s tail of silence to the noise burst. Next I proceeded to 
apply the IR files using the same settings (e.g. 100 percent wet) as I did with 
the dry-speech recording. There are no "ripples" of impulse noise in the silent 
region. I then cropped off a small initial portion of the noise burst and 
applied a fade-in. The impulsive sound is very evident, but doesn't occur when 
listening to the original, full-length file from its beginning. The speech 
noise files are speech_noise_600ms.wav ("dry" noise); 
speech_noise_hrtf_1.wav (same processing as dry speech stereo); and 
speech_noise_hrtf_cropped.wav (fade-in added to the trimmed file).
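
The "no ripples in the silent region" check can also be run numerically. 
The sketch below is an assumption-laden stand-in, not the actual test: it uses 
white noise instead of ANSI speech-weighted noise, a synthetic exponentially 
decaying 2 s IR instead of the measured Soundfield IRs, and a low sample rate 
to keep it fast. The helper names are hypothetical.

```python
import numpy as np

def burst_with_tail(fs, burst_s=0.6, tail_s=10.0, seed=0):
    """White-noise burst followed by digital silence."""
    rng = np.random.default_rng(seed)
    burst = rng.standard_normal(int(round(burst_s * fs)))
    tail = np.zeros(int(round(tail_s * fs)))
    return np.concatenate([burst, tail])

def last_nonsilent_sample(y, threshold=1e-9):
    """Index of the last sample whose magnitude exceeds threshold, or -1."""
    idx = np.flatnonzero(np.abs(y) > threshold)
    return int(idx[-1]) if idx.size else -1

fs = 1000                      # low rate chosen only to keep the demo quick
burst_len = int(round(0.6 * fs))
ir_len = 2 * fs                # 2 s impulse response, as in the post
dry = burst_with_tail(fs)

# Synthetic IR: noise with an exponential decay (an assumption, not a real room).
rng = np.random.default_rng(1)
ir = rng.standard_normal(ir_len) * np.exp(-np.arange(ir_len) / (0.3 * fs))

wet = np.convolve(dry, ir)     # full linear convolution

# Energy can only exist in the first burst_len + ir_len - 1 output samples.
print(last_nonsilent_sample(wet), burst_len + ir_len - 1)
```

With plain linear convolution the output must be silent once the burst plus 
the IR length has elapsed, which is consistent with hearing nothing in the 
10 s tail.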


Artifacts such as this make me question a lot of what's going on 
research-wise. I don't know how hearing-impaired persons hear or deal with echo 
suppression and artifacts, so these "ghosts" could present a very real problem. 
Although we might not hear the artifact in one condition 
(i.e., playing from beginning), there's still something going on behind 
the scenes.

This kind of reminded me of "Ghosties and Ghoulies" found on the Harvard Tapes 
psychoacoustic demos (briefly, this demo shows how the brain suppresses echoes: 
when the hammer blow is played backwards, the decay is quite audible; when 
played forward, it's a brief sound).

Please listen to the files yourself. Your insight is most welcome.

Back to work (and a lot of coffee).
Best always,
Eric
_______________________________________________
Sursound mailing list
Sursound@music.vt.edu
https://mail.music.vt.edu/mailman/listinfo/sursound