There are also the concepts of cepstrum 
(https://en.wikipedia.org/wiki/Cepstrum) and quefrency, which are derived 
from "spectrum" and "frequency". With these you can do not only speaker 
recognition but also event detection.
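To give a feel for what the cepstrum actually is in code, here is a minimal numpy sketch (not from the thread; the 200 Hz harmonic test tone and the quefrency search range are my own illustrative assumptions):

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Its independent variable is 'quefrency' (here in samples)."""
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

# Demo: a harmonic-rich 200 Hz tone, a crude stand-in for voiced speech.
fs = 48000
t = np.arange(fs) / fs
f0 = 200.0
tone = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 21))

ceps = real_cepstrum(tone)
# A periodic (voiced) source shows a cepstral peak at its fundamental
# period; search roughly 1-7.5 ms, a plausible pitch-period range.
lo, hi = 50, 360
peak_q = lo + int(np.argmax(ceps[lo:hi]))
print(f"cepstral peak at {peak_q / fs * 1000:.2f} ms")  # expect ~5 ms (1/200 Hz)
```

A noise-only frame has no such peak, which is one way the cepstrum helps separate voiced sound from background noise.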

On 26.10.24 at 18:07, Thomas Passin via Python-list wrote:
On 10/25/2024 12:25 PM, marc nicole via Python-list wrote:
Hello Python fellows,

I hope this question is not very far from the main topic of this list, but
I am having a hard time finding a way to check whether audio data samples
contain only empty noise or actual significant voice/sound.

I am using PyAudio to collect the sound through my PC mic as follows:

import pyaudio

FRAMES_PER_BUFFER = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 48000
RECORD_SECONDS = 2

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=FRAMES_PER_BUFFER,
                    input_device_index=2)
data = stream.read(FRAMES_PER_BUFFER)


I want to know whether data contains voice signals or just empty sound.
Note that the variable always contains bytes (empty or not) when I print
it.

Is there a straightforward, easy way to check whether data is filled with
empty noise or whether somebody has made a sound/spoken?
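As a minimal first-pass check (not proposed in the thread itself, just a common heuristic), you can threshold the RMS energy of each captured frame; the threshold value below is an assumption you would calibrate against your own mic's noise floor:

```python
import numpy as np

def frame_rms(data: bytes) -> float:
    """RMS level of a paInt16 mono frame, normalized to 0.0-1.0."""
    samples = np.frombuffer(data, dtype=np.int16).astype(np.float64)
    if samples.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(samples ** 2)) / 32768.0)

SILENCE_THRESHOLD = 0.01  # assumed value; calibrate on your own noise floor

def looks_like_signal(data: bytes) -> bool:
    return frame_rms(data) > SILENCE_THRESHOLD

# Synthetic stand-ins for a quiet frame and a frame with a strong tone:
quiet = np.zeros(1024, dtype=np.int16).tobytes()
loud = (10000 * np.sin(2 * np.pi * 440 * np.arange(1024) / 48000)
        ).astype(np.int16).tobytes()
print(looks_like_signal(quiet), looks_like_signal(loud))  # False True
```

This only distinguishes loud from quiet, not voice from other noise, which is where the spectral methods discussed in the replies come in.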

It's not always so easy.  The Fast Fourier Transform will be your friend. The 
most straightforward way would be to do an autocorrelation on the recorded 
interval, possibly with some pre-filtering to enhance the typical vocal 
frequency range.  If the data is only noise, the autocorrelation will show a 
large spike at lag 0 and only small, obviously noisy values everywhere 
else. There are practical aspects that make things less clear.  For example, 
voices tend to be spiky and erratic, so you need to use short intervals to have 
a better chance of catching one with a good S/N ratio, but shorter intervals 
contain fewer samples and therefore give noisier estimates.
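The autocorrelation test described above can be sketched roughly like this (a hedged numpy sketch; the 220 Hz synthetic "voiced" example and the lag range are my own illustrative assumptions):

```python
import numpy as np

def autocorr(x):
    """Autocorrelation of x, normalized so that lag 0 equals 1."""
    x = x - x.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    return r / r[0]

def peak_away_from_zero(r, min_lag=30):
    """Largest autocorrelation magnitude away from the lag-0 spike."""
    return float(np.max(np.abs(r[min_lag:len(r) // 2])))

rng = np.random.default_rng(0)
fs = 48000
n = 4800  # a short 100 ms interval

noise = rng.normal(size=n)  # pure background noise
# a 220 Hz tone plus a little noise, as a stand-in for voiced sound
voiced = (np.sin(2 * np.pi * 220 * np.arange(n) / fs)
          + 0.1 * rng.normal(size=n))

# White noise: near zero away from lag 0.  Periodic signal: large
# peaks survive at multiples of its period.
print(peak_away_from_zero(autocorr(noise)))   # small
print(peak_away_from_zero(autocorr(voiced)))  # close to 1
```

A simple detector would compare the second value against a calibrated threshold, exactly the kind of tuning the practical caveats above are about.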

Human speech is produced with various statistical regularities and these can 
sometimes be detected with various means, including the autocorrelation.

You also will need to test-record your entire signal chain because it might be 
producing artifacts that could fool some tests.  And background sounds could 
fool some tests as well.

Here are some Python libraries that could be very helpful:

librosa (I have not worked with this, but it sounds right on target);
scipy.signal (I have used scipy but not specifically scipy.signal);
python-speech-features (another I haven't used):
   https://python-speech-features.readthedocs.io/en/latest/
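As a small illustration of the pre-filtering idea mentioned above, here is a hedged scipy.signal sketch of a band-pass filter over a rough speech band (the 300-3400 Hz range and filter order are my own assumptions, borrowed from classic telephony):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 48000
# 4th-order Butterworth band-pass over an assumed 300-3400 Hz speech
# band; the 'sos' form is numerically robust.
sos = butter(4, [300, 3400], btype="bandpass", fs=fs, output="sos")

rng = np.random.default_rng(1)
x = rng.normal(size=fs)      # one second of white noise
y = sosfiltfilt(sos, x)      # zero-phase filtering

# Out-of-band energy should drop sharply after filtering.
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)
in_band = spectrum[(freqs > 300) & (freqs < 3400)].mean()
out_band = spectrum[freqs > 8000].mean()
print(in_band / out_band)  # large ratio: out-of-band strongly attenuated
```

Running the captured frames through such a filter before the autocorrelation or energy test reduces the influence of hum and high-frequency hiss.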

Other people will know of others.
--
https://mail.python.org/mailman/listinfo/python-list
