I am a bit unsure what you are trying to accomplish. Are you trying to recognize speech from ANY voice (someone said "hello"), or are you trying to differentiate between voices (Mary is the person talking)? If it is the first, I think you would have better luck calling a commercial speech recognition package.
If it is the differentiation, you can have each person say a fixed message and then compare the recorded waveforms. I think you would want to compare the envelopes of the waveforms in some way, perhaps using spectral measurements to determine the main frequency components of each recording.
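
To make the spectral idea a bit more concrete, here is a rough sketch in Python with NumPy. It assumes each recording is already loaded as a mono NumPy array at a known sample rate. The function names (`magnitude_spectrum`, `dominant_frequency`, `band_energies`, `spectral_similarity`) and the choice of 16 equal-width bands are my own for illustration, not something from your setup. It finds the strongest frequency component and compares coarse spectral envelopes with a cosine similarity. Treat it as a starting point for experiments rather than a working speaker identifier.

```python
import numpy as np


def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, magnitudes) for a mono signal."""
    # A Hann window reduces leakage from the abrupt start and end of the clip.
    windowed = signal * np.hanning(len(signal))
    magnitudes = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, magnitudes


def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest spectral component."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    return float(freqs[np.argmax(mags)])


def band_energies(signal, sample_rate, n_bands=16):
    """Coarse spectral envelope: fraction of energy in equal-width bands.

    The band count and equal-width layout are illustrative assumptions.
    """
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    edges = np.linspace(0.0, sample_rate / 2.0, n_bands + 1)
    # Map each FFT bin to a band; clip so the Nyquist bin lands in the last band.
    band_index = np.clip(np.digitize(freqs, edges) - 1, 0, n_bands - 1)
    energies = np.bincount(band_index, weights=mags ** 2, minlength=n_bands)
    total = energies.sum()
    return energies / total if total > 0 else energies


def spectral_similarity(a, b, sample_rate, n_bands=16):
    """Cosine similarity of two band-energy envelopes (1.0 means same shape)."""
    ea = band_energies(a, sample_rate, n_bands)
    eb = band_energies(b, sample_rate, n_bands)
    denom = np.linalg.norm(ea) * np.linalg.norm(eb)
    return float(np.dot(ea, eb) / denom) if denom > 0 else 0.0
```

You would call `spectral_similarity(recording_a, recording_b, sample_rate)` on two recordings of the same phrase and pick a threshold by trial on your own data. Because the envelopes are normalized, the score ignores overall loudness, but it is still sensitive to background noise and to how the phrase is spoken.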
