[Dailydave] Vista speech recognition

Michal Zalewski lcamtuf at dione.ids.pl
Thu Feb 1 13:43:04 EST 2007


On Thu, 1 Feb 2007, Juha-Matti Laurio wrote:

> http://blogs.technet.com/msrc/archive/2007/01/31/issue-regarding-windows-vista-speech-recognition.aspx

I find this kind of bogus. Voice recognition systems don't compare raw
waveforms. Most of the information is discarded: they usually isolate a
fraction of the signal, normalize it, chop it into discrete chunks that best
reflect changes in voice modulation or whatnot, then feed them to an HMM
analyzer, an ANN, or some similar classifier. This is heavily optimized
based on various assumptions about how human speech sounds and what ambient
noise might look like.
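
To make the "most of the information is discarded" part concrete, here is
a toy front end in Python. It is only a sketch, not what Vista or any real
recognizer does: the name front_end, the 160-sample frame, and the choice of
log energy plus zero crossings as features are all made up for illustration.

```python
import math

def front_end(samples, frame_len=160):
    """Toy feature extractor: normalize, chop into frames, keep two numbers per frame."""
    peak = max(abs(s) for s in samples) or 1.0
    norm = [s / peak for s in samples]           # absolute loudness is discarded here
    feats = []
    for i in range(0, len(norm) - frame_len + 1, frame_len):
        frame = norm[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # Count sign changes: a crude stand-in for "changes in voice modulation".
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((round(math.log(energy + 1e-9), 3), crossings))
    return feats

# 0.1 s of a 200 Hz tone sampled at 8 kHz: 800 samples in, 5 feature pairs out.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
feats = front_end(tone)

# Halving the volume leaves the features unchanged.
quiet_feats = front_end([0.5 * s for s in tone])
```

Whatever classifier sits behind this only ever sees feats, never the
samples themselves.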

What this means is that it is in all likelihood possible to produce a
waveform that will be impossible for a human to interpret (either because
it is masked by a superimposed signal, or because it does not resemble
speech in the first place), but will be "heard" as meaningful words by
Vista.
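
A rough illustration of why, again in Python and again under made-up
assumptions: suppose the recognizer only looks at the magnitudes of a few
low DFT bins per frame (N, BAND, features and tone are hypothetical names,
and real systems use fancier features). Then two very different waveforms
can look identical to it:

```python
import cmath
import math

N = 64                 # frame length (hypothetical)
BAND = range(1, 9)     # DFT bins the toy recognizer looks at (hypothetical)

def features(frame):
    """Magnitudes of the in-band DFT bins; phase and all other bins are dropped."""
    out = []
    for k in BAND:
        x = sum(s * cmath.exp(-2j * math.pi * k * n / N) for n, s in enumerate(frame))
        out.append(abs(x) / N)
    return out

def tone(k, amp=1.0, phase=0.0):
    """One frame of a sine wave sitting exactly on DFT bin k."""
    return [amp * math.sin(2 * math.pi * k * n / N + phase) for n in range(N)]

speech_like = tone(3)
# Same in-band magnitude, shifted phase, plus a loud component outside BAND.
disguised = [a + b for a, b in zip(tone(3, phase=1.0), tone(20, amp=4.0))]

# Features agree up to floating-point noise...
same_features = all(
    abs(a - b) < 1e-9 for a, b in zip(features(speech_like), features(disguised))
)
# ...while the raw samples differ by several times the original amplitude.
max_diff = max(abs(a - b) for a, b in zip(speech_like, disguised))
```

Here the out-of-band tone plays the role of the superimposed masking
signal: it dominates the raw samples, and the toy recognizer does not see
it at all.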

So, you get eerie industrial background music and noises on a website,
instead of a dude reading out loud "my documents, delete, yes".

Heck, this happens spontaneously: speech recognition systems sometimes
pick up random burps and crashes from the environment and map them to
dictionary words. And wasn't there an early demo for Vista speech
recognition that wasn't trained for that particular salesdude, and kept
hearing "dear aunt double the killer" instead of what he was saying? Oh
yeah:

http://video.google.com/videoplay?docid=-1123221217782777472

Now, I bet that MSRC dudes are well aware of this possibility, but chose
not to mention it. Eh.

/mz


