Voice interfaces

Current voice interfaces are horrible. They try to imitate intelligence and fail so badly that they make computers even more frustrating to use. Most people have no idea how to use things like Siri. They ask it perfectly reasonable questions like, “Who invented the lightbulb?” and it responds with junk from Google. The answer to that particular question is not straightforward: at least three people independently invented different types of light bulbs, and computers are really bad at extracting and communicating absolute answers from complex information like that. Because they are still so frustratingly limited, Siri and Google Now are simply not yet ready to exist. They are not artificially intelligent by any stretch of the imagination.

That being said, current voice recognition technology is incredibly good at certain things. It’s great at detecting and transcribing words, listening for specific commands, and making matches against expected inputs. So why does literally no software take advantage of voice technology in the way it works best? For example, it boggles my mind that I cannot do the following things:

All of these possible cases have one important thing in common: something specific has to happen before the voice control works. I have to tap a dropdown and then say “California”. A text message has to have just arrived for me to say “ignore” and have it disappear. Current voice interfaces suck because they force the speaker to consciously enter a “voice” mode and then create context around the action they want the computer to perform. This makes no sense; the computer should just always be listening for potential commands within the context of whatever the user is doing. Siri has no context, which makes it very difficult for the computer to accurately predict what the user wants to do. It also makes it very difficult for a user to know when using Siri would be helpful. But within a specific context, there are only a few possibilities, which makes voice control far easier to use and much easier for a developer to build.
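To make the “specific context” idea concrete, here is a rough sketch in Python. Everything in it is hypothetical: `VoiceContext`, `dropdown_context`, and `notification_context` are illustrative names, not any real platform’s API, and it assumes a speech recognizer has already turned audio into a text transcript. Each UI state only matches that transcript against the handful of phrases it expects and ignores everything else. The standard library’s `difflib.get_close_matches` stands in for some tolerance to small transcription errors; a real system would likely use the recognizer’s own constrained grammar instead.

```python
from dataclasses import dataclass, field
from difflib import get_close_matches
from typing import Callable, Optional


@dataclass
class VoiceContext:
    """Hypothetical UI state that accepts a small, known set of spoken phrases."""
    name: str
    commands: dict[str, Callable[[], str]] = field(default_factory=dict)

    def handle(self, transcript: str) -> Optional[str]:
        # Normalize the (assumed) recognizer transcript, then match it only
        # against the phrases this context expects. Anything else is ignored.
        phrase = transcript.strip().lower()
        match = get_close_matches(phrase, list(self.commands), n=1, cutoff=0.8)
        if not match:
            return None
        return self.commands[match[0]]()


def dropdown_context(options: list[str]) -> VoiceContext:
    # Hypothetical: an open dropdown listens only for its own option labels.
    ctx = VoiceContext("dropdown")
    for option in options:
        # Default argument binds the current option (avoids late binding).
        ctx.commands[option.lower()] = lambda o=option: f"selected {o}"
    return ctx


def notification_context(sender: str) -> VoiceContext:
    # Hypothetical: a just-arrived message listens for a few verbs.
    return VoiceContext("new message", {
        "ignore": lambda: f"dismissed message from {sender}",
        "reply": lambda: f"replying to {sender}",
    })


states = dropdown_context(["Alabama", "California", "Colorado"])
print(states.handle("California"))  # selected California
print(states.handle("ignore"))      # None: not a valid command in this context
```

Because each context only knows a few phrases, matching is cheap and unambiguous, and the same word (“ignore”) can mean something in one state while being harmless noise in another.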

In general, voice interfaces would work great for things that are easy to describe out loud but hard to do with input devices like touch screens, keyboards, and mice, like selecting “California” from a dropdown list or paging through icons to find an app to open. I only listed five possible use cases above, but I think they would dramatically improve the computing experience. Imagine what other things could be done.
