Wednesday, January 9, 2008

There is a classic (among geeks at least) scene in Star Trek IV where the crew of the Enterprise has traveled back in time to the late 20th century. Chief Engineer Scott sits down in front of a Mac computer and says, "Computer. Computer?" Getting no response, Dr. McCoy helpfully hands Scott the mouse, which he holds like a microphone: "Hello, Computer??" A befuddled 1980s Earthling standing by finally tells him to just use the keyboard ("How quaint"). [Watch on YouTube] While the Mac was not able to respond to voice input, it did bring the mouse-driven graphical user interface revolution to the masses.
Moreover, when Steve Jobs introduced the Macintosh over 20 years ago, his demonstration, in part, "let the computer do the talking" via a synthesized voice reading a short speech. That was quite a feat in 1984. Yet very little has changed in speech synthesis between then and now. Why has there been so little advancement in voice-based computer interfaces in over two decades? Are the factors finally in place for the next interface revolution, one that truly puts the Personal in PC? The answer may be yes, and the company poised to lead that change is once again Apple.
The answers to the first question are many. Primarily, voice-based interfaces have stagnated not because of technology constraints but because of a lack of demand. The niche market has been served by software vendors like Dragon Systems (now owned by Nuance), who have been able to do voice recognition since the days when the 486 processor ruled. Current demonstrations of its feasibility include voice dialing, available on many cell phones, even the most inexpensive, and the Sync system in Ford cars for making phone calls and controlling digital music players. The lack of demand on desktop systems is largely due to the fact that the majority of computer use has taken place in the cubicle farms of the American office, an environment where voice interfaces would not fit very effectively.
As more people spend more time online at home, speech-based interactions make more sense. Many people compose numerous emails and blog, chat, or Twitter daily, and all of these applications would be well served by dictation software. Further, the generation that has grown up with computers keeps growing. While older people, as a general rule, may be less comfortable with technology, kids and young adults have no aversion to talking to a machine. Perhaps the time is finally right for someone to take this seemingly logical next step in computer interfaces.
If the time is now, the company may be Apple. Buoyed by an amazing chain of products since Steve Jobs regained the helm, Apple has shown a repeated ability to take existing technologies, polish and package them in a user-friendly way, and bring them to more people. The iPod and iTunes did it for digital music, OS X for Unix, and now the iPhone/iPod Touch for mobile computing. History has shown Apple to have an interest in improving the user experience. Other major advantages are its control of the hardware and software environment and its commitment to open source. Apple has long included built-in microphones on its laptops and all-in-ones; tweaking these for noise reduction or other speech enhancements would be fairly easy. If it set its engineers to the task, speech could become an intrinsic part of Mac OS.
This is the key to my argument. I don't purport to have done an exhaustive review of the available add-ons that can make a computer voice-activated. Far from it. But that is because this technology should not be an add-on. If I can edit a photo, listen to digital music, browse the internet, and write formatted text using a stock installation of an operating system, I should just as easily be able to search for a file, cue up a song, navigate to a web site, or dictate text without using the keyboard. I'm not saying that the computer should understand complex natural language or that the mouse and keyboard would be replaced entirely. I would be happy to follow a set format for commands, to enunciate clearly, and to separate each word from the next.
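To make the "set format for commands" concrete, here is a minimal sketch of what such a constrained grammar could look like. The verbs and the parsing scheme are purely hypothetical illustrations of the idea, not any real Apple API: the point is that a small, fixed vocabulary sidesteps the need for full natural-language understanding.

```python
def parse_command(utterance: str):
    """Split a spoken phrase into a known verb and its argument.

    A hypothetical fixed command format: the first word must come
    from a small, enumerable vocabulary; everything after it is the
    argument (a file name, song title, web site, etc.).
    """
    verbs = {"find", "play", "open", "dictate"}  # fixed vocabulary
    words = utterance.lower().split()
    if not words or words[0] not in verbs:
        return None  # outside the grammar: ask the user to rephrase
    return words[0], " ".join(words[1:])

# A recognized command parses cleanly...
assert parse_command("Play Bohemian Rhapsody") == ("play", "bohemian rhapsody")
# ...while free-form speech is simply rejected rather than misunderstood.
assert parse_command("Please do something clever") is None
```

The design trade-off is the one the paragraph above accepts: the user gives up conversational flexibility, and in exchange the recognizer only ever has to distinguish a handful of verbs, which was well within reach of speech software even in 2008.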
Mac OS X even includes some support out of the box for voice recognition and computer speech (my computer tells me the time every half hour). The problem is that these features are not highlighted as the way to interact with the computer. Until there is a keynote where Steve Jobs uses spoken commands in a demonstration, or an Apple ad campaign that shows users talking to their computers, consumers (and therefore developers) won't take speech seriously. But if speech were suddenly put forward as part of the human interface guidelines, a whole new breed of more usable applications could take hold, and the next generation of computer interface could develop. If only Apple is listening.