
Friday, May 16, 2008

Mac OS X Speech Synthesis

Since the introduction of the Macintosh in 1984, Mac OS has had the ability to convert text into speech. Even eight-bit computers like the Commodore 64 had SAM, an early voice synthesizer, but as I bemoaned several months ago, there has been relatively little progress in speech recognition and synthesis in the intervening decades. This is especially important for the more than 45 million Americans with literacy problems. Despite that slow progress, OS X does offer text-to-speech options that may interest users regardless of their literacy level.

Here are some uses for speech synthesis that you may not have thought of. Anyone who writes, even if it's only an occasional professional email, can benefit from text-to-speech. While spell checkers are great for finding egregious errors, more subtle problems are harder to spot. Writers often inadvertently use the wrong word or add extra words to their text. For example, how often have you accidentally typed "you" in place of "your"? One easy way to find these problems is to listen to someone read what you wrote. OS X can do that for you.

Like the Dictionary application, speech synthesis is integrated throughout the modern Mac operating system. Any highlighted text, whether in a web browser or an e-mail, can be read aloud by the computer. In many applications, such as word processors, just bring up the context menu by right-clicking or Control-clicking and choose "Speech", then "Start Speaking". If the option is not in the context menu, it is still available in the Services menu: click the name of the application in the menu bar, then go to "Services/Speech/Start speaking". It is also possible to create a shortcut key for this option. Simply go to System Preferences and open the Speech preference pane. In the "Text to speech" tab, check "Speak selected text when the key is pressed" and then push the "Set key" button. Now just highlight text in any application, and your computer will read it to you at the touch of a button.
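For those comfortable with a terminal, the same system voices are also exposed through the command-line tool "say", so a short script can read a draft aloud on demand. Here is a minimal sketch in Python; the file name "draft.txt" and the "Alex" voice are only examples of my own, not anything OS X requires.

import subprocess

def speak(text, voice="Alex"):
    # Hand the text to the built-in `say` command using a chosen system voice.
    subprocess.run(["say", "-v", voice, text], check=True)

if __name__ == "__main__":
    # Hypothetical example: proofread a draft by ear.
    with open("draft.txt") as f:
        speak(f.read())

Listening to the script read the file back is a quick way to catch the doubled or missing words mentioned above.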

Another speech feature many people may find useful is the spoken clock. When working on the computer it's easy to lose track of time; sometimes hours go by before I realize it. To avoid this, OS X can announce the time for you. The option lives in the Date & Time settings, which can be reached in several ways: there is a button in the aforementioned "Text to speech" pane, you can click the time in the menu bar and choose "Open Date & Time...", or you can pick Date & Time from the main System Preferences window. Once there, go to the Clock tab, check "Announce the time", choose how often, and click "Customize Voice" if you wish to set specific voice options.
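If the built-in schedule is too limiting, the same effect can be scripted. The sketch below is a rough illustration only, assuming Python 3 and the stock "say" tool; the half-hour interval and the phrasing are my own choices, not system settings.

import subprocess
import time

INTERVAL_MINUTES = 30  # example interval; adjust to taste

while True:
    # Format the current time, e.g. "03:45 PM", and speak it aloud.
    now = time.strftime("%I:%M %p")
    subprocess.run(["say", f"It's {now}"], check=True)
    time.sleep(INTERVAL_MINUTES * 60)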

Some users, myself included, keep the Dock hidden and may not always notice applications bouncing their icons there when they need attention. This can be addressed by having OS X speak to you when a program needs attention. The option is also in the "Text to speech" tab of the Speech preference pane: just check "Announce when an application requires your attention". The computer is even quite polite, saying, "Excuse me. Application X needs your attention."

What if you are dissatisfied with the standard computer voices? Without doing an exhaustive search, I found two companies that offer commercial voice packs for OS X, both with fairly realistic voices. You can hear many samples or download demos at the InfoVox and Cepstral web sites. Unfortunately, they're rather pricey: the InfoVox American English pack is $100, while Cepstral voices are sold individually at $29 each.

While it would be hard to say that speech synthesis has come a long way on the Mac, the combination of universally integrated speech options and high-quality commercial voices is a compelling one. For those who prefer to have text read to them, or who just want spoken system alerts, text-to-speech can be a useful and important component of the operating system.

For more great information on the Services menu, see this web site.

Sunday, April 13, 2008

Why-Mac Part One: Window Management

Apple stock compared to the Nasdaq and Dow Jones.

Until recently, there were no real contenders to Microsoft's OS monopoly. Since the release of OS X and the iPod, however, Apple has steadily begun to challenge that dominance. Apple has over 19 billion dollars in cash stashed away. Its stock price, despite recent declines due to economic fears, has increased over 350% since 2005. Studies have shown 40% of incoming freshmen at some universities using Macs, and Apple captured a 25% share of laptop revenue across all manufacturers in February 2008.

Why-Mac will be a series of articles explaining in detail why I have found Mac OS X to be the best in usability, productivity, and aesthetics. Much has been written about switching to the Mac or intricately tweaking OS X, but most of this information is either very basic or too technical. These articles will span the middle ground. For readers who are familiar with computers and MS Windows, whether recent switchers or those considering a Mac, they will present details about how Macs are different and how those differences can make you more productive. Hopefully even longtime Mac users will pick up some tips and tricks and come to understand their computers better.

First, a bit of background on what qualifies me to write these articles. I started using personal computers at the age of 11 on a Texas Instruments TI-99/4A. My parents wouldn't buy any game cartridges for it, so my brother and I learned to program in BASIC. Later, I became a fan of Atari computers. The Atari ST used the GEM interface, a knock-off of the Macintosh OS, but it offered more "Power Without the Price". In high school, the local newspaper published a letter to the editor in which I argued against the purchase of Macs for our school (infuriating our computer teacher). After high school, I worked at a couple of PC clone stores, selling, building, and repairing computers, and learned the workings of DOS and Windows. Microsoft's promises for each revision of Windows would excite and then disappoint me. In 1995, I became an internet programmer and later learned Java. My experience with Macs began shortly after OS X was released. Having tinkered with Linux off and on for years, I was drawn to the stability of Unix coupled with a nice user interface. I got my first Mac in 2001, spent a couple of months learning OS 9.2 to understand some history, then plunged into OS X and never looked back. While I don't like to consider myself a "fanboy", as my friend said on the matter, "There is no fervor like that of the converted." Without further ado, here then is part one of Why-Mac.

One of the primary differences between Windows and OS X that is often overlooked is the basic way applications are run and their windows are handled. The Unix world uses the concept of a window manager, a program that decides how to arrange and display the individual windows of running applications. Though neither MS Windows nor OS X has a separate window manager program in that sense, for ease of discussion I will use the terminology anyway.

The OS X window manager offers many usability and productivity advantages over Windows. As almost anyone who has used both a PC and a Mac knows, the running application in OS X displays its menu options (File, Edit, et cetera) at the very top of the screen. Windows, on the other hand, puts these options inside the program's own window. Ergonomics experts talk about Fitts's law, which predicts how long it takes to move a pointer to a target based on the target's distance and size. Placing these common options along a screen edge has been shown to make them easier and faster to hit, because the cursor cannot overshoot the edge.
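For the curious, Fitts's law is usually written T = a + b * log2(1 + D/W), where D is the distance to the target and W is its size along the direction of travel. A menu pinned to the top edge behaves as if it were far deeper than a thin strip inside a window, because the cursor stops at the edge no matter how hard you throw it. A quick back-of-the-envelope comparison in Python, with made-up distances and sizes purely for illustration:

import math

def index_of_difficulty(distance, width):
    # The logarithmic term from Fitts's law; higher means a slower pointing task.
    return math.log2(1 + distance / width)

# A 20 px menu strip inside a window vs. an edge target that effectively acts
# ten times deeper because the cursor cannot overshoot the screen edge.
print(index_of_difficulty(400, 20))    # ~4.4 bits
print(index_of_difficulty(400, 200))   # ~1.6 bits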
Safari windows revealed by Exposé.
The next OS X feature that is often overlooked is how multiple documents within one program are handled. Unlike Windows, Mac OS distinguishes between an application and its separate documents. This enables several advantageous usage scenarios. Take the Safari web browser, for example. If several separate windows are open, you can cycle between them with Command-~ (the tilde key, i.e. Apple-~). To view the open windows graphically, press F10 to activate what Apple calls Exposé, which also lets you click on the desired document. If you want to switch to a different program altogether, say going to iTunes to change playlists, pressing and holding Command-Tab shows the currently running apps. Sensibly, each is shown only once, not once for each open document. Similarly, the Dock shows running applications, not their individual windows.
Command-Tab reveals running applications.
There is even more granularity available, though. Minimizing a document by clicking the yellow minus button removes it from this internal list, so it no longer appears in Exposé or when switching with Command-~. This is useful, for example, when there is a website I want to read but not right at the moment. A tiny screenshot of the minimized window appears in the Dock, complete with the icon of its parent application to make it easier to distinguish.
Safari windows minimized in the Dock.
OS X has also retained the Macintosh feature of hiding an application. Pressing Command-H hides the program: its minimized windows are removed from the Dock (though the program's icon remains), and Exposé no longer shows any of its documents. The program can be unhidden by selecting it with Command-Tab or clicking its Dock icon.

The differentiation between windows and applications provides still more benefits. Pressing Command-W on a Mac will consistently close only the current document window, while Command-Q will quit the entire application and close all of its documents. In MS Windows it tends to be a crapshoot whether Alt-F4 (the shortcut for closing a window) will exit just that document or the entire program. In addition, an option available only in OS X is running a program with no open documents. At first this seems nonsensical and confusing: if you close all of a program's documents, it remains running with its menu bar at the top of the screen but nothing below. An obvious use for this is loading a program like Photoshop, which has many plug-ins and takes a long time to launch, and leaving it running even when no images are currently being edited. Being able to leave it open in this way is a real productivity boost.

The window manager in the newest OS X, Leopard, also offers the option of placing programs on various virtual desktops, a feature called Spaces. It provides a simple way to segregate your work into separate domains, a further option that cuts the clutter of running many applications and makes accessing information faster and easier.

The final area of window management in which OS X excels is maximizing windows. In the Microsoft world, maximizing a window means making it take up the entire screen regardless of how much information it actually presents. In most OS X applications, a maximized document window resizes only as much as needed. For example, when zooming in and out on images in Photoshop, a maximized image window will grow to fit the image on screen as long as there is available real estate, rather than covering additional space with a blank window.

This concludes part one of my Why-Mac series. Understanding window management is key to maximizing productive computer use. Mac OS X facilitates efficiency by providing the aforementioned means of organizing, viewing, and switching between applications. The rest of this series will look at more ways Macs enable a more pleasant and productive computing experience.

Thursday, January 24, 2008

Before Touch Screens, Multitouch Mice

As much as I would like there to be a sub-$1000, tablet-like, touch-screen Mac, the economics just don't work yet. Axiotron previewed its Modbook over a year ago and only just started shipping it (supposedly), and at $2,300-2,500 the price is prohibitive. A Wacom 12.1" touch LCD runs a grand and weighs over four pounds. Unfortunately, I don't think there is enough magic at Apple Labs to deliver the product I crave; however, an intermediate step may be entirely plausible and could ship soon. Imagine grafting together a slightly rounder, flatter Mighty Mouse, a MacBook Air trackpad, and the guts of a Wii controller.

The ideal device I envision is admittedly a bit ambitious and futuristic, but there are variations on the theme that keep it more practical. First, imagine an iMac G3 "puck mouse" (shudder) without the cord or button. Overlay on its surface the multi-touch, gesture-sensitive trackpad that debuted recently on the Air. For just moving the cursor around, it is much more convenient to have something physically moving than to try to rub a trackpad just the right way; that is where the mouse nature comes into play. Because of its roundness, it would be convenient if the mouse were inertially sensitive rather than relying on optical movement over a surface. That is where the Wii-like internals would be used. The orientation wouldn't affect the direction of cursor movement: you could move it around without worrying about which way it is facing, avoiding the annoying problem of the puck mouse turning in your hand. Eventually, this could lead to hand-held devices being moved in 3D space, though at that point gestures would have to be handled differently.

For the current iteration, however, the surface of the mouse would register taps (mimicking the behavior of standard mouse buttons) but would also allow the use of iPhone gestures: swiping side to side, pinching and expanding, or rotating. Since these gestures depend more on what is currently selected than on the mouse position, it makes sense for that sensitivity to be layered on top of the means of moving the cursor rather than coupled with it.

If an inertially sensitive, orientation-independent version is too ambitious for now, it would be equally plausible to base the design on a slightly flattened Mighty Mouse rather than the puck mouse. This would maintain the standard mouse directionality, and the device could be corded or wireless. It would also eliminate the need for hardware and software to handle Wii-like motion sensing. The basic idea of overlaying gesture sensitivity would be the same.

It may look a little clunky, but the multitouch mouse would bring a new level of interactivity to the Mac interface. It would also leverage the work done on the iPhone and iPod Touch interface and get users accustomed to the "standard" Apple gestures. Until we can get fully touch-sensitive notebook or tablet screens, the multitouch mouse would be a welcome step forward.

Wednesday, January 9, 2008

Hello, Computer?? Apple, can you hear me?

There is a classic (among geeks at least) scene in Star Trek IV where the crew of the Enterprise has traveled back in time to the late 20th century. Chief Engineer Scott sits down in front of a Mac and says, "Computer. Computer?" Getting no response, Dr. McCoy helpfully hands Scott the mouse, which he holds like a microphone: "Hello, Computer??" The befuddled 1980s Earthling standing by finally tells him to just use the keyboard ("How quaint"). [Watch on YouTube] While the Mac could not respond to voice input, it did bring the mouse-driven graphical user interface revolution to the masses.


Moreover, when Steve Jobs introduced the Macintosh over 20 years ago, his demonstration, in part, "let the computer do the talking" via a synthesized voice reading a short speech. This was quite a feat in 1984. However, very little has changed in speech synthesis between then and now. Why has there been so little advancement in voice-based computer interfaces in over two decades? Are the factors finally in place for the next interface revolution to truly put the Personal in PCs? The answer may be yes, and the company poised to lead that change is once again Apple.

The answers to the first question are many. Primarily, voice-based interfaces have stagnated not because of technology constraints but because of a lack of demand. The niche market has been served by software vendors like Dragon Systems (now owned by Nuance), who have been doing voice recognition since the days when the 486 processor ruled. Current examples of feasible voice recognition include voice dialing, available even on the most inexpensive cell phones, and the Sync system in Ford cars for making phone calls and controlling digital music players. The lack of demand on desktop systems in the past is largely due to the fact that the majority of computer use took place in the cubicle farms of the American office, an environment where voice interfaces would not fit very effectively.

As more people spend more time online at home, speech-based interactions make more sense. Many people compose numerous emails, and blog, chat, or Twitter daily; all of these applications would be well served by dictation software. Further, the generation that has grown up with computers keeps getting larger, and while older people, as a general rule, may be less comfortable with technology, kids and young adults have no aversion to talking to a machine. Perhaps the time is finally right for someone to take this seemingly logical next step in computer interfaces.

If the time is now, the company may be Apple. Buoyed by an amazing string of products since Steve Jobs regained the helm, Apple has shown a repeated ability to take existing technologies and polish and package them in a user-friendly way that brings them to more people. The iPod and iTunes did it for digital music, OS X for Unix, and now the iPhone and iPod Touch for mobile computing. History has shown Apple to have an interest in improving the user experience. Another major advantage is its control of the hardware and software environment and its commitment to open source. Apple has long included built-in microphones on its laptops and all-in-ones, and tweaking these for noise reduction or other speech enhancements would be fairly easy. If Apple set its engineers to the task, speech could become an intrinsic part of Mac OS.

This is the key to my argument. I don't purport to have done an exhaustive review of the available add-ons that can make a computer voice-activated; far from it. But that is because this technology should not be an add-on. If I can edit a photo, listen to digital music, browse the internet, and write formatted text using a stock installation of an operating system, I should just as easily be able to search for a file, cue up a song, navigate to a web site, or dictate text without using the keyboard. I'm not saying the computer should understand complex natural language or that the mouse and keyboard should be replaced entirely. I would be happy to follow a set format for commands, enunciate clearly, and separate each word from the others.

Mac OS X even includes some support out of the box for voice recognition and computer speech (my computer tells me the time every half hour). The problem is that these features are not highlighted as the way to interact with the computer. Until there is a keynote where Steve Jobs uses spoken commands in a demonstration, or an Apple ad campaign that shows users talking to their computers, consumers (and therefore developers) won't take speech seriously. But if speech were suddenly put forward as part of Apple's human interface guidelines, a whole new breed of more usable applications could take hold, and the next generation of computer interface could develop. If only Apple is listening.

Saturday, January 5, 2008

Nintendo Wii: Good but Not Too Good

Sales reports have consistently shown the Nintendo Wii to be leading the pack when it comes to current-generation game consoles. Back when the Wii was just the conceptual "Revolution", I predicted and hoped that it would indeed revolutionize gaming with its user interface innovations. The Wii has been successful because it is good but not too good.

The most obvious interpretation of this statement involves the price/performance trade-offs that console manufacturers face. While Sony and Microsoft chose to keep escalating the technical specifications of their hardware, Nintendo took a middle-ground approach. The Wii tops out at enhanced-definition 480p rather than true HD resolutions. It has a DVD-based drive but doesn't play movies (let alone Blu-ray or HD DVD). In every respect the system has less power than the competition, but by choosing lower hardware requirements Nintendo was able to deliver a more affordable, smaller console.

A less apparent application of "good but not too good" is an aspect of human nature that I believe foretells near-term advances in virtual reality (VR). On the commentary track for one of the early CGI movies (it may have been Shrek, but I don't recall), the animators talk about a phenomenon whereby people started to dislike the characters if they became too close to realistic. It seems the human mind is happy to place itself in a state of suspended disbelief when what it is experiencing is clearly unbelievable. We don't watch a Road Runner cartoon and complain that there is no way the coyote could survive that fall. The problem for movie makers arose as animated characters started approaching reality. At that point people would look at them and know that something was "not right" without necessarily being able to put their finger on it. The computer graphics had passed the threshold of being obviously fake but had not yet reached the point of being believable. They were too good for their own good and actually had to be made less realistic.

The same logic can be applied to virtual reality and the Wii. Nobody would claim that waving around a remote control truly gives you the same experience as swinging a tennis racket at a ball or slicing a goblin with a sword. Yesterday I was reading about haptic interfaces. The Webopedia article states, "For example, in a virtual reality environment, a user can pick up a virtual tennis ball using a data glove. The computer senses the movement and moves the virtual ball on the display. However, because of the nature of a haptic interface, the user will feel the tennis ball in his hand through tactile sensations that the computer sends through the data glove, mimicking the feel of the tennis ball in the user's hand." This is certainly far beyond what the Wii's controller offers. Will this be the next generation of gaming? I don't think so, and the reason is that it defies the good-but-not-too-good philosophy. When games start to mimic tactile sensations, they butt up against the "close to reality but just not right" barrier. I'm sure such a device would be interesting to try, but in order to lose ourselves in the experience of a game, just like with a movie, we either need to be in a clearly non-real environment or so totally immersed that it is difficult to distinguish what is and is not real.

It's been over a decade since Pixar introduced us to full-length CGI animation with Toy Story, and movies are just now approaching the use of fully realistic human characters. While VR has also been in development for decades, the Wii gaming console is easily the largest real-world application of virtual reality concepts. Before the next level of immersive VR is achieved, the industry will have to overcome the problem of being too close to reality without being close enough. In my opinion this will likely take the next ten years. In the meantime there is plenty of opportunity for unbelievable games, built on the current technology, to be incredibly fun.