Friday, May 16, 2008

Mac OS X Speech Synthesis

Since the introduction of the Macintosh in 1984, Mac OS has been able to convert text into speech. Even eight-bit computers like the Commodore 64 had SAM, an early voice synthesizer, but as I bemoaned several months ago, there has been relatively little progress in speech recognition and synthesis in the intervening decades. That lack of progress matters especially to the more than 45 million Americans with literacy problems. Even so, OS X does offer text-to-speech options that may interest users regardless of their literacy level.

Here are some uses for speech synthesis that you may not have thought of. Anyone who writes, even if it's only an occasional professional email, can benefit from text-to-speech. While spell checkers are great for finding egregious errors, more subtle problems are harder to spot. Writers often inadvertently use the wrong word or add extra words to their text. For example, how often have you seen "you" accidentally used in place of "your"? One easy way to find these problems is to listen to someone read what you wrote. OS X can do that for you.

Like the Dictionary application, speech synthesis is integrated throughout the modern Mac operating system. Any highlighted text, whether in a web browser or an e-mail, can be read aloud by the computer. In many applications, such as word processors, just bring up the context menu by right-clicking or Control-clicking and choose "Speech" and then "Start speaking". If the option is not in the context menu, it is still available in the Services menu: click the name of the application in the menu bar and go to "Services/Speech/Start speaking". It is also possible to create a shortcut key for this option. Go to System Preferences and open the Speech preference pane. In the "Text to speech" tab, check "Speak selected text when the key is pressed" and then push the "Set key" button. Now just highlight text in any application, and your computer will read it to you at the touch of a key.
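For readers comfortable with a little scripting, the same voices can be reached through the `say` command-line tool that ships with OS X. Here is a minimal Python sketch, not a definitive recipe. The function names `build_say_command` and `speak` are my own illustration, and the `-v` (voice) and `-r` (rate in words per minute) flags are the ones current versions of `say` accept, so check `man say` on your own system.

```python
import shutil
import subprocess


def build_say_command(text, voice=None, rate=None):
    """Build the argument list for the `say` command-line tool.

    voice and rate are optional; when omitted, the system defaults
    chosen in the Speech preference pane are used.
    """
    command = ["say"]
    if voice:
        command += ["-v", voice]
    if rate:
        command += ["-r", str(rate)]
    command.append(text)
    return command


def speak(text, voice=None, rate=None):
    """Speak text aloud if `say` is installed; return True when it ran."""
    if shutil.which("say") is None:
        # Not a Mac, or `say` is not on the PATH: do nothing.
        return False
    subprocess.run(build_say_command(text, voice, rate), check=True)
    return True
```

Calling `speak("Did I write you when I meant your?")` on a Mac should read the sentence with the default voice. On a machine without `say`, it simply returns `False`.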

Another speech feature can be useful to many people. When working on the computer, it's easy to lose track of time; sometimes hours go by before I realize it. To help with this, OS X can announce the time for you. The option is in the Date and Time settings, which can be reached in several ways: through a button in the aforementioned "Text to speech" tab, by clicking the time in the menu bar and choosing "Open Date & Time...", or from the main System Preferences window. Once there, check "Announce the time" in the Clock tab, choose how often, and click "Customized voice" if you wish to set specific voice options.
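To make the idea concrete, here is a small Python sketch of the two pieces a time announcer needs: a phrase to speak and the wait until the next announcement. This is only an illustration under my own assumptions, not how OS X implements the feature. The function names are hypothetical, the wording of the phrase is a guess, and the interval is assumed to be 60, 30, or 15 minutes.

```python
from datetime import datetime


def time_phrase(moment):
    """Return a spoken-style phrase for a datetime, on a 12-hour clock."""
    hour = moment.hour % 12 or 12  # 0 and 12 both read as "12"
    if moment.minute == 0:
        return f"It's {hour} o'clock"
    return f"It's {hour}:{moment.minute:02d}"


def seconds_until_next_announcement(moment, interval_minutes=60):
    """Seconds from moment until the next multiple of interval_minutes
    past the hour. Assumes interval_minutes divides 60 evenly."""
    elapsed = (moment.minute % interval_minutes) * 60 + moment.second
    return interval_minutes * 60 - elapsed


if __name__ == "__main__":
    now = datetime.now()
    print(time_phrase(now))
    print(seconds_until_next_announcement(now, interval_minutes=15))
```

The resulting phrase could be handed to any text-to-speech tool. The sketch itself only prints it.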

Some users like me, who keep their Dock hidden, may not always notice applications bouncing their icons in the Dock when they need attention. This can be addressed by having OS X speak to you when a program needs attention. This option is also in the "Text to speech" tab of the Speech System Preferences. Just check "Announce when an application requires your attention". The computer is even very polite, saying, "Excuse me. Application X needs your attention."

What if you are dissatisfied with the standard computer voices? Without doing an exhaustive search I found two companies that offer commercial voice packs for OS X. Both have fairly realistic voices. You can hear many samples or download demos at the InfoVox and Cepstral web sites. Unfortunately, they're rather pricey. The InfoVox voices are $100 for the American English pack, whereas Cepstral voices are sold individually for $29 each.

While it would be hard to say that speech synthesis has come a long way on the Mac, the availability of universally integrated speech options and high-quality commercial voices does make a compelling combination. For those who prefer to have text read to them or just simple system alerts, text-to-speech can be a useful and important component of the operating system.

For more great information on the Services menu, see this web site.

Thursday, May 15, 2008

Universal Access Options for Everyone


Like most operating systems of the last decade or so, Mac OS X includes accessibility options for people with disabilities. What users may not realize is that some of these options are extremely useful to anyone.

OS X puts these functions in the Universal Access pane of System Preferences. This article will concentrate on a few options in the Seeing and Keyboard tabs.

The first item that is certainly of use is the zoom feature. Pressing Command-Option-8 zooms in on the screen around the mouse cursor. By default, graphics are smoothed after the zooming takes place, so images that would otherwise appear pixelated still look decent. I often use this feature when watching low-quality web video. Rather than putting up with a tiny, postage-stamp-sized video, I simply press the shortcut key and watch it much closer to full screen. Once in zoom mode, the magnification can be adjusted by pressing Command-Option-minus or Command-Option-equals.


The Keyboard tab has features designed for people who have difficulty typing. However, one of the options in Sticky Keys is very useful for people creating screencasts. With Sticky Keys turned on and the option "Display pressed keys on screen" checked, the symbols for the modifier keys (Command, Option, Control, and Shift) appear on the screen when they're pressed. In tutorials and with new users, this provides a useful visual cue to go along with the name of the key being used.

Finally, the option "Enable access for assistive devices" is always visible at the bottom of the Universal Access pane, whichever tab is selected. This choice needs to be selected in order for tools like text expanders to work. It allows applications to access the keyboard buffer as you are typing.

For people who have no difficulty using a computer, the Universal Access pane may be the last place they would look for useful additions to OS X. As you can see, however, some of its options can improve the computing experience for anyone. Hopefully people will be inspired to explore further.

Wednesday, May 14, 2008

A Blog Meme Thought Virus

Many years ago my brother became interested in the concept of "thought viruses," as he called them. In high school he tried to get everyone he knew to "whomp the zimbob" (a nonsense phrase that he made up). He is also fond of infecting people with tunes by whistling or humming a catchy song.

The idea of these thought viruses, however, goes back farther than my brother. The concept is called a meme and was introduced by Richard Dawkins in his 1976 book The Selfish Gene. Memes are very prevalent on the Internet, starting with simple e-mail forwards and now including vast social networks. One manifestation is the blog meme. I was recently "tagged" by Lon of NoLimits2Learning. Here are the rules that were outlined.

1. The rules of the game get posted at the beginning.
2. Each player answers the questions about themselves.
3. At the end of the post, the player then tags 5-6 people and posts their names, then goes to their blogs and leaves them a comment, letting them know they’ve been tagged and asking them to read the player's blog.
4. Let the person who tagged you know when you’ve posted your answer.


Here are the questions:

1. What were you doing 10 years ago?

Ten years ago I was working as a software developer at a printing company in Milwaukee, Wisconsin. I had dropped out of college a couple years before. My daughter was only three, and I tried to spend as much time as possible with her. I had not yet broken my neck, so I still had full control of my body. We enjoyed going to the park, riding bikes, and going on walks.


2. What are five things on my to-do list for today?

I don't keep a to-do list, but today I am trying to catch up on my writing. I am using voice recognition for only the second time, so learning to do this better will also be on my list. This afternoon I have to pick up my nephew from school.

3. Snacks I enjoy...

I love chocolate, the darker the better. I also tend to snack on oatmeal or a banana since they are fast and easy.

4. Things I would do if I were a billionaire:

If I were a billionaire I would have a private jet. Traveling is such a hassle. I would also have a portable wheelchair that allows me to go up hills. I am not strong enough to push myself in my manual chair, but my power chair is not portable. I'd also have money to start a business. I'm not sure which idea I would pursue, but I would do one of them.

5. Three of my bad habits:

My worst bad habit is procrastination. I can also be hypercritical, and I'm not very friendly with new people.

6. Five places I have lived:

I have lived in the northwest suburbs of Milwaukee, Wisconsin, USA, on the east side of Milwaukee, on the west side of Milwaukee, on the northwest side of Milwaukee, and in Santa Barbara, California. Not much diversity.

7. Five jobs I have had:

I have worked as a pizza delivery guy, an archaeologist, a beetle dissector and drawer, a software developer, and an elementary special education teacher.

8. People I would like to know about because I am just that nosey:
I'm afraid I don't have five or six people, but here is my list: Jared Goralnick at Technotheory.com, Ricky Buchanon of ATMac, and Daniel Eran Dilger of Roughly Drafted Magazine.

Tuesday, May 13, 2008

Recognition for Speech Recognition

My friends came through again. If you've been following this blog, you may remember reading about my attempts to get dictation software installed and running. I ended up creating a virtual Virtual PC. Unfortunately, the IBM ViaVoice package did not contain the CD. Yesterday, however, I received a disc in the mail from my friend: ViaVoice version 10 for Windows XP.

This is the first article that I am attempting to dictate. So far the recognition has been so-so. Correcting mistakes is not intuitive yet. I'm sure it will get better as I get used to the software. When it works the speed is amazing, so I am looking forward to getting this working better.

To make this somewhat more technical, I am going to describe the setup that I am using to get my dictated text onto the Mac.

ViaVoice's SpeakPad dictation application.

As I have it set up, ViaVoice allows dictation into its own application called SpeakPad. I am using this program to create a simple plain text file. In order for my Mac to access it, I set up Windows file sharing. Because I am using the PowerBook as a wireless gateway for the PC, it was somewhat harder to get the two computers to see each other. I was able to see the Windows share from my Mac but not vice versa.

The simple green screen interface of WriteRoom.

Aside from that issue, setting up sharing was fairly straightforward. I followed the directions here. Then, to finally publish the article, I use several Mac applications. First, I save the file from SpeakPad into the shared directory. I use WriteRoom to open that text file for proofreading. I like the green and black full screen mode for this because it is easy on my eyes, and typing with text expansion is responsive. Plus, once a file is open in WriteRoom, it will reopen the next time the program is run. I simply save over the same text file in the virtual PC, and the Mac text editor reflects those changes. Next, I drag the text file into MacJournal in order to keep a nice, searchable copy. From there it's one click to publish. I open the blog in a browser, where I add images and links and do the final check.
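The hand-off through the shared directory could also be scripted. The following is a rough Python sketch under my own assumptions, not a description of what WriteRoom or MacJournal actually does. The helper name `read_if_changed` is hypothetical, the file is assumed to be UTF-8 (a Windows program may well use another encoding), and Windows-style line endings are converted to the Unix style.

```python
import os


def read_if_changed(path, last_mtime, encoding="utf-8"):
    """Re-read a shared text file only when it has been saved again.

    Returns (text, mtime) when the file is newer than last_mtime
    (or when last_mtime is None), otherwise (None, last_mtime).
    """
    mtime = os.path.getmtime(path)
    if last_mtime is not None and mtime <= last_mtime:
        return None, last_mtime
    # newline="" keeps the original line endings so they can be
    # converted explicitly below.
    with open(path, encoding=encoding, newline="") as handle:
        text = handle.read()
    return text.replace("\r\n", "\n"), mtime
```

A script could call this every few seconds with the last modification time it saw and pass any new text along to the next step.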

It seems that voice recognition will be a real time saver if I manage to tame it. Obviously the current setup is somewhat convoluted, but moving the information from app to app is actually fairly painless. And the much smaller amount of typing certainly causes me less pain. It's nice to have good friends, and it's especially good to have nice friends.