Talk is cheap

Simon Says brings affordable voice recognition to the NeXT

by Lee Sherman

Visionaries have long looked forward to the day when we would engage in eloquent discourse with our computers. Researchers in the field of natural-language recognition, however, know better: Figuring out what a person is saying is hard work; for a computer, the task is nearly impossible.

But faster computers are rapidly making things that were "nearly impossible" a decade ago commonplace today. Already a variety of NeXT applications can recognize the human voice, thanks to the Visus SpeechKit developed at Carnegie Mellon University. But, alas, the SpeechKit requires that developers link the kit into their application and follow a special application programming interface in order to integrate speech recognition into their software. It won't work with shrinkwrapped, off-the-shelf apps.

Now HSD Microcomputer U.S. has brought us Simon Says, a program that offers speech recognition for the masses. Although it is speaker dependent, meaning that each user must train Simon to recognize his or her own words, Simon works with any NeXTstep application program, as long as all you want to do is choose menu options, type text, and put recorded strings onto the pasteboard. Simon won't take dictation, and it won't fetch last quarter's sales results unless you specifically say where they are. But there are many tasks at which it excels.

Sound system
Simon Says ships with predefined word lists for the bundled NeXT applications (like Mail and Digital Webster's) as well as for Appsoft's WriteNow. The first time you launch the program, Simon starts training. For each command, Simon flashes the word on the screen and you repeat it three times. Be careful to speak the word the same way each time, since Simon is learning to recognize the actual sound of your voice rather than the words you speak. But don't fret: If you are inconsistent in your pronunciation, Simon will ask you to repeat yourself.

Simon improves its accuracy by grouping the commands it recognizes by application. As a result, it has only a relatively short list of candidates to compare against any word you say. It takes only a few minutes to train the word list for a given application, but you'll probably want to go back and retrain certain words for improved accuracy as you get the hang of the program. Training takes place in a special window; a browser along the top of the window shows you which commands have been assigned to each application. When you're finished training, you'll be able to execute any menu item that has a keyboard equivalent simply by speaking it.

Simon sits in the Application Dock, where he listens attentively until he hears a sound that he recognizes as a word. After comparing it to the list, he executes the closest-sounding command. Simon will smile, frown, or look puzzled depending on how well he recognizes your words. You can temporarily suspend Simon Says by telling Simon to cover his ears. This is useful when someone enters your office and you want to carry on a conversation without confusing the program.
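For readers curious how this kind of matching might work, here is a minimal sketch in Python. It is not HSD's algorithm, which the review does not describe. It assumes each spoken word has already been reduced to a short list of numbers, averages the three training repetitions into one template, and picks the closest template from the active application's list only. The names `WordLists`, `train`, `recognize`, and the `reject_above` threshold are all hypothetical.

```python
import math


def average(samples):
    """Average several equal-length feature vectors into one template."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class WordLists:
    """Hypothetical per-application word lists (not Simon Says' real design)."""

    def __init__(self, reject_above=1.0):
        self.lists = {}  # application name -> {command: template}
        self.reject_above = reject_above  # assumed cutoff for "puzzled"

    def train(self, app, command, samples):
        # Store one averaged template per command, grouped by application.
        self.lists.setdefault(app, {})[command] = average(samples)

    def recognize(self, app, sound):
        # Only the active application's commands are candidates.
        candidates = self.lists.get(app, {})
        if not candidates:
            return None
        best = min(candidates, key=lambda c: distance(candidates[c], sound))
        if distance(candidates[best], sound) > self.reject_above:
            return None  # nothing sounded close enough
        return best


simon = WordLists()
simon.train("Mail", "send", [[1.0, 0.0], [0.9, 0.1], [1.1, -0.1]])
simon.train("Mail", "save", [[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]])
simon.train("WriteNow", "save", [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])

in_mail = simon.recognize("Mail", [0.95, 0.05])
in_writenow = simon.recognize("WriteNow", [0.95, 0.05])
```

In this toy setup the same sound matches "send" while Mail is active but is rejected in WriteNow, whose list holds no similar-sounding command. That is the benefit of keeping each list short.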

Simon Says will only respond to the voice of the person who trained it. This is important in crowded offices, where several co-workers could be talking to their machines at the same time. In especially noisy offices, you'll want to use a headset.

On the other hand, Simon's speaker-dependent recognition is language independent, meaning it is not limited to recognizing only English speech. This is a big plus, considering NeXT's emerging world market.

In addition to menu commands, Simon Says can type predefined strings on your keyboard (voice macros!), run UNIX shell scripts, switch applications, and more, all chosen through a series of intuitive inspectors. Simon can even be programmed to play sounds in response to voice commands: Now when you curse at your computer, it can curse back!

The wow factor
Is Simon Says merely the latest techno-toy or will it really increase your productivity? There's no denying the thrill you'll get the first time the computer responds to your voice. For showing off to your friends, the program has no equal.

After a few days, when the novelty wears off, you'll begin to realize how useful this program really is. Most studies cite at least a 50 percent increase in productivity when voice is used along with other types of input, and my experience confirmed this.

Talking to my computer quickly became second nature. In many ways, Simon reminded me of the first time I encountered that strange creature called a mouse. Although using it was awkward at first, the mouse soon became a natural extension of my arm.

Using Simon Says really did speed up my work. For example, I saved my document more often because all I had to do was say the word "save." Simon Says seems particularly well-suited to graphics applications like Adobe Illustrator and Create, in which the creative flow is sometimes destroyed by the need to take your hands off the mouse to choose tools or menu items. Using Simon Says, you can keep your hand in the drawing area. Simon is also a time-saver when you need to choose a command that is buried several levels deep in a hierarchical menu.

NeXT's own applications seem made for voice recognition. In Mail, for example, you can record a voice-mail message and send it on its way, all without using your hands. Simon Says even helps to minimize the negative aspects of the system: A large monitor is great, but you can quickly tire of moving the mouse vast distances across the screen. With Simon Says, you don't have to.

I had the most success with Simon in the Workspace, where Simon Says can be used to launch applications, empty the Recycler, or bring up the Processes panel. Simon Says can even serve as a Dock extender, since there's no longer any need to store your applications in the Dock to access them quickly.

The NeXT finds its voice
The NeXT computer wasn't designed for voice control, but it is uniquely well suited to it. The consistency of NeXTstep's command structure allowed the developers of Simon Says to concentrate on recognizing a core set of commands, increasing general accuracy. But the biggest gain is in the hardware. The closest competitor to Simon Says, Voice Navigator II on the Macintosh, requires a hardware interface containing a DSP and a microphone in addition to its software, boosting the cost of simple voice recognition to well over $500.

Surprisingly, Simon Says does not use the DSP chip, which is found in every NeXT computer and is well suited to the high-speed computations necessary for speech recognition. Instead, Simon does all his work on the '040, which has proven to be fast enough for real-time recognition without degrading system performance. This frees the DSP for other tasks.

Simon Says is so well integrated with the rest of NeXTstep that it seems a natural extension of the operating system. For example, careful attention was paid to multitasking: The appropriate word list is automatically activated as you switch between applications.

Word lists can be shared among users, although each user will have to train Simon to recognize his or her voice. Nevertheless, this feature can save time if you've customized the list extensively.

Listen up
Any NeXT application can be trained to work with Simon Says; the program simply looks in the .nib file for those items that have keyboard equivalents. Unfortunately, there's no way to assign voice commands to the other menu items. There's also no way to control buttons in panels or describe mouse movements and clicks. Hopefully, HSD will address these shortcomings in a future version of the program.
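The filtering rule described above can be illustrated with a short Python sketch. It does not read real .nib files, whose format is not covered here; it assumes the menu items have already been extracted into simple records. `MenuItem` and `build_word_list` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MenuItem:
    """Hypothetical stand-in for a menu item read from an application."""
    title: str
    key_equivalent: Optional[str] = None  # None means no keyboard equivalent


def build_word_list(menu_items):
    """Keep only the items that have a keyboard equivalent.

    Returns a mapping from the word to be spoken to the key to be sent.
    """
    return {
        item.title.lower(): item.key_equivalent
        for item in menu_items
        if item.key_equivalent
    }


# An invented menu for illustration only.
menu = [
    MenuItem("Save", "s"),
    MenuItem("Quit", "q"),
    MenuItem("Preferences"),  # no keyboard equivalent, so it is skipped
]

word_list = build_word_list(menu)
```

Under this rule, a menu item without a keyboard equivalent never reaches the word list, which matches the limitation noted above.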

Though extremely accurate, Simon Says isn't perfect. The first time the computer pops up a panel that you didn't ask for or refuses to respond to your insistent demands, you'll be tempted to give up the idea entirely. Nevertheless, these occasional glitches aren't enough to diminish the wonder of this program.

In the prerelease version examined for this review, the program often had difficulty with the noise generated by my typing in an otherwise quiet environment. Oddly enough, it did a much better job of filtering out other kinds of background noise, even the loud music that permeates my workspace. HSD is aware of the problem and assures us it will be fixed by the release date.

While the built-in microphone on the front of the MegaPixel Display is adequate for occasional use in noisy environments and full-time use in quieter ones, many users will want to consider purchasing a higher-quality mike, preferably a unidirectional headset.

Since Simon Says is intended to be used all the time, users may also run into conflicts with programs that run in the background (such as the popular screen saver BackSpace). These programs will slow to a crawl while Simon is "thinking."

Since a prerelease version of Simon Says was used for this review, we are assigning it the maximum prerelease rating of four cubes. We expect to be able to upgrade this to five if the shipping product with completed documentation, due to ship before you read this, matches the quality of the beta.

NeXT folklore has it that Steve Jobs wanted to include the ability to respond to voice commands in the original Cube, but it proved too difficult to implement in time for the machine's unveiling. As with the fabled DSP modem, it took an innovative third-party developer to finally make his vision a reality.

Lee Sherman is a writer and NeXT developer specializing in music and multimedia.

Simon Says

4 Cubes

Software that brings affordable voice recognition to the NeXT machine. The speaker-dependent, language-independent system can be trained to work with any NeXT application. You can launch, hide, and quit applications, perform formatting operations, send mail, and create voice macros. Similar-sounding commands and background noise can cause confusion. Voice commands are limited to menu items that have keyboard equivalents.


HSD Microcomputer U.S., 1350 Pear Ave., Ste. C, Mountain View, CA 94043.