Listen up Dragon…take this down
By Tom Rataj
We’ve watched television and movie characters speak to and interact with computer systems for years. One of the most famous early examples was the original Star Trek TV series.
In the real world, speech recognition and voice control have been available for several years on automated telephone answering systems, allowing callers to navigate menus by voice.
More recently, smartphone users have become familiar with speech-recognition and device-control applications, including the famous Siri app on the iPhone. Android has Google Now, BlackBerry has the new Assistant, and Cortana was recently launched on the Windows Phone platform. There are also several aftermarket applications available.
The speed and accuracy of speech-recognition software for desktop and laptop users fell well short of the marketing hype for years. As with smartphones, the bottlenecks were hardware that was not powerful enough to handle the complex software, and programs that either had very limited vocabularies or could not reliably recognize the spoken word.
That began to change about four years ago with the arrival of more powerful computers. Software has increased in complexity and efficiency and can now leverage all that processing power to provide very fast and accurate speech recognition.
As I sit here “typing” this article I am quite comfortable, my feet up on the desk and my arms crossed. I have on a small headset with a boom microphone plugged into my desktop computer. The text is being entered into a Microsoft Word document that I opened using simple, plain-language voice commands. The software is also designed to use the built-in microphone on most laptop computers, although background noise levels would present a challenge.
I used the latest version, Dragon NaturallySpeaking 13 (Premium), the industry benchmark brand for Windows computers. Not only does it enter the text for me, it also lets me add punctuation marks and capitalize, italicize, bold and underline words by voice.
Dragon speech-recognition software comes in three consumer-focused versions (Basic, Home and Premium), ranging from $50 to $200. There is also a $600 enterprise-focused professional version for large companies. All work with Windows 7 or newer. Dragon Dictate for Mac starts at $200.
Computer hardware requirements are not particularly onerous, so a decent midrange computer would typically be adequate. As one would expect, the software's complexity increases as you move up the version range.
The software guides a new user through a tutorial explaining the process and basic commands and navigation controls. It also steps a user through the training process. You read text on the screen and the software “trains” itself to understand your voice. Dragon uses this initial training session to develop a user profile, which it continuously builds on by constantly analysing your choice of words and language.
More than one profile can be created, allowing several people to take their turn at using the software.
The training process was surprisingly short and simple, taking only about five minutes. I was quickly using it successfully, despite little practice and not much attention to clearly enunciating words.
Right from the start the software correctly recognized virtually every word. It was relatively easy to navigate back and correct the few words it didn’t understand by just using my voice to select the words and apply corrections.
The primary task of recognizing speech and inputting the text is easy to learn and quite reliable from the start. Learning all the commands and controls to navigate a document or computer and apply formatting can be quite challenging because there’s a lot to learn.
I eventually found a command cheat-sheet on the Internet that conveniently groups most of the major commands and controls together on several pages. I printed a copy and put it on my desk for easy reference. Users can also create additional controls and commands based on their needs.
The most interesting challenge with this type of software is learning to formulate your thoughts before speaking them, which is quite a bit different from typing as the words come to mind.
The software is surprisingly fast both in recognizing what is said and keeping up with the dictation, even when a user talks very fast. I challenged it with my best imitation of a K-Tel TV commercial announcer, speaking a mile a minute. It kept up almost flawlessly, making very few mistakes.
As a test, I just re-read the previous paragraph as quickly as possible. There were a couple of minor errors, likely because I spoke so quickly I didn’t enunciate my words clearly. Speak at a normal cadence, as you would in conversation, and the software easily keeps up and gets every word correct.
The main use of the software is for creating new text or transcribing text from previously written documents. Users with relatively poor or slow typing skills will find it very beneficial for inputting text.
Some Canadian police agencies have the Dragon Enterprise version so officers can dictate reports. They review the completed report on screen and manually correct misspelled words or other errors using a keyboard.
Using the software in a noisy environment, such as a patrol car or busy investigative office, would present some challenges. A boom microphone that filters out noise would mitigate many problems.
While the software is primarily designed to work on a desktop or laptop, it also works with audio files recorded on digital recorders and smartphones. You simply transfer the file to the host computer and process it with the software. It’s designed to work with only a single voice at a time, so it would not be effective in recognizing a conversation between several people.
Dragon NaturallySpeaking works with virtually any application that accepts text input, including social media, spreadsheets, e-mail programs and most major Internet browsers.
I read several online reviews, all quite favourable. Beyond complaining about the price of the more expensive versions, most reviewers noted the challenges of learning all the commands and controls. All agreed that inputting text was fast and accurate right from the start.
Almost all of this article was created and edited through voice dictation, although I did use the keyboard and mouse to navigate some of the document, primarily because I was unfamiliar with all the commands and controls and had limited time.