National Library News
October 1999
Vol. 31, no. 10
Taming the Dragon: Using a Voice Recognition Program
by Alan Gillmor,
Professor, Music Department, Carleton University
A few years ago I acted as the external appraiser for the materials that now form the Istvan Anhalt Fonds in the Music Division of the National Library. While sifting through this mass of material, I came across the extended correspondence between Prof. Anhalt and the American composer George Rochberg: hundreds of handwritten letters exchanged over a nearly 40-year period, and still being written.
Anyone who is acquainted with the published essays and books of both men, not to speak of their music - symphonies, operas, chamber music - will realize that we are dealing with two extraordinary minds whose intellectual curiosity roams freely over an immense territory, both ancient and modern: music and literature, religion and philosophy, politics and society. Their letters provide a superb chronicle of our complex age and a primary source of inestimable value to future interpreters of the achievement of two outstanding composers. With the permission of both men, I decided that this rich correspondence, worthy of publication, should gain a new life beyond the silent shelves of the National Library.
Computer technology, more than anything else, has defined the lifestyles of the late 20th century and, as we all know firsthand, it is one of the great love-hate relationships of all time, a seemingly bottomless topic that has infiltrated our every waking moment as we exchange endless stories of the joys and frustrations of cruising the information highway. Let me say straight off that without the Dragon Naturally Speaking voice recognition program kindly made available to me by the National Library, I would never have considered taking on the project of transcribing this mountain of material into electronic form. Over a period of several months this past winter I sat in a small room at the National Library talking to my Dragon or, as it must have seemed to the occasional passerby, to myself.
After a brief "training exercise" whereby the program "learned" the idiosyncrasies of my speech, we were ready to roll. Considering the relative newness of this technology, it is a truly remarkable product which will undoubtedly greatly improve as its children and grandchildren reach the marketplace. Although the program will translate voice to editable text at the normal speaking speed of up to 150 words per minute, or at least twice the speed of a competent typist, the "Naturally Speaking" in the product name may be a slight exaggeration, for I found that the best results were obtained by over-articulating to a certain extent. This can prove rather tiring over a period of several hours. But the good news is that the program constantly improves its performance by updating the speech files based on a steady exposure to the speaker's voice patterns. Moreover, it can be taught to correct its often-silly mistakes, so that by the end of the project, we were, after some frustrating weeks, once again on speaking terms with one another (at one point I gave it a piece of my mind using a few choice unprintable epithets, only to realize that the microphone was still on and that the program has no moral judgment whatsoever). Among its more elegant features: automatic capitalization of the first word after a period; voice commands that control all punctuation, paragraphing, spacing, bold, italics, underlining, corrections, editing, etc.; with fair accuracy it will choose correctly among, for example, "to", "too", and "two", depending on how it "reads" the context; it can be "taught" such things as proper names, British versus American spellings, or any technical term beyond its basic vocabulary.
Perhaps the next generations of Dragon Naturally Speaking will be available in discipline-specific editions already bundled with arcane technical vocabulary. As it is now, the biggest surprise was the relative paucity of the program dictionary beyond a rather basic vocabulary, as some of the following howlers will illustrate. Like everything else in the great mass consumer society, the program is designed, not for the scholar, but for the mythical "average" user. One can perhaps forgive the designers for not including relatively rarefied terms such as "Pythagorean", which comes out "Tiger Reagan"; but the great Irish poet Yeats deserves better than "Gates", and surely Socrates would have been mystified by "soccer keys". The word "rejoice" would transmogrify (I can't imagine what it would do with that word) into "read Joyce" - so it does know of a few Irish writers. But how does one explain certain dyslexic tendencies (I am not making this up) such as "gods" coming out "dogs"? Finally, and predictably, the world of the personal computer is never far away; just two examples: "rampaging"/"RAM paging", "pantomime"/"Pentium time". But never mind: after all, to err is non-human.
I would like to extend my deeply felt thanks to Dr. Timothy Maloney and the staff of the National Library for making this wonderful technology available to me in what I believe to be a pioneering project. Perhaps the days of handwritten correspondence are over (I can see it now: "The Collected E-mail of …"), but as long as scholars will have need to investigate the significant part of our heritage that remains hidden away in yellowed copies of old letters and other similar documents from the Age of Penmanship (Penpersonship?), technology such as Dragon Naturally Speaking will save more than one investigator from incipient insanity or, at the very least, heavy drinking.
The Istvan Anhalt Fonds in the Music Division of the National Library of Canada contain records pertaining to the life and musical activities of the professor, conductor, pianist, and pioneer composer of electroacoustic music, Istvan Anhalt. They include, among other things, academic records, autograph manuscripts of musical works, press clippings, photographs, and the "hundreds of handwritten letters" referred to in Alan Gillmor's "Taming the Dragon".