Hi, How're u
In computer technology, Speech Recognition refers to the recognition of human speech by computers for the performance of speaker-initiated computer-generated functions (e.g., transcribing speech to text; data entry; operating electronic and mechanical devices; automated processing of telephone calls) — a main element of so-called natural language processing through computer speech technology. Speech derives from sounds created by the humanarticulatory system, including the lungs, vocal cords, and tongue. Through exposure to variations in speech patterns during infancy, a child learns to recognize the same words or phrases despite different modes of pronunciation by different people— e.g., pronunciation differing in pitch, tone, emphasis, intonation pattern. The cognitive ability of the brain enables humans to achieve that remarkable capability. As of this writing (2008), we can reproduce that capability in computers only to a limited degree, but in many ways still useful. The Challenge of Speech Recognition
Writing systems are ancient, going back as far as the Sumerians of 6,000 years ago. The phonograph, which allowed the analog recording and playback of speech, dates to 1877. Speech recognition had to await the development of computer, however, due to multifarious problems with the recognition of speech.
History of Speech Recognition
Despite the manifold difficulties, speech recognition has been attempted for almost as long as there have been digital computers. As early as 1952, researchers at Bell Labs had developed an Automatic Digit Recognizer, or "Audrey". Audrey attained an accuracy of 97 to 99 percent if the speaker was male, and if the speaker paused 350 milliseconds between words, and if the speaker limited his vocabulary to the digits from one to nine, plus "oh", and if the machine could be adjusted to the speaker's speech profile. Results dipped as low as 60 percent if the recognizer was not adjusted.
Speech Recognition Today
Technology
Current voice recognition technologies work on the ability to mathematically analyze the sound waves formed by our voices through resonance and spectrum analysis. Computer systems first record the sound waves spoken into a microphone through a digital to analog converter. The analog or continuous sound wave that we produce when we say a word is sliced up into small time fragments. These fragments are then measured based on their amplitude levels, the level of compression of air released from a person’s mouth. To measure the amplitudes and convert a sound wave to digital format the industry has commonly used the Nyquist-Shannon Theorem.