Speech synthesis is the artificial production of human speech.

Speech recognition algorithms take this a step further by trying to recognize patterns in the extracted parameters. This typically involves comparing the segment information with templates of previously stored sounds, in an attempt to identify the spoken words. The problem is, this method does not work very well. It is useful for some applications, but is far below the capabilities of human listeners. To understand why speech recognition is so difficult for computers, imagine someone unexpectedly speaking the following sentence:

The intelligibility of the speech synthesis systems available today is usually high enough that different systems can be compared on speech quality alone. However, in some situations, such as a civil aircraft cockpit, the acoustic environment may be such that intelligibility becomes a discriminating factor between systems. In this paper we propose a methodology for comparing speech synthesis systems based on the Speech Reception Threshold (SRT). With this method, the signal-to-noise ratio is found at which 50% intelligibility of redundant sentences is reached. A system with a lower SRT value is said to be more robust against masking noise. We have compared 5 commercial speech synthesis systems (4 male voices, 5 female voices) in an SRT experiment using a masking noise that was spectrally equivalent to cockpit noise. SRT values range from -4.1 dB to 1.1 dB. An ANOVA revealed that two of the nine systems had a significantly lower SRT than the rest. There was also an effect of the test subject, which is remarkable because the SRT usually shows little variability across listeners.
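
To make the SRT definition concrete, the following minimal sketch estimates the signal-to-noise ratio at which intelligibility crosses 50%, given a few measured points. It is an illustration only: the function name `estimate_srt`, the use of linear interpolation, and all data values are assumptions made for this example, and the abstract does not state which measurement procedure was actually used.

```python
def estimate_srt(snrs_db, proportions, target=0.5):
    """Estimate the SNR (in dB) at which intelligibility reaches `target`.

    snrs_db     -- signal-to-noise ratios at which sentences were presented
    proportions -- fraction of sentences understood at each SNR (0.0 to 1.0)

    Assumption: intelligibility rises with SNR, and linear interpolation
    between neighbouring measurements is accurate enough for a sketch.
    """
    points = sorted(zip(snrs_db, proportions))
    # Walk over neighbouring measurement pairs and find the one that
    # brackets the target intelligibility.
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target intelligibility is not bracketed by the data")


# Hypothetical measurements for two synthetic voices (not from the paper).
snrs = [-8, -6, -4, -2, 0, 2]
voice_a = [0.05, 0.20, 0.40, 0.60, 0.85, 0.95]
voice_b = [0.00, 0.05, 0.15, 0.30, 0.45, 0.75]

srt_a = estimate_srt(snrs, voice_a)
srt_b = estimate_srt(snrs, voice_b)

# The voice with the lower SRT tolerates more masking noise.
more_robust = "A" if srt_a < srt_b else "B"
print(f"SRT A = {srt_a:.2f} dB, SRT B = {srt_b:.2f} dB, more robust: {more_robust}")
```

In this made-up data set, voice A reaches 50% intelligibility at a lower signal-to-noise ratio than voice B, so under the definition above it would be called more robust against masking noise.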