
Cues for Text-to-Speech

The effect of grammatical 'function words' on sentence comprehension was first demonstrated in a well-known experiment by the psycholinguists Fodor and Garrett in 1967 [1].

An illustration is provided below.

Consider sentence #1 below, which contains a structure frequently found in books and news articles. The highlighted segment in sentence #1 (i.e. 'bitten by the dog in the park') is a syntactic structure known as a Reduced-Form Restrictive Relative Clause (in Passive Voice).

# Sentence Audio Clip Original
1 The lawyer advised the man bitten by the dog in the park not to sue the city.

Now consider sentence #2, which is much easier to comprehend due to the insertion of the words 'who was' (known as grammatical 'function words'), a change of 'pitch' in the embedded clause, and 'pauses' at the beginning and the end of the embedded clause (collectively termed 'syntactic cues'):

# Sentence Audio Clip Revised
2 The lawyer advised the man who was bitten by the dog in the park [pause] not to sue the city.

The contrast between the two audio clips¹ shows that the insertion of syntactic cues (i.e. 'function words', change of 'pitch', and 'pauses') makes the sentence significantly easier to understand.²

Note that the insertion of the grammatical 'function words' shown in sentence #2 does not significantly alter the semantics of the sentence. However, there is some ambiguity with respect to the phrase 'in the park'; this phrase could be a part of the embedded clause (i.e. 'the place of the biting') or a part of the outer clause (i.e. 'the place of the advising'). Resolution of this ambiguity requires human intervention, because contextual information is not available to a Deep Parser that works on individual sentences.

Note also that one could choose to insert only a 'pause' instead of 'who was' in the example above (sentence #2). However, a pause alone will not be as helpful to the reader as the insertion of grammatical function words, which reduce ambiguity.
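The cues described above can be sketched in code. The following is a minimal illustration only, not the method used to produce the audio clips: it assumes the boundaries of the embedded clause have already been identified by hand (or by a parser), and it expresses the cues as SSML markup (`<break>` and `<prosody>`) rather than in Festival's own input format. The function name `add_syntactic_cues`, its parameters, and the default pause length and pitch change are all hypothetical choices made for this example.

```python
from xml.sax.saxutils import escape


def add_syntactic_cues(before, clause, after,
                       function_words="who was",
                       pause_ms=300, pitch="-10%"):
    """Return SSML that marks an embedded clause with syntactic cues.

    before / clause / after: the sentence split by hand around the
    reduced relative clause (an assumption; no parsing is done here).
    function_words: words inserted at the start of the clause; pass an
    empty string to rely on pauses and pitch alone.
    pause_ms, pitch: illustrative values, not taken from the article.
    """
    brk = f'<break time="{pause_ms}ms"/>'
    # Prepend the function words (if any) to the embedded clause.
    embedded = escape(f"{function_words} {clause}".strip())
    return (
        "<speak>"
        f"{escape(before)} {brk}"
        f'<prosody pitch="{pitch}">{embedded}</prosody>'
        f"{brk} {escape(after)}"
        "</speak>"
    )


# Sentence #1, split around 'bitten by the dog in the park'.
ssml = add_syntactic_cues(
    before="The lawyer advised the man",
    clause="bitten by the dog in the park",
    after="not to sue the city.",
)
print(ssml)
```

Calling the function with `function_words=""` gives the pause-only variant mentioned above. Moving 'in the park' from `clause` to the start of `after` would encode the other reading of the ambiguous phrase ('the place of the advising'); choosing between the two readings is still left to a human editor.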

  • 1. The audio clips were produced using Festival, an open-source text-to-speech system.
  • 2. It must be noted that persons with relatively low Working-Memory Spans are likely to find sentence #1 difficult to process; sentence #2 should be marginally easier to process, although it imposes a high cognitive load as well.

References