
Syntactic Complexity

Take a look at sentence #1 below, for which most Readability indicators (such as the Flesch-Kincaid Grade Level [1],[2],[3]; see note 1) indicate 'High' Readability:

# | Sentence | Flesch-Kincaid Grade Level
1 | The horse raced past the barn fell. | 0.0

Readability indicators report 'High' Readability for this sentence because (see [4]):

  • all the words are short (one syllable each),
  • all the words are familiar even to children in the fourth grade [5][6][7],
  • the sentence is short (seven words), and
  • the underlying syntactic complexity is not considered.
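The list above can be made concrete with the standard Flesch-Kincaid Grade Level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The Python sketch below is a minimal illustration under two assumptions: syllables are counted by hand and passed in per word (real tools estimate syllables heuristically, so their scores can differ slightly), and negative scores are reported as 0.0, which is consistent with the 0.0 shown for sentence #1. The function name `fk_grade` is hypothetical.

```python
def fk_grade(syllables_per_word: list[int], sentences: int = 1) -> float:
    """Flesch-Kincaid Grade Level from per-word syllable counts.

    Assumptions: syllable counts are supplied by hand, and negative
    scores are reported as 0.0 (consistent with sentence #1's 0.0).
    """
    words = len(syllables_per_word)
    syllables = sum(syllables_per_word)
    # Only three totals enter the formula; word order and syntax never do.
    raw = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    return max(0.0, round(raw, 1))


# Sentence #1: seven one-syllable words in a single sentence.
print(fk_grade([1] * 7))  # 0.0 (the raw score is about -1.06)

# Sentence #4, hand-counted: "defendant" and "tormented" have three
# syllables each; every other word has one.
sentence_4 = [1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1]
print(fk_grade(sentence_4))  # 5.0
```

Because the function only ever sees a list of counts, a scrambled version of sentence #1 ('fell barn the past raced horse The') would receive exactly the same score, which is the point of the last item in the list.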

However, sentence #1 is the well-known garden-path sentence that cognitive psychologists have studied for decades, and it is known to place a very heavy load on Working Memory. The reader initially takes 'raced' to be the main verb ('The horse raced past the barn') and realizes that he or she has been 'led up the garden path' only on reaching the verb 'fell'. At this point the sentence has to be reprocessed from scratch, with 'raced past the barn' reinterpreted as a reduced relative clause ('The horse [that was] raced past the barn'). Even university students who participated in such studies were confused by sentences such as #1.

Now take a look at sentence #2, where the addition of the grammatical 'function words' 'that was' has made the sentence much easier to comprehend (although an individual with a Low Working-Memory span may still find sentence #2 uncomfortable to process). If the sentence is read aloud, notice also how much easier it is to comprehend when a pause is inserted at the indicated site (after 'barn'). Some psycholinguists believe that pauses at the right points, and changes in intonation around such points, can help the listener work out the structure of a clause and grasp its essence before moving on to complete the sentence. An intonation cue (marked in sentence #2) is an additional aid: Text-To-Speech software may use it to change intonation around an Object Gap [8],[9],[10],[11].

# | Sentence | Flesch-Kincaid Grade Level
2 | The horse that was raced [intonation cue] past the barn [pause] fell. | 0.0

While few authors write sentences such as #1, sentence #3 below shows a more common structure, especially when people write the way they speak (e.g., in novels). Here again, adding a grammatical 'function word' ('whom') and a pause, as in sentence #4, improves readability greatly.

# | Sentence | Flesch-Kincaid Grade Level
3 | The defendant claimed that the boy the dog bit tormented him for months. | 4.8
4 | The defendant claimed that the boy whom the dog bit [pause] tormented him for months. (see note 2) | 5.0

Clearly, most authors would like their ideas to reach the widest possible audience, including individuals with low Working-Memory spans. However, the tools most authors use (e.g., word processors) do not flag anything out of the ordinary in sentences #1 and #3 (see note 3). In the absence of the right diagnostic tools, many authors rely on Readability indicators (such as the Flesch-Kincaid Grade Level shown above) to judge the readability of passages. In fact, many governments have mandated that content intended for a mass audience should be at a Flesch-Kincaid Grade Level of around 8 (but see [4] for how expert teachers assess the Grade Level of a passage). However, most Readability indicators work on a passage as a whole rather than on individual sentences, and they do not consider syntactic structure; as a result, structures such as #1 or #3 may be overlooked during editing and end up being published.

There is now a vast library of content in electronic form that is 'read aloud' every day (by commuters, among others) using Text-To-Speech (TTS) software, or translated into other languages using Machine Translation (MT) software. Both categories of software can benefit from the insertion of grammatical 'function words' and structural cues, as can easily be verified with sentences #2 and #4 above: the inserted 'function words' and other structural cues improve TTS output significantly (try substituting a comma for 'pause'). MT software can also produce better translations when complex structures are simplified before translation. However, only a radical rewrite can help when sentences are extremely complex; technology can help only marginally, and not at all if a sophisticated parser cannot parse such sentences reliably.
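For TTS, the [pause] marker in sentences #2 and #4 could be mapped onto the `<break>` element of the W3C Speech Synthesis Markup Language (SSML). The sketch below assumes the target TTS engine accepts SSML, uses an arbitrary 300 ms break, and emits a bare `<speak>` root (a strict SSML 1.1 document would also carry `version`, `xmlns` and `xml:lang` attributes). It drops the [intonation cue] marker, because the text does not specify how that cue should be realized; a real system might map it to SSML's `<prosody>` element instead. The function name `to_ssml` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Markers used in the annotated example sentences (captured so split keeps them).
MARKERS = re.compile(r"(\[pause\]|\[intonation cue\])")


def to_ssml(annotated: str, pause_ms: int = 300) -> str:
    """Turn a sentence annotated with [pause]/[intonation cue] into SSML.

    Assumptions: the TTS engine accepts SSML, a pause is rendered as a
    fixed-length <break>, and intonation cues are simply dropped.
    """
    pieces = []
    for part in MARKERS.split(annotated):
        if part == "[pause]":
            pieces.append(f'<break time="{pause_ms}ms"/>')
        elif part == "[intonation cue]":
            continue  # realization not specified in the text; omitted here
        else:
            pieces.append(escape(part))  # escape &, <, > in the plain text
    # Collapse the double spaces left where markers were removed.
    body = re.sub(r"\s+", " ", "".join(pieces)).strip()
    return f"<speak>{body}</speak>"


print(to_ssml("The horse that was raced [intonation cue] past the barn [pause] fell."))
# <speak>The horse that was raced past the barn <break time="300ms"/> fell.</speak>
```

A comma, as suggested above, is the lowest-effort alternative; an explicit break has the advantage of a controllable duration.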

There is a need for a technological solution that automatically identifies such complex grammatical structures in various languages, and indicates where grammatical 'function words', 'pauses', and 'intonation cues' may be inserted to reduce syntactic complexity.
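As a very small illustration of the kind of tool called for above, the sketch below scans part-of-speech-tagged tokens for a single pattern, a reduced object relative clause (DET NOUN DET NOUN VERB, as in 'the boy the dog bit' in sentence #3), and suggests a relativizer after the head noun and a [pause] after the clause's verb. The tags (coarse tags in the style of the Universal Dependencies UPOS set) are supplied by hand, which is an assumption; a real tool would use a tagger or, better, a parser. The pattern and the name `annotate_reduced_relatives` are hypothetical, and the check is naive: it does not recognize the reduced passive relative in sentence #1, and it can misfire on constructions such as 'told the boy the dog bit him'. Choosing 'whom' rather than 'that' would also require knowing that the head noun refers to a person, so the sketch leaves that choice to the caller.

```python
# Hypothetical coarse tag sequence for a reduced object relative clause,
# e.g. "the boy the dog bit": the head noun phrase is followed directly by
# another noun phrase and its verb, with no relativizer in between.
PATTERN = ["DET", "NOUN", "DET", "NOUN", "VERB"]


def annotate_reduced_relatives(tokens: list[tuple[str, str]],
                               relativizer: str = "that") -> str:
    """Suggest a relativizer and a [pause] for each match of PATTERN.

    `tokens` is a list of (word, tag) pairs; the tags are assumed to be
    supplied by the caller (by hand or by some POS tagger).
    """
    tags = [tag for _, tag in tokens]
    insert_after: dict[int, str] = {}
    n = len(PATTERN)
    for i in range(len(tags) - n + 1):
        if tags[i:i + n] == PATTERN:
            insert_after[i + 1] = relativizer  # after the head noun ("boy")
            insert_after[i + 4] = "[pause]"    # after the clause's verb ("bit")
    out = []
    for i, (word, _) in enumerate(tokens):
        out.append(word)
        if i in insert_after:
            out.append(insert_after[i])
    # Re-attach the sentence-final full stop to the preceding word.
    return " ".join(out).replace(" .", ".")


sentence_3 = [
    ("The", "DET"), ("defendant", "NOUN"), ("claimed", "VERB"),
    ("that", "SCONJ"), ("the", "DET"), ("boy", "NOUN"), ("the", "DET"),
    ("dog", "NOUN"), ("bit", "VERB"), ("tormented", "VERB"),
    ("him", "PRON"), ("for", "ADP"), ("months", "NOUN"), (".", "PUNCT"),
]
print(annotate_reduced_relatives(sentence_3, relativizer="whom"))
# The defendant claimed that the boy whom the dog bit [pause] tormented him for months.
```

The output reproduces sentence #4; run on sentence #1, the same function returns the sentence unchanged, which shows how much more than a surface pattern is needed for garden-path structures.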

  • 1. Readability indicators such as the Flesch-Kincaid Grade Level were designed to work on passages rather than on individual sentences; hence the Flesch-Kincaid Grade Level scores shown should be regarded as approximate.
  • 2. Readers will be confused if a pronoun cannot easily be linked to its referent (it is not clear, from sentences #3 and #4, whom the boy tormented). Most authors would probably rewrite sentence #4 if it were pointed out that there are several candidates for the referent of the pronoun 'him'. Government & Binding Theory helps us rule out 'the boy' as the referent of 'him', but leaves two candidates in the sentence ('the defendant' and 'the dog') and possibly others in the discourse. Common sense suggests that, if the defendant was a man, he was quite likely defending his dog (and himself, in court) by implying that 'the boy tormented the dog for months, and eventually the dog could not bear it any longer and bit the boy'. Identifying the referent of 'him' in this sentence is a matter where syntax alone cannot help much (other than to rule out 'the boy'); it requires a deeper understanding of the discourse and some 'knowledge of the world'.
  • 3. One reason for this situation may be a product-design choice to avoid a glaring difference in coverage between languages, i.e., a few languages covered very deeply while tens of others are covered much less deeply.

References