1.4. Expectations – the basis of self-monitoring

The self-monitoring of speech serves to detect errors (slips of the tongue) and to check whether an utterance matches the intended message. As already mentioned in the preceding sections, automatic self-monitoring also plays an important role in sequencing, by ensuring that the next speaking program can be executed only after the current one has been correctly completed. A further part of the self-monitoring of speech serves the online adjustment of rate, volume, pitch, and articulatory distinctness.

Error detection is based on comparing sensory expectations (predictions of how it should be) with sensory feedback (how it actually is). In speaking, the crucial type of sensory feedback is auditory feedback. Speech errors are detected by comparing a word or phrase just spoken with an expectation of its correct form in terms of sound sequence, grammar, and syntactic embedding. This raises the question of how those expectations are generated. Levelt (1995) considered a ‘phonetic plan’ generated in the Formulator (see figure) to be the basis of expectations. A variant of this approach is the ‘efference copy’ theory. An efference copy is the (hypothesized) projection of a motor plan onto that sensory system in which the planned movement and its success or failure are perceived.
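
To make the comparison principle concrete, here is a minimal sketch in Python (purely illustrative; the function name and the letter sequences standing in for sounds are invented for this example): an expected sound sequence is compared element by element with the auditory feedback, and the first mismatch is flagged as a potential error.

def detect_error(expected, heard):
    """Compare an expected sound sequence with the auditory feedback.
    Returns the position of the first mismatch, or None if they match."""
    for i, (exp, act) in enumerate(zip(expected, heard)):
        if exp != act:
            return i                               # mismatch -> possible speech error
    if len(expected) != len(heard):
        return min(len(expected), len(heard))      # one sequence ended early
    return None                                    # no error detected

# Intended word 'relevant', spoken with a sound transposition: 'revelant'
expected = list("relevant")
heard    = list("revelant")
print(detect_error(expected, heard))               # -> 2 (mismatch at the third sound)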

For detecting an error, the speaker needs a correct expectation in terms of phonology, syntax, and grammar. Is a copy of a motor plan useful as the basis for a correct expectation? Obviously not: if the plan had been correct, the error would not have occurred in the first place, and a copy of a faulty plan cannot provide a correct expectation. And is a copy of a motor plan necessary for the detection of speech errors? Obviously not: when listening to the speech of others, we detect errors immediately without having any copy of the speaker’s plan.

Syntax errors (phrase structure violations) in spoken sentences evoke responses in a listener’s brain, measurable as event-related potentials (ERPs) after about 120 ms; semantic errors elicit ERPs after about 400 ms (Friederici, 1999). Obviously, a listener can generate expectations of what a speaker is going to say and what it should sound like, and can compare these expectations with what is actually perceived, within a very short time.

How can a listener generate these expectations? First, he or she intuitively knows from experience how the initial words of a sentence constrain the speaker’s options for continuing it. The more words of a sentence have already been spoken, the fewer syntactic and semantic options remain. That makes it easier for the listener to quickly develop expectations and to identify a perception that does not match them as a potential error. Simply put, it is the words already heard and the listener’s implicit knowledge of language that enable the listener’s brain to predict how the speaker must continue.
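
As a toy illustration of this narrowing (the miniature set of possible sentences below is invented and merely stands in for the listener's vastly richer knowledge of language), one can count how many continuations remain compatible with each additional word heard:

# Hypothetical miniature set of sentences the listener considers possible.
possible_sentences = [
    "the dog chased the cat",
    "the dog chased the ball",
    "the dog barked loudly",
    "the cat chased the mouse",
    "the cat slept on the sofa",
]

def remaining_options(words_heard):
    """Sentences still compatible with the words heard so far."""
    prefix = " ".join(words_heard)
    return [s for s in possible_sentences if s.startswith(prefix)]

heard = []
for word in "the dog chased the cat".split():
    heard.append(word)
    print(word, "->", len(remaining_options(heard)), "options left")
# Each additional word shrinks the set of possible continuations,
# which is what allows expectations to be formed so quickly.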

However, it is not only the words already heard that allow us to predict what follows. A listener often recognizes a familiar word after hearing only its initial portion, particularly if the word is embedded in a sentence context. The context facilitates both the recognition of a word from a few initial sounds and the prediction of its remaining sound sequence. Halle and Stevens (1959) described a model of how words can be recognized on the basis of minimal information input.

This ‘analysis-by-synthesis’ model was further developed by Poeppel and Monahan (2011). A first, vague prediction based on the initial sounds is updated step by step as further sounds and the context come in. Incidentally, the ability to recognize a familiar word after having heard only a few of its initial sounds is merely a special case of a general ability that enables us, for example, to identify a familiar musical composition after having heard the opening bars.
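
In the spirit of that step-by-step narrowing, here is a minimal sketch (the small lexicon and the context set are invented for illustration; real recognition operates on sounds and on far richer contextual knowledge):

# Candidates compatible with the word onset heard so far are kept,
# and the sentence context removes the implausible ones.
lexicon = ["candle", "candy", "canoe", "camera", "captain"]

def candidates(initial_sounds, context_plausible=None):
    """Lexicon entries consistent with the word onset heard so far."""
    hits = [w for w in lexicon if w.startswith(initial_sounds)]
    if context_plausible is not None:
        hits = [w for w in hits if w in context_plausible]
    return hits

# Without context, the onset "can" still leaves several candidates ...
print(candidates("can"))                              # ['candle', 'candy', 'canoe']
# ... but in the context "She lit the ...", the onset "ca" already suffices:
print(candidates("ca", {"candle", "lamp", "match"}))  # ['candle']
# Once a single candidate remains, its remaining sound sequence can be
# predicted before it has actually been heard.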

These assumptions are supported by findings of Astheimer and Sanders (2009, 2012). By means of auditory event-related potentials, they showed that both adults and preschool-aged children, when listening to connected speech, temporally modulate selective attention so as to preferentially process the initial portions of words.

In summary, auditory information and implicit knowledge of language together enable a person to detect errors in the speech of someone else. If we now assume, with Levelt (1995), that the same mechanism that makes us detect errors in the speech of others also makes us detect our own speech errors, then the two components necessary for the self-monitoring of speech – correct expectations and the perception of what was said – both depend on auditory feedback (read more).

However, it may be somewhat easier to produce correct expectations when monitoring our own speech than when listening to the speech of someone else, since we know the intended message of our own speech. That may be why we can detect errors in our own speech more quickly and reliably than in the speech of others, particularly semantic errors. Nonetheless, the expectation of the correct sound sequence of a familiar word or phrase is produced mainly on the basis of the auditory feedback of the initial portion of that word or phrase and the speaker’s knowledge of language.

 



Footnotes

Not a plan, but auditory feedback and knowledge of language are the basis of correct predictions

It may seem strange that I claim auditory feedback, not an inner plan, to be the basis of those correct predictions that allow us to detect our speech errors. But Lind et al. (2014) have shown that we know from auditory feedback, not from an inner plan, which words we have said. If so, then the correct predictions of how these words should sound are likewise not available to us before we have spoken the words. That is, prediction and monitoring work almost like a zipper: the prediction of the correct sound sequence runs only a fraction of a second ahead of the auditory feedback of the spoken word. Auditory feedback controls the selection of predictions; whether they are correct in themselves depends on our knowledge of language. A lack of language knowledge will limit our ability to notice speech errors. (return)
 
