In Section 2.1, I have assumed that stuttering in connected speech is caused by a disruption of auditory feedback in the back part of the speech unit preceding the stuttering event. Now, two questions must be answered: What kind of feedback disruption is it, and what causes it? My answers will be: The disruption occurs not in the sensory periphery, i.e., not at the level of audition, but at the level of central processing, and the cause is a misallocation of attention, i.e., an inappropriate distribution of perceptual and processing capacity during speech.
Cherry (1953) carried out a dichotic listening experiment in which two different spoken messages were presented simultaneously, one to each ear. Participants were asked to pay attention to just one of the messages and to repeat it aloud. Later, when asked to describe the content of the ignored message, participants were not only unable to do this, they even appeared not to have noticed a change in language (from English to German), or a change to speech played backward. They only noticed gross physical changes in the stimuli, such as if the unattended speaker switched gender, or if unattended speech was replaced by a 400-Hz tone (I have taken this short summary from Sabri et al., 2008).
Cherry’s observation has become well known as the ‘cocktail party effect’. It indicates an automatic mechanism of selective attention that allows us to listen to someone and to comprehend his or her speech in a hubbub of conversations. It makes it possible to sift out an individual voice and, at the same time, to ignore all others by focusing on the selected voice. All ignored voices are perceived as ambient noise; this acoustic information seems to be processed only on a low level. That means, in turn: If attention is not directed to a speaker’s voice, speech cannot be understood, except for a few words that function as signals and attract attention, e.g., one’s own name (Moray, 1959; Wood & Nelson, 1995).
Cherry’s results were confirmed by means of brain imaging. Jäncke, Mirzazade, and Shah (1999) found, in a speech recognition task, that activations in the primary and secondary auditory cortex were greatest when the participants’ attention was directed to the stimuli in order to detect a specific target syllable, but weakest when participants ignored the stimuli. Likewise, Hugdahl et al. (2003) found increased activations in auditory association areas (middle and superior temporal gyrus) when participants attentively listened to verbal stimuli, compared to passive listening without focusing attention. The authors concluded that attention plays a modulatory role in neuronal activation to speech sounds, producing specific activations to specific stimulus categories, which may act to facilitate speech perception. Similarly, in an fMRI study, Sabri et al. (2008) found activations in the auditory association areas to be significantly greater when participants attended to acoustic stimuli, and smaller when the stimuli were ignored. Moreover, some cortical areas probably involved in speech processing showed activation only when participants listened to the verbal stimuli, but not when the stimuli were ignored. Sabri et al. concluded that the processing of phonetic and lexical-semantic information was very limited without attention to the auditory channel.
What do these observations have to do with stuttering? In Section 1.5, I described the self-monitoring of speech as a comparison between expectation and perception: The words and phrases just produced are compared with the expected correct phoneme sequences and with the intended message. That requires perceiving and identifying the self-produced sounds and words. Supposing that one’s own speech, when heard (i.e., auditory feedback), is processed and comprehended in the same way as the speech of someone else (according to Levelt, 1995; McGuire, Silbersweig, & Frith, 1996; Price et al., 1996), then we can assume that the results reported above regarding the comprehension of the speech of others also apply to the comprehension of one’s own speech. This assumption is supported by Scheerer, Tumber, and Jones (2016), who for the first time investigated the impact of attention on the utilization of auditory feedback (even if only the response to a feedback manipulation in pitch was tested, not yet the impact of attention on phonological or lexical processing). The authors came to the conclusion that their results “suggest that attention is required for the speech motor control system to make optimal use of auditory feedback for the regulation and planning of speech motor commands.” (p. 826)
The comprehension of self-produced phonemes and words during speech – which is basic for self-monitoring – requires a sufficient degree of attention to the auditory channel. For unimpaired self-monitoring of speech, the speaker’s attention must be allocated in such a way that enough perceptual and processing capacity remains for the comprehension of one’s own speech.
Let us now assume that a speaker’s attention is drawn away from the auditory channel at the end of a speech unit to such an extent that phonological processing is impossible. At this moment, the (high-level) processing of auditory feedback is interrupted; e.g., the end of the speech unit is not transmitted to the monitor. But the monitoring system has expected the complete unit; thus it detects a mismatch between expectation and perception and interrupts speech flow, as described in Section 2.1. The result is the following causal chain of a stuttering event: misallocation of attention → interruption of the processing of auditory feedback → mismatch between expectation and perception → invalid error signal → interruption of speech flow.
For an understanding of the mechanism, it is important to note that only speech comprehension depends on attention to the auditory channel, but not the monitor’s response to a mismatch between expectation and perception: The mismatch-related brain potential called ‘mismatch negativity’ (MMN) is detectable in EEG even if the person’s attention is distracted from the unexpected stimulus (Näätänen et al., 2007). Therefore, we can safely assume that the mismatch occurs because of insufficient attention to the auditory channel, since the detection of the mismatch and the resulting (invalid) error signal are independent of the speaker’s attention.
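The mechanism described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not a model taken from the literature: the names `process_feedback`, `monitor`, and `speak_unit`, the numeric attention values, and the cutoff `ATTENTION_THRESHOLD` are all hypothetical. The one point it tries to capture is that feedback processing depends on attention, whereas the monitor's mismatch detection does not.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cutoff: below this level of attention to the auditory
# channel, a segment is not processed at the phonological level.
ATTENTION_THRESHOLD = 0.5


@dataclass
class MonitorResult:
    perceived: List[Optional[str]]   # segments that reached the monitor
    mismatch: bool                   # expectation differs from perception
    error_signal_at: Optional[int]   # position at which the error signal fires


def process_feedback(segments, attention):
    """High-level feedback processing: depends on attention.

    A segment heard with too little attention is lost (None).
    """
    return [seg if att >= ATTENTION_THRESHOLD else None
            for seg, att in zip(segments, attention)]


def monitor(expected, perceived):
    """Mismatch detection: takes no attention argument at all.

    The monitor waits for the complete unit, so the (possibly invalid)
    error signal is placed at the end of the unit.
    """
    mismatch = any(p != e for e, p in zip(expected, perceived))
    error_signal_at = len(expected) if mismatch else None
    return MonitorResult(perceived, mismatch, error_signal_at)


def speak_unit(segments, attention):
    """Produce one speech unit without any real speech error."""
    perceived = process_feedback(segments, attention)
    return monitor(segments, perceived)


unit = ["ba", "na", "na"]
attentive = speak_unit(unit, [1.0, 0.9, 0.8])  # attention stays on the auditory channel
drifting = speak_unit(unit, [1.0, 0.9, 0.2])   # attention drifts away in the back part

print(attentive.mismatch, attentive.error_signal_at)
print(drifting.mismatch, drifting.error_signal_at, drifting.perceived)
```

In both runs the unit is produced correctly. Only in the second run does the last segment fail to reach the monitor, so a mismatch is reported at the end of the unit although no speech error occurred.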
In Section 2.1, it was explained why disruptions of auditory feedback without impairment of self-monitoring cannot occur at the onset or in the initial part of a speech unit (otherwise, the expectation of the correct sound sequence cannot be generated). Therefore, for self-monitoring to work at all, the speaker’s attention can be drawn away from the auditory channel only at the end or in the back part of speech units, that is, in the back part of syllables, words, phrases, or clauses.
But why should a speaker do so? I assume that, at these moments, attention is already directed to speech planning, namely for the same reasons as specified in Section 2.2: Initially, in the phase of transition from one- or two-word utterances to connected speech, sentence planning is an unfamiliar challenge for children and needs much attention. Later, after the child has become aware of the disorder, much attention is expended on avoiding, moderating, or concealing stuttering. Further, situations inducing careful wording (oral examination, talking to a superior or to a large audience, etc.) may contribute to a misallocation of attention and thereby increase stuttering. Generally, emotional stress – positive and negative – often increases stuttering frequency (Choi et al., 2016). Strong emotions concerning the content of speech may draw all the speaker’s attention to the content and reduce the capacity for feedback processing.
Is there any evidence that stutterers direct too little attention to the auditory channel during speech? In several brain imaging studies (Braun et al., 1997; Fox et al., 1996; Ingham et al., 2003), auditory association areas of the cortex that might be responsible for self-monitoring (mainly the left superior temporal gyrus, BA22; see Indefrey, 2011; Indefrey & Levelt, 2004) were found to be significantly less activated during speech in stutterers, compared to normally fluent speakers (see also Table 1). Some of those researchers concluded that stutterers may have deficits in the self-monitoring of speech. Further, in conditions like metronome-paced speaking or chorus reading, enhanced speech fluency was found to be associated with greater activations in auditory association areas (Braun et al., 1997; Fox et al., 1996; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011). Some of those researchers assumed that the reduction of stuttering in the conditions tested was associated with improved self-monitoring.
Moreover, a meta-analysis conducted by Budde, Barron, and Fox (2014), which included all relevant brain-imaging studies published up to that point, revealed that the lack of activation in auditory cortex areas was one of the characteristic features of the brain activation patterns of stutterers. It further revealed that only auditory areas were consistently more strongly activated during induced speech fluency (due to chorus reading, metronome pacing, etc.) than during stuttered speech. Admittedly, all those findings suggest only poor auditory self-perception during stuttered speech, not necessarily poor attention to the auditory channel – but remember the studies reported on above, in which a relationship was found between (1) attention to the auditory channel, (2) activation in auditory cortical areas, and (3) speech comprehension.
In this context, the experiment conducted by Vasic and Wijnen (2001; 2005) is interesting: They proposed a variation of the Covert Repair Hypothesis (Postma & Kolk, 1993; in my view, however, it is more a derivative of Maraist and Hutton, 1957; see Section 2.1.1). They assumed an overly sensitive monitoring of normal disfluencies in stutterers, and suspected the attempt to avoid these normal disfluencies to be the primary cause of stuttering – a hypothesis at odds with the brain imaging results mentioned above, which do not suggest that stutterers monitor their own speech too much. Be that as it may, two conditions were examined in order to test the following hypothesis: Stuttering is markedly reduced (1) by distraction of attention from monitoring one’s own speech in general, namely by playing a computer game while speaking (at two difficulty levels), and (2) by distraction from disfluencies in one’s own speech through focusing attention on the lexical aspect of one’s own speech – participants had to monitor the occurrence of a specific function word.
The result was that distraction of attention from monitoring one’s own speech (condition 1) was, indeed, associated with a reduction of the number of stuttering blocks, but the fluency-enhancing effect was much greater when, in condition 2, participants were compelled to attentively monitor the lexical aspect of their speech. The authors took these results as a confirmation of their starting hypothesis. However, I think that the reduction of blocks in condition 1 is, indeed, attributable to distraction, namely to fewer anticipations of stuttering, corresponding to the Anticipatory Struggle Hypothesis (see Section 2.5). The much greater reduction of blocks in condition 2, by contrast, might be the result of an improved processing of auditory feedback, because attention was directed to the auditory channel.
Finally, I want to remark: The thesis that attention is drawn away from auditory feedback only in the back parts of speech units may appear somewhat artificial – at least, if it is considered the only cause of stuttering in connected speech. But remember that (1) only the initial part of a speech unit must be excluded from a feedback disruption, (2) the invalid error signal is always elicited at the end of the unit, even if the mismatch begins in the middle, because the monitor waits for the missing part to come (the mismatch we assume is not due to a wrong part, but only due to a missing part in the feedback – that is an important difference from real speech errors), and (3) it takes some time for the error signal to take effect (reaction time of the monitoring system).
A mismatch negativity (MMN) is a component of the event-related brain potential (ERP). It occurs ca. 100–150 ms after stimulus onset as a response to an unexpected stimulus, i.e., a stimulus deviating from an expectation previously formed. The MMN indicates a basic stage of sensory processing, namely automatic scanning and change detection, and serves the involuntary orienting of attention to unexpected changes. A mismatch negativity also occurs if an expected stimulus, for instance, a beat in a rhythm, is suddenly lacking.
The function of the brain process eliciting the MMN is to direct attention to things not matching a certain expectation. Therefore, this process is also fit for directing attention to errors in an automatic sensorimotor sequence: An error is an unexpected event, compared to the expected correct execution of the sequence. The stage of sensory processing which elicits the MMN is independent of attention: An MMN occurs even if a person is strongly distracted from the deviant stimulus (Pulvermüller et al., 2008). That is important for error detection because the probability of errors is higher when attention is distracted.
Now, it may be clearer why, in the self-monitoring of speech, the automatic monitor is unable to distinguish between a real speech error and a seeming error due to a feedback disruption: The automatic monitor does not detect errors at all, but only deviations from the expected – it responds to “trouble”, as Levelt (1995) wrote.
Van Riper (1979) reported the following case: One of his clients was an Italian boy who began to stutter severely after his mother had died and his father had married another woman. After a while, Van Riper gleaned that this woman constantly bullied and abused the boy, and he persuaded the father to divorce her. After the wicked stepmother was gone, the boy’s stuttering disappeared after a short time of treatment.
I think the boy was worried in his communication with the stepmother, but also with his father, who had, after all, married this woman. Probably, the boy considered and planned his utterances carefully in order to avoid saying something wrong, provoking the stepmother, or hurting the father. And perhaps he anxiously observed their emotional responses while speaking. Such behavior affects the allocation of attention during speech and may, in this way, contribute to the causation of stuttering.
It is important that a theory of stuttering can define the ‘interface’ between the physiological mechanism of stuttering, on the one hand, and psychological and environmental factors that obviously exacerbate stuttering, on the other hand – it is not enough only to claim that stress entails higher demands for speech control (some stutterers are fluent when speaking in front of a large audience, but stutter when talking with a friend). I think the interface is the allocation of attention during speech. However, I do not assume that psychological and environmental factors alone cause stuttering (as a longer-lasting disorder) – a physical predisposition for stuttering seems to be necessary. The boy reported on by Van Riper had a somewhat younger brother who was also bullied and beaten by the stepmother, and who did not develop stuttering.
Fox et al. (1996) wrote:
“Left superior temporal activations, observed in the controls and attributed to self-monitoring, were virtually absent during stuttering. Deactivations during stuttering were also distinctive. Not only did left superior temporal cortex fail to activate (above), but left posterior temporal cortex (BA22) showed significant deactivations not seen in the controls.” (161) “The neural systems of stuttering have been isolated and include […] lack of normal 'self-monitoring' activations of left, anterior, superior temporal phonological circuits …” (161)
Braun et al. (1997) wrote:
“...our results suggest that when they are dysfluent, stuttering subjects may not be monitoring speech-language output effectively in the same fashion as controls. Perhaps an inability to monitor rapid, spontaneous speech output may be related, at some level, to the production of stuttered speech.” (774) “...the data suggest that, during the production of stuttered speech, there appears to be a functional dissociation between activity in post-rolandic regions, which play a role in perception and decoding of sensory (particularly auditory) information, and anterior forebrain regions, which play a role in the regulation of motor function. Anterior regions were disproportionately active in stuttering subjects while post-rolandic regions were relatively silent. The posterior regions may somehow fail to provide the integrated sensory feedback upon which the anterior regions depend for efficient coordination of speech output.” (780)
Ingham et al. (2003) wrote:
“Thus it may be, [...] that persistent stutterers show poor responsiveness to their own speech signal and probably have an impoverished capacity to monitor their own speech.” (312)
Fox et al. (1996) wrote:
“Induced fluency markedly reduced the abnormalities seen in stuttering […] Deactivations of left inferior frontal and left posterior temporal cortex were eliminated, and lack of activation in left superior temporal cortex was substantially reduced.” (161)
Stager, Jeffries, and Braun (2003) wrote:
“...a much wider array of areas that appear to participate in self-monitoring of speech and voice were more active during fluency-evoking than during dysfluency-evoking conditions, in both PWS and control subjects. These regions include, in the right and left hemispheres, both anterior and posterior auditory association areas as well as core and belt areas surrounding the primary auditory cortex. These regions encompass those that are activated by voice and intelligible speech and those that are activated when subjects monitor their speech output under conditions in which auditory verbal feedback is altered.” (332) “… the direct comparison of responses in PWS and controls pinpointed a number of regions in which fluency-evoking conditions evoked a more robust response in stuttering subjects. These included the anterior MTG and anterior STG – regions that appear to be selectively activated by voice and intelligible speech – suggesting that the fluency-evoking conditions may enhance self-monitoring to a greater degree in PWS than in controls.” (333)