Before I address the question of what causes the disruptions of auditory feedback, another issue must be dealt with. The preliminary definition of stuttering proposed in Section 2.1 has an unacceptable limitation: If stuttering is defined as being caused by a disruption of auditory feedback, stuttering events at the onset of an utterance, i.e., on its initial syllable, are not included. Such symptoms, in my observation, seem to be relatively rare in adults, but they are common in preschoolers who stutter (Buhr & Zebrowski, 2009; Richels et al., 2010). Stuttering at the onset of an utterance can be explained by taking the impact of breathing into account.
Figure 6 shows how a sequence of breathing overlaps with a sequence of speaking: The onset of speech, i.e., the start of the first speaking program of a sequence, is also the start of the second part of the sequence inhalation–exhalation. However, speaking programs are independent of breathing: You can articulate a word, i.e., make the jaw, mouth, and tongue movements silently, without exhalation (‘mouthing’). And of course, breathing is independent of speaking: Normal breathing (without speaking) runs evenly; breathing during speech, by contrast, is characterized by short inhalation and long exhalation (Conrad & Schönle, 1979). The sequences of breathing and speaking must be synchronized by the brain: Inhalation phases are included in the speech sequence, by which they turn into speech units.
As explained in Section 2.1, a speech unit, e.g., a word, must have been correctly and completely produced; otherwise the program of the next unit cannot successfully start (read more). Similarly, breathing in must have finished before a speaking program can start, because phonation should take place only during exhalation. As a rule, a speaker does not consciously attend to this; therefore, we can assume that the brain automatically ensures that speaking can start only after the end of breathing in – possibly by means of a simple ‘watchdog’ network as proposed by Kochendörfer, Part 5, p. 98, Fig. 5.4.1–4 (in German).
Now, the question arises: Why should children start speaking without the monitor having detected the end of breathing in? If they really started speaking too early, i.e., before breathing in was finished, it would be easily observable. As a rule, this is not the case. But possibly, the internal monitoring system has simply not detected the end of a breathing-in phase and, hence, generates an invalid error signal. The cause could be that the speaker’s attention was demanded too much by other things. In my own experience as a stutterer, blocks at speech onset after breathing in can be avoided by consciously speaking only during exhalation, which naturally requires attention to the end of breathing in.
Interestingly, many therapy programs for stuttering apply breathing exercises (costal breathing or diaphragmatic breathing), e.g., the successful Regulated Breathing Therapy (Azrin & Nunn, 1974; Conelea, Rice, & Woods, 2006). It may be less the breathing technique itself that has an effect on stuttering (many people breathe incorrectly and do not stutter) than the necessity to pay attention to breathing during speech. Notably, in Regulated Breathing, clients are taught to start speaking following a small exhale (Conelea, Rice, & Woods, 2006).
Likewise, in the Passive Airflow Technique, the stutterer learns to release a slight, passive current of air from his mouth immediately before speaking (see this video by Martin Schwartz). According to the present theory, this ensures that the switch from inspiration to expiration is sufficiently perceived such that invalid error signals are avoided. (By the way, Schwartz’s theory, according to which tension of the vocal cords causes stuttering, cannot be true since repetitions and prolongations occur with normal phonation.)
Above, I wrote that the self-monitoring of breathing is an unconscious, automatic process – so why should it require the speaker’s attention? I use the term ‘attention’ here more in the sense of ‘perceptual and processing capacity’ than as the designation of a volitional behavior. However, when I direct my attention to something, or when I turn my attention away from something, I alter the allocation of my perceptual and processing capacity, which, in total, is limited. Therefore, when I direct all my attention to only one stimulus, the consequence will be that I do not perceive other stimuli at the same time, or perceive them only poorly (read more).
Allocation of attention (of perceptual and processing capacity) is, on the one hand, influenced by the will; on the other hand, strong stimuli attract attention, even against one’s own will. Many activities require a specific allocation of attention (you have to attend to several things at the same time), and if this allocation of attention is disturbed, e.g., by too much attention to only one aspect of the activity, mistakes can result. If an activity is an automatic routine like speaking, the appropriate allocation of attention has been automatized too and is largely unconscious, including necessary changes of attention allocation in the course of the activity. And the self-perception of breathing during speech, even if it is unconscious, probably needs a certain part of the speaker’s overall attention (read more).
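The idea of a limited capacity that is shared among concurrent demands can be illustrated with a toy model. This is only a sketch under stated assumptions, not a model proposed in the text: the proportional sharing rule, the normalised total of 1.0, the threshold value, and the names `allocate` and `inhalation_end_detected` are all hypothetical and chosen purely for illustration.

```python
from typing import Dict

TOTAL_CAPACITY = 1.0        # assumption: overall capacity normalised to 1.0
BREATHING_THRESHOLD = 0.1   # assumption: minimum share needed to perceive the end of breathing in


def allocate(demands: Dict[str, float], total: float = TOTAL_CAPACITY) -> Dict[str, float]:
    """Share a fixed capacity among tasks in proportion to their demands.

    Proportional sharing is an illustrative assumption; the text only
    states that the total capacity is limited.
    """
    demand_sum = sum(demands.values())
    if demand_sum <= 0:
        raise ValueError("at least one task must have a positive demand")
    return {task: total * demand / demand_sum for task, demand in demands.items()}


def inhalation_end_detected(allocation: Dict[str, float],
                            threshold: float = BREATHING_THRESHOLD) -> bool:
    """True if the share left for the perception of breathing reaches the threshold."""
    return allocation.get("breathing", 0.0) >= threshold


# Balanced case: breathing receives 1/5 of the capacity.
balanced = allocate({"message": 2.0, "planning": 2.0, "breathing": 1.0})

# Overloaded case: planning dominates, breathing receives only 1/20.
overloaded = allocate({"message": 5.0, "planning": 14.0, "breathing": 1.0})
```

In the balanced case the share for breathing lies above the assumed threshold; in the overloaded case it falls below it, which corresponds to the situation in which the end of breathing in goes undetected.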
Suppose that, at speech onset, the speaker’s attention was demanded too much by other things, so that the monitoring system has not detected the end of breathing in. The monitoring system then misevaluates the start of phonation as an error and blocks the speech flow. The mechanism is very similar to the monitor’s response to a disrupted auditory feedback, described in Section 2.1, but it is simpler because not correctness but only completion must be checked. The monitoring of speech units requires generating an expectation of the correct phoneme sequence for each unit. By contrast, the monitoring of breathing during speech requires only one, always identical expectation: that breathing in is over before phonation starts.
In Figure 7, both cases are depicted with the example “My name’s Peter”: Above, the end of a word is missing in the auditory feedback; below, the end of breathing in is missing in the kinesthetic feedback. The error signal due to a mismatch between expectation and perception is not elicited until the next unit starts, because just this is the error to prevent: starting the next unit before the preceding one is complete. In both cases depicted, the speaker has not made an error; the error signals are only the result of feedback disruptions, hence the error signals are invalid. But they have the same consequence as a valid error signal: After a reaction time, the program of the next speech unit is blocked.
Figure 7: Invalid error signal due to a disrupted auditory feedback of speech (A) and due to a disrupted proprioceptive / kinesthetic feedback of breathing (B). Green: produced speech. Red: monitoring (comparison between expectation and perception). S = invalid error signal, R = reaction time of the monitoring system.
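The two cases of Figure 7 can be sketched as a small simulation. It is a simplified illustration under stated assumptions, not an implementation of the monitoring system: the class `Unit`, the function `run_sequence`, and the choice of which word loses its feedback in case A are hypothetical, and the reaction time R is reduced to the order of log entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """One unit of the speech sequence: a word or a breathing-in phase."""
    name: str
    completion_perceived: bool = True  # did sensory feedback report the end of this unit?


def run_sequence(units: List[Unit]) -> List[str]:
    """Return an event log for a sequence of units.

    The check is made only when the next unit starts: if the completion
    of the preceding unit was not perceived, an error signal is elicited
    and the program of the unit that has just started is blocked.
    """
    log: List[str] = []
    previous = None
    for unit in units:
        log.append(f"start {unit.name}")
        if previous is not None and not previous.completion_perceived:
            log.append(f"error signal: end of {previous.name} not perceived")
            log.append(f"block {unit.name}")  # happens after the reaction time R
            return log
        previous = unit
    return log


# Case A (assumed example): the end of the word "My" is missing in the auditory feedback.
case_a = run_sequence([Unit("inhalation"), Unit("My", False), Unit("name's"), Unit("Peter")])

# Case B: the end of breathing in is missing in the kinesthetic feedback.
case_b = run_sequence([Unit("inhalation", False), Unit("My"), Unit("name's"), Unit("Peter")])

# Fluent case: every completion is perceived, no error signal occurs.
fluent = run_sequence([Unit("inhalation"), Unit("My"), Unit("name's"), Unit("Peter")])
```

In this sketch the speaker makes no production error in either case; only the flag `completion_perceived` differs. That is the sense in which the error signals are invalid.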
However, the pathomechanism described above and depicted in Figure 7B accounts for stuttering only in cases with breathing in prior to speech onset. But stuttering immediately at the onset of an utterance can also occur without breathing in, as I know from my own experience. How can such symptoms be explained? Prior to a longer utterance (not to a short exclamation, e.g., a curse), the lack of a shift from basal respiration to speech respiration (see, e.g., Conrad & Schönle, 1979) or the start of a longer utterance with residual air may be interpreted as an error by the control system. More generally, the invalid error signal is elicited because the system lacks sufficient information about the actual state of respiration at the onset of a sentence or of a longer utterance. This information is needed for sentence planning: The system must plan the length of the speech sequence up to the next breathing pause, which must be dovetailed at an appropriate position – not within a phrase, preferably between clauses.
There is much empirical evidence suggesting deficits in the processing of auditory feedback during stuttered speech (Braun et al., 1997; Cai et al., 2012, 2014a; Fox et al., 1996; Ingham et al., 2003; Loucks, Chon, & Han, 2012; Stager, Jeffries, & Braun, 2003; Toyomura, Fujii, & Kuriki, 2011), but as far as I know, there is no evidence of deficits in the feedback of breathing. However, Cai et al. (2014b), investigating the connectivity between brain areas involved in speech control, found a significantly lower connectivity between the left ventral somatosensory cortex, on the one side, and motor and pre-motor areas, on the other side, which possibly indicates a deficit in the self-perception of breathing (read more).
Further, Chang, Zhu, Choo, and Angstadt (2015) found a negative correlation between fractional anisotropy and stuttering severity to be equally evident in stuttering boys and girls in the left supramarginal gyrus, suggesting that a white matter deficit in this area is related to the onset of childhood stuttering, since its probability is approximately equal in both sexes. The supramarginal gyrus plays a role in the integration of the somatosensory and tactile feedback of articulation, breathing, and phonation (Simonyan & Horwitz, 2011). A white matter deficit in this brain region possibly indicates a poor integration of the sensory feedback of breathing in speech control (more about the interpretation of white matter deficits in Chapter 4).
If my above assumptions are true, then the question arises: What may disturb the allocation of attention at speech onset? What could demand a speaker’s attention so much that the monitoring system in the brain does not detect the end of breathing in? One obvious possibility is that, at speech onset, attention is very much directed to the message to be expressed. A second possibility is that, at the age of three or four years, sentence planning – putting words together in the right order and observing the rules of grammar – might require much attention because children are still unsure and unpracticed in sentence planning at this age. A connection between too much attention (processing capacity) for sentence planning and too little attention (capacity) for the perception of breathing can explain two things: (i) why stuttering mostly begins at the age at which children start forming sentences (Bernstein Ratner & Wijnen, 2005; Reilly et al., 2009), and (ii) why, at this age, stuttering events very often occur immediately at sentence / utterance onset, i.e., when the demands of planning are highest (Buhr & Zebrowski, 2009; Richels et al., 2010).
An observation suggesting a relationship between the demands of sentence planning and the occurrence of stuttering is that, in preschool children who stutter, the probability of stuttering grows with sentence length and syntactic complexity (Bernstein Ratner & Sih, 1987; Richels et al., 2010; Yaruss, 1999). But why should sentence planning be more difficult for children who stutter than for their fluent peers? The language abilities of young stuttering children near the onset of the disorder, as a group, seem to be normal, and even above average for some children (Reilly et al., 2009; Yairi, 2012). The problem seems not to be the knowledge of language, but rather the ability to change behavior from holistic speech planning (which is sufficient for single words and short phrases) to incremental speech planning, which is appropriate for sentence formation. Too much effort and attention for planning at sentence onset may result from adherence to holistic speech planning (read more).
In addition to a child’s natural attempt to form correct sentences, the social environment may adversely affect the allocation of attention in some cases: Attention may excessively be directed to speech planning if a child cannot speak spontaneously without fear of saying the wrong thing, or speaking incorrectly, and getting punished or embarrassed. Such a child may direct too much attention to careful wording, which may account for some cases of apparently psychological causation of stuttering. Once a stuttering child has become aware of the disorder, fear of stuttering and the attempt to avoid it may lead to a further shift of attention to speech planning in order to substitute words with feared initial sounds, or to avoid their being the first word of the utterance. This in particular may be the reason why stuttering also occurs at the onset of short and simple utterances, or when only a single word, e.g., one’s own name, is spoken – even in adult stutterers. However, the main reason why stuttering, including transient developmental stuttering, arises in early childhood might be a misallocation of attentional resources due to the unfamiliar demands of sentence planning.
In summary, I think that stuttering at the onset of an utterance, similarly to stuttering within connected speech, can be attributed to a disruption of sensory feedback; in this case, however, it is a disruption of the proprioceptive or kinesthetic feedback of breathing. Naturally, such disruptions can also occur within connected speech after breathing pauses. Now, our preliminary theoretical definition of a stuttering event can be formulated more universally:
A stuttering event is an interruption of speech flow caused by the blockage of a speaking program. The speaking program is blocked because the monitoring system has not detected the completion of the preceding unit of the speech sequence – this unit can be a word, a syllable, a phrase, but also a breathing-in phase. The completion of the speech unit was not detected because of a disruption of sensory feedback – either auditory feedback or the proprioceptive / kinesthetic feedback of breathing.
It should have become clear from the above that I do not assume that people stutter because they breathe in a wrong manner and that they would have to improve their breath control, e.g., by learning a special technique. There are many people who breathe in a less-than-ideal manner but do not stutter. If breathing techniques are still able to reduce or even eliminate stuttering, then, in my estimation, this is because (and as long as) practicing those techniques requires paying some attention to breathing during speech.
Levelt’s Main Interruption Rule, which is a basis of the present theory (see Section 2.1), describes the behavior of an automatic monitoring system in largely automatically controlled speech. In normal communication, we express our thoughts and wish to be understood by our listeners, hence we automatically seek to articulate words correctly and to respect grammar and syntax. Of course, we can deliberately do things wrong. We can, e.g., deliberately speak words incompletely. But in doing so, we shift from the automatic to a volitional control of speech – and we will hardly produce fluent speech in this way. See also the next footnote below.
Another case is individuals who sometimes unwittingly articulate words incompletely: They clip the last (unstressed) syllable of longer words or, more precisely, the phonemes are slurred so that the syllable becomes extremely short and unintelligible – a behavior typical of cluttering. They do so without noticing or repairing these errors, and nevertheless, they don’t stutter. I observed two such persons, and interestingly, both were parsons, i.e., professional speakers, and they certainly know how to speak the words correctly. My only explanation for this phenomenon is that their automatic self-monitoring is insensitive to such errors in the auditory feedback at the end of speech units – which, according to my theory, means that they will hardly ever stutter.
You may object that we are quite able to speak during breathing in if we want to do so. That’s right; however, we are also able to articulate words wrongly if we want to do so, or to deliberately form syntactically wrong sentences – without an internal automatic monitor interrupting the speech flow. Obviously, we can deliberately override Levelt’s Main Interruption Rule. Volitional behavior overrides automatic control. However, the ability of humans to speak fluently and largely correctly in phonology, grammar, and syntax – and all this spontaneously, without thinking of rules, even without explicit knowledge of rules – this amazing ability depends on automatic control and automatic self-monitoring. Indeed, the onset of an utterance is often (not always) a deliberate and conscious act, but its success depends on automatic and unconscious processes in the brain. Detecting the end of breathing in before speaking might be one of them.
As everybody knows from his/her own experience, conscious perception is influenced by attention: A slight pain, e.g., a toothache, becomes greater if you focus attention on it. If you, by contrast, are distracted, e.g., by watching TV or by reading a thriller, then you can temporarily ‘forget’ the pain, i.e., you do not perceive it, though the tooth hasn’t got better, and the sensory neurons causing the pain keep firing. Probably, the excitations are not transferred to the cortical region in which pain becomes conscious. Soldiers have reported that, in combat, they did not feel pain despite being seriously wounded.
Likewise, speech perception is influenced by attention: Speech is not understood, but only processed as noise if attention is distracted from listening (more about this in the next section). The switch point in the brain responsible for transferring excitations from the sensory periphery to the cortex is the thalamus. Its activity is controlled by many factors, but not least by the focus of attention (see, e.g., Portas et al., 1998; Robinson, 2015).
Even very simple actions like eating or drinking require an appropriate allocation of attention, i.e., of perceptual and processing capacity. For example, little children often choke on their food: Crumbs or liquid get into the windpipe. With time, children learn to make sure that no food is in the back part of the oral cavity when they breathe in with an open mouth. This behavior is automatized and no longer requires conscious attention. However, if attention is strongly distracted during eating or drinking, e.g., by a heated dispute or by a thriller on TV, it may happen that one chokes on one’s food and must cough vigorously. Probably, there was not enough attention left for perceiving the content of the mouth. A further example is biting one’s own tongue – not figuratively but literally – during eating: Normally, we automatically avoid this, but it can happen when attention is strongly distracted (and can be very painful). The examples show that even automatic behaviors need an appropriate allocation of attention – even if we are not aware of this fact.
On the ventral part of the somatosensory cortex, the self-perception of the orofacial region (jaw, lips, tongue, pharynx) and the self-perception of the intra-abdominal region are localized:
Therefore, I assume that the reduced connectivity of the ventral somatosensory cortex pertains to the self-perception of breathing. There is empirical evidence from Simonyan & Horwitz (2011) that conscious exhalation is represented on the ventral somatosensory cortex, namely in the lower region of the representation of mouth and pharynx (bottom arrow). Therefore, a reduced connectivity between the ventral somatosensory cortex and the motor / pre-motor area in stutterers, as was found in Cai et al. (2014b), may indicate that the self-perception of breathing is poorly involved in speech control.
However, the perception of the abdominal region also contributes to the self-perception of breathing. How do humans feel their own breathing? Diaphragmatic movements are imperceptible, but what one can feel is the varying pressure the diaphragm puts onto the abdomen. To feel this and to speak consciously only during exhalation often helps children who stutter at the onset of utterances. As can be seen in the figure (upper arrow), the self-perception of the trunk and, with that, of abdominal movements is localized on the dorsal somatosensory cortex (Simonyan & Horwitz, 2011). In stutterers, as a group, the left dorsal somatosensory cortex was found to be significantly more weakly connected with the supplementary motor area (SMA) and the primary motor cortex (Cai et al., 2014b). This finding, too, suggests that the self-perception of breathing is poorly integrated in speech control. The SMA is responsible for the control of volitional (self-initiated) movements, among them the onset of speech.
One- or two-word utterances, which are common at the beginning of language acquisition, can be planned as a whole by the brain before articulation. By contrast, the formation of longer sentences requires incremental planning: Sentences are planned step by step during speech on the basis of what has already been said. Little children who start forming sentences, and who plan these sentences incrementally, may frequently need pauses and may fill them with word repetitions or prolonged vowels, i.e., they may produce many normal speech disfluencies. But the demands of planning are well distributed over the sentence in this way, thus they are lower at sentence onset. Therefore, the risk of stuttering because of a misallocation of attention at sentence onset might be lower for these children.
In an earlier version of this footnote, I used the term ‘holistic sentence planning’, but meanwhile I think this term is not appropriate. I don’t believe that young children consciously try to plan the entire sentence before they speak it out. The problem with the transition to sentence production may rather be that, in the pre-syntactic period, a child can get immediately from the message (a notice, a wish, a feeling) to an associated single word or short phrase. In sentence production, by contrast, the message (which often may appear as an entirety) must be converted into a temporally organized sequence of steps. For example, in the sentence “The black dog was barking in the yard when Tom came home”, all the reported things happen at the same time: The dog is black, is in the yard, is barking, and Tom comes home. It is one single event. The temporal order in which the words must be spoken cannot (or only to a minor degree) be derived from the content of the message.
That is, in the pre-syntactic period, the child simply focuses on the message and gets the needed word or phrase immediately from memory – but later, the child does not get the sentence (the words/phrases in the syntactic order) as long as he or she is focusing on the message only. Attention must be allocated in a new way now: Sufficient attention must be directed to the sequence of steps, such that each next step follows from the preceding ones. In other words, attention cannot be focused on the future only (on what is to be told), but must be directed to the past as well, because the words already produced must be perceived and kept in memory.
Some empirical results suggest that young stuttering children may have difficulty with the transition to incremental sentence planning: Analyzing data that covered a range of age groups, Howell and Au-Yeung (1995) found that stuttering children initially used more simple and fewer complex syntactic structures, compared with normally fluent children; however, this difference disappeared with age. In a sentence-structure priming task, Anderson and Conture (2004) found that stuttering preschoolers showed slower speech reaction times in the absence of priming sentences and greater syntactic-priming effects than controls. The authors assume that stuttering children “may have difficulty rapidly, efficiently planning and/or retrieving sentence-structure units”.