In the first two sections of this chapter, I reported on some of the empirical findings regarding structural deficits in white matter integrity in stuttering adults and children, namely in the superior longitudinal fasciculus (SLF) and in the extreme capsule fiber system (ECFS). Here, I will present a rough model of the left-hemispheric speech/language network in which the two fiber tracts have different functions in speech control. By means of this model and the assumptions about working memory from the last section, the theory proposed in Chapter 2 can be made more specific.
The cortical neuronal network of speech control in the language-dominant hemisphere can roughly be subdivided into two parts: a temporo-parietal part mainly responsible for speech perception and comprehension, and a frontal part mainly responsible for speech production, although both parts are involved in both production and comprehension. They are interconnected by several fiber tracts (bundles of axons; see Section 4.1), which can in turn be roughly subdivided into a dorsal and a ventral stream. The dorsal stream consists mainly of the SLF, including the arcuate fasciculus; the ventral stream runs mainly via the fibers of the ECFS, i.e., the inferior longitudinal fasciculus, the inferior fronto-occipital fasciculus, and the uncinate fasciculus (Ben Shalom & Poeppel, 2008; Duffau, Herbet, & Moritz-Gasser, 2013; Friederici, 2012; Hickok & Poeppel, 2007; Lau, Phillips, & Poeppel, 2008; Rauschecker & Scott, 2009; Rijntjes et al., 2012; Saur et al., 2008). Even though the models proposed by different researchers differ in details, the rough sketch in Fig. 16 is sufficient for our purpose.
Figure 16: Dorsal and ventral stream in the speech/language network. The figure is similar to Fig. 14; I have only added the middle temporal gyrus and the angular gyrus to suggest that the streams branch out into the sensory association areas.
The dorsal stream interconnects posterior temporal and inferior parietal regions with motor and premotor regions and Broca’s area; the ventral stream interconnects anterior temporal regions with Broca’s area and BA47. The different functions of the dorsal and ventral streams result from the functions of the cortical areas they interconnect. It is assumed that phonological information is processed via the dorsal stream, i.e., perceived phonemes, their order, and perhaps acoustic word forms. Lexical and semantic information, i.e., the order and meaning of words and phrases, is presumably processed via the ventral stream. As set out in the last section, we should not conceive of this processing as data transmission but in terms of automatic attention control and working memory: both streams involve perceptual information in the planning and control of speech and, in the opposite direction, involve a speaker’s implicit knowledge of language, e.g., of syntax and grammar, in the comprehension of speech.
Let us first consider the ventral stream. It runs mainly via the tracts of the ECFS, in which structural deficits were found in stuttering preschoolers, even close to the onset of stuttering (see Section 4.2). However, Chow and Chang (2017) did not replicate these findings, and in persistent stutterers, adolescents as well as adults, no group difference from normally fluent controls was found in the ECFS (see, e.g., Cai et al., 2014b; Chang et al., 2008; Kronfeld-Duenias et al., 2016). Therefore, I confine myself here to the dorsal stream and its role in normal speech and stuttering.
In Section 4.1, I took the position that fibers of the SLF are involved in processing the auditory feedback of speech, and that their delayed maturation is probably the consequence of reduced activity over time: I assume the fibers are less activated because stutterers involve auditory feedback less in the control of speech. Put simply, just as a muscle is poorly developed because it has rarely been used, some nerve fibers of the left SLF are poorly developed because they have rarely been activated. On the other hand, according to the present theory, a deficient involvement of auditory feedback in speech control is the main cause of stuttering (see Chapter 2).
On closer consideration, however, there is a problem. In Section 2.1, I assumed that stuttering is usually caused by temporary (short-term) disruptions of auditory feedback at the end or in the back portion of speech units (‘disruption’ means that feedback information is insufficiently processed). A permanent disruption of auditory feedback could not be assumed because then the self-monitoring of speech could not function: neither could speech errors be detected, nor could invalid error signals, which are assumed to trigger stuttering immediately, occur. Therefore, I assumed only short-term disruptions of auditory feedback, which would not, or would hardly, interfere with the self-monitoring of speech. However, the assumption of such short-term disruptions is not consistent with the assumption that a long-term under-activation in the SLF causes delayed fiber maturation.
Short-term deactivations of fiber tracts would hardly result in significant structural deficits such as those found in the left SLF in stutterers. The significantly reduced fractional anisotropy (FA) rather suggests that those fibers have been permanently less activated than in fluent speakers. Further, several brain imaging studies (PET and fMRI) found under-activations during speech in the auditory association areas of stutterers compared to normally fluent controls, particularly in the left superior and middle temporal gyrus (see Table 1). Since these methods are based on changes in blood flow, they are relatively sluggish and unsuited for detecting short-term events, which supports the assumption of long-term under-activations.
The dual stream model of speech/language processing presented in the last section offers a way to solve this problem. In the model, phonological information is assumed to be processed via the dorsal stream, and lexical/semantic information via the ventral stream. In stuttered speech, possibly only the phonological part of auditory feedback is disrupted, but not the lexical/semantic part. The lexical/semantic part of auditory feedback is necessary and sufficient for incremental sentence planning, for generating expectations of the correct sound sequences (see Section 1.5), and for detecting speech errors (since almost all speech errors made by competent speakers are semantically relevant; see below). The question is: if the phonological part of auditory feedback via the dorsal stream were permanently disrupted, would stuttering be the only speech impairment to be expected? I will address this question below.
The dorsal stream interconnects posterior temporal and inferior parietal sensory association areas with inferior frontal areas of speech planning and control (see last section). It runs via fiber tracts of the left SLF and plays an important role in involving auditory feedback in speech control at several levels, among them the phonological level; that is, it provides information about the perceived speech sounds and their order, both when listening to someone else and during one’s own speech production.
Important functions that depend on phonological perception and, therefore, on the dorsal stream are the ability to imitate speech sounds, non-lexical sound sequences, and syllables (during the babbling phase) and the ability to repeat and learn new words. In the latter case, the perceived speech sounds and their order serve as the pattern for one’s own speech movements. Therefore, phonological information provided via the dorsal stream is crucial for acquiring the native language as well as a second language; in everyday conversation, however, this function is of little importance.
A further function of the dorsal stream is phonological monitoring, that is, comparing the sound sequence produced and fed back with the expected correct sound sequence of a word (see Section 1.3). The expectation of the correct sound sequence is generated after the word has been recognized, which happens on the basis of its initial phonemes (see Section 1.5). Word recognition is localized more in the anterior part of the temporal cortex; processing proceeds from the posterior to the anterior part of the left superior and middle temporal gyrus (Specht & Reul, 2003). Hence, the dorsal stream seems not to be involved in word recognition. Lexical and semantic monitoring (checking whether the intended words were spoken and the intended meaning was expressed) might be a function of the ventral rather than the dorsal stream. In the following figure, the role of the dorsal stream is exemplified by the repetition of an unfamiliar sound sequence, e.g., a nonword or an unknown word, and the associated self-monitoring.
Figure 17: Function of the dorsal stream during the repetition of an unfamiliar phoneme sequence, e.g., a nonword. A = from hearing to speaking (successive), B = speaking and self-monitoring (temporally overlapping).
When an unfamiliar sound sequence is repeated, the dorsal stream has two functions: first, to direct the speaker’s attention to the perception of the presented sound sequence, and second, to keep the perceived sound sequence in memory (see Section 4.3) as the basis for phonological self-monitoring. This monitoring, i.e., checking whether the sound sequence was repeated correctly, is the comparison between the auditory feedback and an expectation that, in this case, is the presented sound sequence kept in memory. I have not drawn a direct arrow from the perceived sequence to speech motor control because that would look like data transmission. It is the person who consciously repeats the heard sounds or syllables in the perceived order by means of existing speaking programs (see Section 1.2). This behavior is learned in the early period of speech acquisition and seems to be highly automatized; nonetheless, it is impossible without consciousness and attention.
As already mentioned in Section 1.3, phonological monitoring, i.e., checking whether a word is articulated correctly, includes checking whether it is articulated completely. Therefore, phonological monitoring might play an important role in speech acquisition, when a child learns to speak a word through to its end before the next word follows. That means that phonological monitoring is the automatic monitoring of the sequencing process and is basic to fluent speech, as explained in Section 1.3. By contrast, it plays a minor role in error detection, because most speech errors made by competent speakers are semantically relevant and are detected in the lexical/semantic part of self-monitoring, i.e., via the ventral stream. Therefore, if phonological monitoring did not work, this would rarely affect the detection of ordinary speech errors.
As we have seen above, the dorsal stream is important for speech acquisition, but what is its role in everyday talking? Via the dorsal stream, attention is directed to the sensory feedback of speech, mainly to the auditory feedback, and the sound sequences fed back are held in working memory (Fig. 18). A perceived sound sequence must be held in memory until the word is recognized, an expectation of the correct sound sequence of the word is generated, and this expectation is compared with the sequence produced (phonological monitoring). Word recognition may take more or less time depending on how many initial phonemes are needed to identify a word unambiguously in the given context (according to the Analysis-by-Synthesis model); therefore, the sound sequences perceived via auditory feedback must be held active in memory as long as necessary.
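The dependence of recognition time on the number of initial phonemes needed can be illustrated with a toy calculation. The following Python sketch is a hypothetical illustration, not part of the theory itself: letters stand in for phonemes, the small word list `LEXICON` is invented, context is ignored, and recognition is modeled as a simple cohort-style uniqueness point (the first prefix that fits only one word) rather than as the Analysis-by-Synthesis model. It only shows that some words are identified after one phoneme, others need more, and some cannot be identified before they are complete.

```python
from typing import List, Optional

# Hypothetical toy lexicon: letters stand in for phonemes; the words are
# chosen only for illustration and are not experimental data.
LEXICON: List[str] = ["dog", "take", "table", "tail", "tailor", "bike", "bite"]


def candidates(prefix: str, lexicon: List[str]) -> List[str]:
    """Return all lexicon words that begin with the perceived prefix."""
    return [word for word in lexicon if word.startswith(prefix)]


def recognition_point(word: str, lexicon: List[str]) -> Optional[int]:
    """Number of initial phonemes after which only `word` remains a candidate.

    Returns None if the word is still ambiguous when it is complete, e.g.,
    because it is also the beginning of a longer word in the lexicon.
    Context effects, which the text mentions, are deliberately ignored.
    """
    for n in range(1, len(word) + 1):
        if candidates(word[:n], lexicon) == [word]:
            return n
    return None


if __name__ == "__main__":
    for w in ["dog", "take", "tailor", "tail"]:
        print(w, recognition_point(w, LEXICON))
```

In this toy lexicon, "dog" is identified after its first phoneme, "take" after three, and "tailor" only after five, while "tail" remains ambiguous to the end because it is also the beginning of "tailor". In the terms of the text, the fed-back sequence of such words would have to be held in memory longer before an expectation can be generated.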
If the above is correct, the function of the dorsal stream during normal everyday talking is only to direct attention to sensory, mainly auditory, feedback so that self-produced phoneme sequences are sufficiently perceived and kept in memory for word recognition and phonological self-monitoring. But what happens when the speaker’s attention is distracted too much from the auditory channel? Since working memory depends on attention (see Section 4.3), feedback information may not be held active in memory sufficiently. For example, only the initial portion of a fed-back speech unit may be processed and held in memory. In this case, the word or phrase can still be recognized if the properly processed initial part is long enough, and the correct sound sequence of the word or phrase can be predicted (see Section 2.5). But the end of the perceived sequence would be missing in phonological monitoring, which would elicit an invalid error signal and, consequently, an interruption of the speech flow as described in Section 2.1.
Since the initial portion of a familiar word is usually sufficient for recognizing it, the correct phoneme sequence can be predicted for phonological monitoring (as mentioned above), and lexical and semantic monitoring as well as further sentence planning are not impaired by a distraction of attention from the auditory channel. Therefore, the only effect would be invalid error signals in phonological monitoring, with stuttering as the consequence. We can thus answer the question posed above as follows:
If the phonological part of auditory feedback via the dorsal stream were permanently disrupted because the speaker’s attention was distracted too much from the auditory channel, then stuttering, but no further speech impairment, would be expected.
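The scenario described above can be made concrete in a small simulation. The Python sketch below is a hypothetical toy model under stated assumptions, not an implementation of the theory: letters stand in for phonemes, `LEXICON` is an invented word list, the parameter `held` is an assumed stand-in for how many fed-back phonemes attention keeps active in working memory, and recognition is reduced to a cohort-style uniqueness point. It shows that a correctly spoken word whose end is not held in memory is still recognized, yet produces the same mismatch in phonological monitoring as a genuine slip such as “horve” for “horse”.

```python
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; letters stand in for phonemes.
LEXICON: List[str] = ["dog", "take", "table", "horse", "hope", "bike", "bite"]


def recognize(feedback: str, lexicon: List[str]) -> Optional[str]:
    """Identify a word from the initial phonemes of the fed-back sequence.

    Scans ever longer prefixes and returns the first word that is the only
    remaining candidate; returns None if no unique word is found.
    """
    for n in range(1, len(feedback) + 1):
        matches = [w for w in lexicon if w.startswith(feedback[:n])]
        if len(matches) == 1:
            return matches[0]
        if not matches:
            return None
    return None


def monitor(produced: str, held: int, lexicon: List[str]) -> Tuple[Optional[str], bool]:
    """Toy phonological monitoring of one spoken word.

    produced: phoneme sequence actually spoken and fed back.
    held: number of fed-back phonemes kept active in working memory
          (an assumed stand-in for attention to the auditory channel).
    Returns (recognized word, error signal).
    """
    feedback = produced[:held]  # only this part is available to the monitor
    expected = recognize(feedback, lexicon)
    if expected is None:
        return None, False  # no expectation, so nothing to compare against
    # Any mismatch between expected and retained sequence yields an error signal.
    return expected, feedback != expected


if __name__ == "__main__":
    print(monitor("horse", 5, LEXICON))  # full attention, correct word
    print(monitor("horse", 3, LEXICON))  # end not held: invalid error signal
    print(monitor("horve", 5, LEXICON))  # genuine slip: valid error signal
```

In this sketch, `monitor` returns the same error signal for the truncated but correctly spoken word and for the genuine slip, while the word itself (`horse`) is recognized in both cases. The monitor cannot tell that the first signal is invalid, which mirrors the assumption that such signals interrupt the speech flow just as a real error detection would.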
It is plausible to assume that a permanently reduced attention to the auditory channel during speech is associated with an under-activation of fibers in the SLF. However, a delay in fiber maturation/myelination could hardly be the consequence if these fibers were also needed for comprehending the speech of other persons. But are they needed? Hardly, since word recognition, especially within a context, is possible on the basis of the initial portions of words, as described in Section 1.5. What the listener mainly needs is the lexical and semantic information processed via the ventral stream. More phonological information is needed only for understanding syntactically complicated sentences and for comprehending unfamiliar words, and, in fact, stutterers seem to have subtle difficulties in such situations: see Section 3.3 (anomalies in the processing of phrase structure violations) and below (nonword repetition). However, such high demands are rare in everyday talking.
Therefore, the dorsal stream seems not to be needed very often in speech comprehension, and we can assume that a permanently reduced attention to the auditory channel during speech is associated not only with stuttering but also with a delayed maturation of fibers in the SLF, which would account for the reduced FA in this bundle in persistent stutterers. This means that the deficit in fiber maturation is not the cause of stuttering, but it may contribute to its persistence and influence its severity: at speech onset, brain activity may shift to other, better developed and therefore more quickly starting neuronal networks, perhaps in the right hemisphere and perhaps networks that support a more volitional control than the normal automatic one.
A role of structural deficits in the left SLF in persistent stuttering is suggested by the results of Chang et al. (2008). They compared the FA in this fiber tract in children who stuttered, in children who had recovered from stuttering, and in normally fluent controls who had never stuttered (all 9–12 years of age). They found deficits in both the stuttering and the recovered children, but in the recovered group the deficits were smaller and were partly compensated in the right SLF, where the FA was, on average, even greater than in the control group. If the left SLF supports phonological working memory, there should be a difference in this ability between stuttering and recovered individuals. In fact, Spencer and Weber-Fox (2014) found that the children who recovered from stuttering were, as a group, better able to repeat nonwords than stuttering children and even than normally fluent controls. These results suggest that the left SLF, auditory attention, and phonological working memory are important for recovery from stuttering, which may have implications for therapy.
Some further empirical findings suggest a relationship, in stuttering, between structural deficits in the left SLF on the one hand and deficient phonological processing and phonological working memory on the other. It was shown that stutterers as a group, compared to normally fluent controls, have deficits in phoneme discrimination (Neef et al., 2012; Pelczarski & Yaruss, 2014) and in the repetition of nonwords (Anderson, Wagovich, & Hall, 2006; Anderson & Wagovich, 2010; Byrd, Vallely, Anderson, & Sussman, 2012; Pelczarski & Yaruss, 2016; Sasisekaran & Byrd, 2013). The latter, in particular, may indicate difficulties in phonological working memory. In the final series of a phoneme discrimination task, Neef et al. (2012) found reduced performance that was most pronounced in the stuttering group, which possibly indicates reduced attention. Deficits in auditory attention, or in brain processes related to auditory attention, were also found in stutterers in other studies (Jansson-Verkasalo et al., 2014; Kaganovich, Hampton-Wray, & Weber-Fox, 2010; Kikuchi et al., 2011).
Summary: I assume that the structural deficits in white matter tracts are not the cause of stuttering symptoms. They may rather indicate a delay in fiber maturation due to reduced activation, like a muscle that is weak because it is used less. The fibers are less active because of an imbalance in the automatic (involuntary and unconscious) allocation of attention during speech (or during motor action in general): too little attention is directed to sensory feedback. Consequently, feedback information is poorly processed, e.g., not completely kept in working memory or not completely accessed, which triggers invalid error signals in the monitoring system and thereby interruptions of the speech flow.
Finally, I want to discuss a finding related to the question of why many more girls than boys recover from stuttering. The results obtained by Chang and Zhu (2013) provide a hint. In addition to the structural integrity of white matter tracts, they investigated the functional connectivity between several parts of the speech/language network in the brains of stuttering and non-stuttering children, that is, how strongly the activity of different brain regions was temporally correlated while the children were at rest (see also Chang, 2014). They found that the functional connectivity between the posterior superior temporal gyrus (pSTG) and the inferior frontal gyrus (IFG) in the left hemisphere, roughly speaking between speech perception and production, was lower in the stuttering boys as a group than in the non-stuttering boys. Interestingly, no similar group difference was found between the stuttering and non-stuttering girls.
Since pSTG and IFG are functionally connected via both the ventral and the dorsal stream, there seem to be two possible accounts for the difference between girls and boys. Either the lack of a connectivity deficit shows that most of the stuttering girls had already learned to involve lexical-semantic feedback in sentence planning whereas many of the stuttering boys had not; or, in some of the stuttering boys, the misallocation of attention had affected the processing of phonological feedback via the dorsal stream, so that those boys had a higher risk of persistent stuttering. However, if the first possibility were true, why did the girls still stutter? Therefore, I think the second account is the right one: the dorsal stream seems to be crucial for recovery versus chronification. Chang (2014) wrote: “... in the IFG-motor-STG circuit only stuttering boys but not stuttering girls showed decreased functional connectivity. Because more girls than boys grow out of stuttering naturally, it is possible that our stuttering girl group may have included those who will grow out of stuttering in the future. It may be that normalized patterns of connectivity in this circuit supports recovery...” (p. 76).
The exact anatomy of the SLF still seems to be a matter of debate. Makris et al. (2005) distinguished four subtracts, at least two of which are involved in speech processing, namely SLF III and the arcuate fasciculus. Catani, Jones, and ffytche (2005) distinguished a direct and an indirect pathway within the arcuate fasciculus between speech comprehension and speech control, with the indirect pathway running via the inferior parietal cortex. Because of these different anatomical subdivisions, the structural deficits found in the left SLF in stutterers were localized in SLF III in some studies and in the arcuate fasciculus in others. This ambiguity, however, is irrelevant for our topic. The term ‘dorsal stream’ simply refers to those parts of the SLF that are involved in speech processing and control. Interestingly, according to Makris et al. (2005), SLF II, SLF III, and the arcuate fasciculus have projections to BA46, an area of the dorsolateral prefrontal cortex that is involved in attention and working memory.
During speech, we can distinguish between two functions of the dorsal stream. The first is the online control of phonatory intensity (volume), pitch, syllable length (speech rate), and distinctness of articulation on the basis of auditory as well as proprioceptive and tactile feedback. These sensorimotor pathways seem to be innate. I will not dwell on this function further, since I do not assume that deficits in this area cause stuttering: as long as stutterers are fluent, their articulation is mostly normal.
The second function of the dorsal stream is to involve phonological information in speech planning and offline control (detection and repair of phonological and grammatical errors). Parts of the SLF interconnect the posterior superior temporal gyrus and sulcus (pSTG, pSTS) and the inferior parietal cortex with Brodmann areas 44 and 46. Phonological processing, i.e., the recognition of speech sounds and sound sequences, is assumed to be localized in the pSTG and pSTS (see, e.g., Ben Shalom & Poeppel, 2008). The process from phoneme recognition via word recognition to semantic comprehension runs from the posterior to the anterior part of the left superior and middle temporal gyrus (Specht & Reul, 2003).
Slips of the tongue are typically mistaken words that sound similar at the first syllable, for example, “Bake my bike” instead of “Take my bike”. Purely phonological errors, i.e., errors that are not lexically or semantically relevant, are rare, except in people who are uncertain about pronunciation, e.g., of loanwords, or about grammar. A competent English speaker will hardly say, for example, “horve” instead of “horse”, or “she go” instead of “she goes”. If a speaker does not correct and apparently does not notice such errors, this is more likely due to incorrect or unclear expectations, i.e., poor knowledge of the language, than to disrupted auditory feedback. Further, a purely phonological error by a competent speaker would require that the speaking program of a word, of a familiar phrase, or of grammar, i.e., a well-established and largely unconscious behavioral routine (see Section 1.2), be disrupted, which is unlikely even if the speaker’s attention is distracted. By contrast, mistaking similar-sounding words, not rarely an important content word of the utterance, is common when attention is distracted.
Suppose my assumption in Section 4.1 is true and the structural deficits in the left SLF in stutterers are the consequence of a delayed fiber maturation/myelination due to reduced activation over months or years. Since fiber maturation can be promoted by regular, frequent activation through appropriate training (Bengtsson et al., 2005; Keller & Just, 2009; Scholz et al., 2009), and since phonological working memory is an important function of the left SLF in speaking, repeating nonwords could be appropriate training for the left SLF and could perhaps increase a child’s chance of recovery from stuttering. As far as I know, this possibility has not yet been tested.
Basically, this idea is not new, since speech shadowing is very similar to nonword repetition. In speech shadowing, the client repeats connected speech immediately after hearing it; thus, he or she must listen to the ‘leader’ and speak at the same time. Since the time lag between hearing and repeating is very short, the client hardly has time to understand the text semantically; therefore, we can assume that it is processed mainly via the dorsal stream. During shadowing, stuttering usually disappears, and the method was successfully applied in treatment in the 60s and 70s, though without a real understanding of why it worked (Cherry & Sayers, 1956; Kelham & McHale, 1966; Kondas, 1967; MacLaren, 1960; Marland, 1957; see Section 3.1). Possibly, the shadowing of nonwords and nonsense texts, which requires very attentive listening to the sound sequences, could be even more effective than usual speech shadowing, and it can easily be packaged as a fun game for children.
It may appear odd that no transfer of the phoneme sequence from perception to production is depicted in the figures. I really think that such data transfer through nerve fibers does not exist in the brain (see Section 4.3). If it existed, we would need neither consciousness nor attention to repeat a nonword. The brain could then work like a simple technical system consisting of a microphone, a recorder, and a loudspeaker, and we do not assume that such a system has or needs consciousness.
Nonword repetition requires consciously hearing the phoneme sequence and consciously keeping it in mind until it is repeated. That is a conscious reproduction of a given pattern. We do not yet understand the nature of consciousness and how it works (see, e.g., the work of David Chalmers). But consciousness seems to enable a special kind of information transfer between different parts of the brain without wiring, namely in the following way: a pattern is presented in one part (the perception system) and is reproduced, or re-created, by the other part (the motor system).
This is a very basic human ability: you see something, e.g., a house, you take a pencil and, with a little skill, you sketch the house on a piece of paper. Would you assume that the shape of the house is somehow wired to your hand movements in your brain? Probably not. You see the house, and you know how to move the pencil for a horizontal line, for a vertical line, for a curve, etc. The same is true of nonword repetition: you hear the phoneme sequence, and you know how to move the articulators to produce the speech sounds.