“One of the riddles of developmental stuttering is its onset. No other developmental communication disorder presents such a profile of ostensibly normal development, followed by a transition into a pattern of disordered production. […] most children who stutter apparently display an early profile of normal fluent expressive communication, only to develop typically between the ages of 30 to 36 months, the disfluencies characteristic of stuttering […] What possible changes or shifts in a child’s development […] might either overwhelm the synergy of the child’s linguistic, motor and/or monitoring systems, or evoke such changes in how speech is processed and produced? [...] Starting around 2 years of age, there is a transition from a lexically driven, asyntactic production system to the development of a qualitatively different, grammatically governed system capable of generating syntactic plans for speech execution […]” (Bernstein Ratner, 1997, pp. 113/114)
What is the role of auditory feedback in this change in a child’s speech development? Van Riper (1971) believed that auditory feedback would become less important because the self-monitoring of speech shifts from the auditory channel to the proprioceptive, tactile, and kinesthetic feedback channels. But if my assumption is true that speech monitoring requires identifying self-produced words (for predicting their correct sound sequences; see Section 1.5), then Van Riper’s hypothesis must be wrong. Usually, humans do not recognize their self-produced words by feeling their speech movements (except for people with innate deafness, who are specifically trained to do so). There is no reason to assume two different pathways of perceptive language processing in the brain, one for self-produced speech and one for the speech of others. Normally, we perceive, identify, and understand our self-spoken words by hearing them, i.e., via external auditory feedback (see Section 1.3). Therefore, speaking may rather develop as follows:
During the babbling period, hearing the self-produced sounds and syllables is very important because the child (or the child’s brain) must establish solid connections between articulatory movements and their acoustic results. In this way, the child playfully acquires motor programs for the sounds, sound combinations, and simple syllables commonly occurring in the native language. After these basic speaking programs have been acquired, a new period begins: the first single words are learned, that is, connections between semantic contents and speaking programs are established: beholding a cat and saying “cat!”.
That is the period in which children speak single words and short phrases, the period in which stuttering does not yet occur, and Van Riper may be right with his assumption that self-monitoring via auditory feedback is less important in this period, after the motor programs for word production have become sufficiently stable: the production of familiar words and phrases is now largely feedforward-controlled, and articulatory errors occur less often. Thus, self-monitoring via auditory feedback for the purpose of error detection may play a minor role in this period of one- and two-word utterances, not least because the child gets feedback from other persons: children realize by the listener’s response whether their short utterances have been understood or not, and if a word was wrong or incomplete, listeners will correct the error.
However, things change when children start forming simple sentences. For two reasons, auditory feedback now becomes important again: first, because of incremental sentence planning, and second, because immediate error detection and repair are necessary in connected speech. Let us first consider the latter issue: self-monitoring for the purpose of error detection becomes more important in the transition to connected speech because a speech error in a sentence must be detected and repaired immediately; otherwise, the entire sentence often becomes nonsensical or misleading. Automatic self-monitoring ensures that the motor program of the next speech unit is executed only if the preceding unit was assessed as correct and complete, in accordance with Levelt’s (1995) Main Interruption Rule. And, as described in the first chapter, this kind of self-monitoring can work only on the basis of auditory feedback.
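To make this gating mechanism concrete, here is a toy sketch in Python – my own illustration, not a model from the stuttering literature: the motor program of the next unit is released only if the preceding unit, as heard via auditory feedback, was judged correct and complete; otherwise production halts for an immediate repair, roughly in the spirit of the Main Interruption Rule.

```python
from dataclasses import dataclass

@dataclass
class SpeechUnit:
    text: str
    heard_ok: bool  # did the auditory feedback match the intended unit?

def produce(units):
    """Execute one motor program after another; halt as soon as a unit
    is judged incorrect via auditory feedback (toy gating mechanism)."""
    spoken = []
    for unit in units:
        spoken.append(unit.text)   # the unit is articulated ...
        if not unit.heard_ok:      # ... and immediately monitored
            return spoken, "interrupted-for-repair"
    return spoken, "completed"

# A correct utterance runs through; an error halts production mid-sentence,
# before the sentence can derail into nonsense.
print(produce([SpeechUnit("the", True), SpeechUnit("cat", True)]))
print(produce([SpeechUnit("the", True), SpeechUnit("dog", False),
               SpeechUnit("sleeps", True)]))
```

The point of the sketch is only the control flow: without the `heard_ok` judgment (i.e., without auditory feedback), the gate has nothing to decide on.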
The second reason why auditory feedback becomes important again in the transition to connected speech is incremental sentence planning: it requires keeping the words already produced in memory in order to complete the sentence properly, and the simplest way of getting the words actually said (and not merely planned) into working memory is to hear them (as already mentioned, I do not believe that copies of motor plans are the basis of self-monitoring – see Section 1.5). The role of auditory feedback in sentence formation becomes clearer when we consider how syntax is acquired:
Children basically learn to speak by imitating the speech of others. Hence, speech production develops on the basis of speech comprehension. This is also true for sentence formation: in listening to sentences spoken by someone else, the child grasps the meaning of syntactic structure, i.e., the meaning that the position of a constituent in a sentence has for the proposition; in this way, syntax becomes anchored in the speech comprehension system – briefly: syntactic structure is heard. Likewise by listening, the child learns to keep the perceived words of a sentence in memory until the sentence is complete in order to understand the proposition. It is therefore plausible to assume that the child behaves in a similar way when producing his or her first simple sentences: he/she controls the process by listening and keeps the perceived words in memory for self-monitoring, with the structure of the self-produced sentence being considered correct as long as it sounds like the structures produced by others. In this way, syntactic control and self-monitoring work without explicit knowledge of syntax, but depend on auditory feedback.
Van Riper obviously was wrong with his assumption that auditory feedback loses its role as the basis of self-monitoring in the course of speech development. Indeed, the phonological component of auditory feedback (necessary for articulatory control) becomes temporarily less important, but the lexical-semantic component (necessary for incremental sentence planning, semantic self-monitoring, and error detection) becomes more important. Interestingly, these two components of auditory feedback are processed in the brain via two different pathways (more about that in Section 4.4).
The onset of childhood stuttering at the typical age of about three years might be related to the unfamiliar demands posed by sentence formation in the transition from one- or two-word utterances to connected speech (i.e., from the pre-syntactic to the syntactic system of speech production). Sentence formation in spontaneous speech requires a greater involvement of auditory feedback, particularly at the higher level of speech control. Consequently, it requires a change in the allocation of attention (of perceptual and processing capacities) compared to the production of single words and short, invariant phrases in the preceding phase of speech development.
Some children seem to exaggerate this development: Natke et al. (2004b) found that stuttering children (2.1–5 years of age) produced longer vowel durations in long stressed syllables than normal fluent children. The authors conclude that children who stutter “not only learn to automate the production of short syllables, but that of long stressed syllables as well. Also long stressed syllables are produced in the absence of auditory control, whereas in non-stuttering persons, auditory control remains effective in these syllables.” (p. 3 in the PDF version). That is, speech control seems to detach too much from auditory feedback in young stuttering children.
Naturally, children are not aware of doing something qualitatively new when they form their first simple sentences; hence it is not surprising that some of them have difficulty with that change, such that they try to form sentences in the same ‘holistic’ way in which they have formed single words and short phrases, namely by focusing on the intended message (see the footnote in Section 2.2). Especially when a child is enthusiastic or excited by what he or she wants to tell, all attention may be focused on the intended message, and the child would like to tell everything at once. He/she possibly tries to plan overly long sequences of words as a whole. Sternberg et al. (1988) found in behavioral experiments that the maximum number of units in a speech motor sequence that can be kept in working memory depends on unit size, e.g., 10 monosyllabic or 6 trisyllabic words. Maybe some children reach this limit, overtax working memory, and thus need much attention at the onset of sentences or clauses (the relation between working memory and attention is the topic of Section 4.3).
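The unit-size-dependent limit can be sketched numerically. Sternberg et al. report only the two data points mentioned above (about 10 monosyllabic, 6 trisyllabic words); the linear interpolation between them is my own simplifying assumption, used solely to illustrate how planning a long sequence ‘as a whole’ could exceed the buffer:

```python
def buffer_limit(syllables_per_unit):
    """Approximate capacity of the speech motor buffer as a function of
    unit size, linearly interpolated between the two points reported by
    Sternberg et al. (1988): 10 monosyllabic or 6 trisyllabic words.
    (The interpolation itself is an illustrative assumption.)"""
    return max(1, 10 - 2 * (syllables_per_unit - 1))

def overloads(word_syllables):
    """word_syllables: syllable count of each word the child tries to
    plan as one holistic chunk. True if the chunk exceeds the buffer."""
    if not word_syllables:
        return False
    avg = sum(word_syllables) / len(word_syllables)
    return len(word_syllables) > buffer_limit(avg)

print(buffer_limit(1), buffer_limit(3))   # the two reported data points
print(overloads([1] * 9))                 # nine short words still fit
print(overloads([3] * 7))                 # seven long words do not
```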
An adverse interaction may develop between a child’s tendency to persist in holistic speech planning and a growing imbalance in the allocation of attention to the detriment of auditory feedback: when the feedback-based component of control does not work, or works insufficiently, feedforward control is overtaxed. Such an imbalance, especially at the onset of utterances, sentences, or clauses, may then result in stuttering just at these positions, because the end of the preceding speech unit or breathing pause is not detected by the monitoring system. The reason why some children have more difficulty than others with the re-involvement of auditory feedback in the control of speech seems to be deficient attention regulation and/or subtle deficits in central auditory processing – see Sections 3.3 and 4.4.
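This failure mode – a block at the onset of a unit because the end of the preceding unit was not detected – can be mimicked in a short simulation. The sketch and its miss probability are my own illustrative assumptions, not an established model:

```python
import random

def simulate(units, p_miss_end, rng):
    """The go-signal for each unit is released only when the monitor
    detects, via auditory feedback, the end of the preceding unit.
    With probability p_miss_end that detection fails, and the next
    unit is blocked at its onset (toy model of the imbalance above)."""
    events = []
    for i, unit in enumerate(units):
        if i > 0 and rng.random() < p_miss_end:
            events.append(f"block before '{unit}'")
        events.append(f"say '{unit}'")
    return events

# With reliable end-detection the utterance is fluent; when detection
# always fails, every unit after the first is blocked at its onset.
print(simulate(["I", "want", "juice"], 0.0, random.Random(1)))
print(simulate(["I", "want", "juice"], 1.0, random.Random(1)))
```

Note that the blocks land exactly at unit onsets, matching the positions at which early stuttering is described above.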
Van Riper wrote: “When the words are first learned, the auditory channel must be the one through which the essential information flows if the comparison of output with the interiorized model and the consequent reduction of error signals are to be accomplished. However, we need our ears for other important functions – for perceiving our own verbalized thoughts and the thoughts of others. It would be too burdensome if we always had to play our speech by ear – if we were eternally doomed to listen to each sound and syllable of each word to ensure its correctness. In our opinion, the human being, as he always does when overloaded, finds a shorter and easier way. In this case, he turns over the major responsibility for the monitoring of speech to another available information processing system – the kinesthetic-tactual-proprioceptive one – as soon as possible. This change, we believe, occurs as soon as the auditory system stops giving error signals; as soon as the motor sequences involved in word production seem to be able to be produced correctly without constant auditory scrutiny.” (p. 393)
Not only Van Riper but also other researchers, e.g., Adams (1974) and Perkins et al. (1976), assumed that, during speech development, the auditory feedback channel gradually loses influence on speech control until, in adults, only tactile-proprioceptive feedback is used for purposes of control. “Speech dysfluencies, then, originate from auditory feedback not sufficiently being suppressed. This may lead to interferences and/or discoordinations between different feedback channels […]. However, experiments of Bauer et al. (1997) do not support interference models of stuttering. In these experiments, irregular tactile and proprioceptive feedback was artificially generated through mechanical disturbations of the jaw movement during speaking. These disturbations neither led to significant changes of speech relevant parameters of the acoustical speech signal, nor differentiated between stutterers and non-stutterers.” (Kalveram, 2001, p. 1 in the PDF version). Furthermore, audiophonatory coupling, i.e., the fact that, in normal fluent adults as well as in stutterers, the duration of long-stressed syllables is controlled via auditory feedback (Kalveram, 1983; Kalveram & Jäncke, 1989; Kalveram & Natke, 1998), shows that the auditory feedback channel is not at all switched off during speech.
However, audiophonatory coupling takes place on a low level of speech control – it concerns the physical features of speech, and the processing of auditory feedback on this level seems to be independent of attention to the auditory channel (see the previous section). There is further no inconsistency between the assumption that stutterers do not sufficiently involve auditory feedback in speech control and the fact that audiophonatory coupling works in stutterers as well as in normal fluent speakers without difference: audiophonatory coupling needs to detect only the onset of a long-stressed syllable via auditory feedback.
By contrast, the feedback disruptions assumed in the present theory to cause stuttering occur at the back portion or at the end of speech units, particularly of ‘unimportant’ words and non-stressed syllables. Therefore, these feedback disruptions can hardly affect audiophonatory coupling – besides the fact that audiophonatory coupling works on a lower level of feedback-based speech control than self-monitoring for the purpose of error detection, namely on the same level as the online adjustment of volume, pitch, speech rate, and the distinctness of articulation – functions that are all usually not impaired in stutterers.