The self-monitoring of speech (henceforth often simply referred to as monitoring) is not only a mechanism for checking whether words, phrases, and sentences are produced correctly and in accordance with the intended message. Automatic self-monitoring also includes the online adjustment of volume, pitch, speed, and articulation. However, I do not assume that these online adjustments play a role in the causation of stuttering; therefore, they are not the topic here (read more). I will deal only with the first-mentioned part of self-monitoring, which serves the detection and repair of speech errors.
As already pointed out in Section 1.1, the self-monitoring responsible for error detection is also a mechanism ensuring that the next speaking program (speech motor program) can be executed only if the previous speech unit is correct and complete. Therefore, this kind of monitoring is important for fluency in a negative sense: The monitoring system interrupts speech flow immediately after detecting an error – more precisely: after detecting a mismatch between the expected correct sound sequence and the sound sequence actually produced and perceived. I deal with this issue extensively here because this part of self-monitoring will play a crucial role in the present theory of stuttering.
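The interruption mechanism just described can be illustrated with a toy sketch. All names and the representation of sound sequences here are hypothetical, chosen purely for illustration; this is not a claim about neural implementation, only a minimal picture of the mismatch logic:

```python
# Toy illustration of mismatch-based monitoring (hypothetical names):
# the monitor compares the expected sound sequence with the sequence
# actually produced and perceived, and interrupts speech flow at the
# first divergence; only a correct and complete unit lets the next
# speaking program run.

def monitor(expected, perceived):
    """Return (position of first mismatch, whether speech was interrupted)."""
    for i, (exp, got) in enumerate(zip(expected, perceived)):
        if exp != got:
            return i, True   # mismatch detected -> speech flow is interrupted
    return len(expected), False  # unit correct and complete -> continue

# Example: the speaker intends "kat" but produces "kap".
pos, interrupted = monitor(list("kat"), list("kap"))
print(pos, interrupted)  # -> 2 True
```

The point of the sketch is only that error detection is retrospective: the deviant sound must already have been produced and perceived before the interruption can occur.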
In his Perceptual Loop Theory, Levelt (1995) distinguishes between an external and an internal feedback loop (see figure). The external loop consists in hearing one’s own speech with the ears while speaking. The internal loop is the perception of one’s own thoughts in silent verbal thinking, that is, the perception of inner speech. Further, via the internal loop, a planned formulation can be checked before it is spoken aloud. This kind of self-monitoring is simply a special kind of verbal thinking (see the extensive discussion below); it takes place consciously and preventively, in contrast to the automatic, largely unconscious, and retrospective monitoring via the external loop.
When I write of external and internal feedback loops, I always mean feedback in the auditory sensory mode. That is obvious in the case of the external feedback of speech – but the internal feedback of speech is also feedback in the auditory sensory mode: internal acoustic perceptions (or acoustic imaginations) are produced by the activation of acoustic word forms (see the last section). In addition to auditory feedback, the tactile and kinesthetic feedback of speech movements is not unimportant for the control of articulation. Sometimes, these kinds of sensory feedback may play a role in the detection of phonological speech errors, particularly if voiceless consonants are involved. But I do not assume that tactile and kinesthetic feedback play any role in the causation of stuttering, and I mention them only for the sake of completeness.
Let us return to the two auditory feedback loops. A question crucial to the present theory of stuttering is: do the external and the internal feedback loop work in parallel at the same time? Levelt (1995) does not explicitly address this question; perhaps it was not so important to him. However, from what he wrote (read more), I conclude that he did not assume that both feedback loops work in parallel. Rather, he seems to have assumed the following:
A speaker’s attention can be directed to the external or to the internal feedback of his own speech.
External and internal feedback are parsed in the same normal language comprehension system.
The internal feedback loop does not work as long as the speaker hears his/her own speech normally by ears.
These three assumptions are basic to the present theory of stuttering, and I think they are consistent with Levelt’s model of speech processing. However, my position differs from Levelt’s model in two other points that are closely related to each other: (1) I do not assume that formulation and articulation are separate in the brain – see the footnote in Section 1.1. (2) I do not assume that inner speech is a phonetic plan for overt speech (read more). Why do I question Levelt’s widely accepted model? The reason is this: if you want to explain stuttering in the framework of a linguistic model, then stuttering must be possible within this model. I think that is not the case with Levelt’s model of speech processing in its original formulation (see again the footnote in Section 1.1).
So I start from the following model: the two feedback loops work alternately. The internal feedback loop is activated only if the external one does not work, that is, when one’s own speech is externally inaudible. This is the case during inner speech (silent verbal thinking), during ‘mouthing’ (silent speech movements), with complete auditory masking, or with hearing loss. Interestingly, stuttering mostly disappears in all these conditions. The theory proposed in the next chapter will also explain this remarkable phenomenon. Figure 2 shows the external feedback loop above and the internal feedback loop below. Since I do not assume that formulation and articulation are separate, both feedback loops have the same origin and terminus; only their pathways differ.
A problem with Levelt’s (1995) model of speech processing is that he did not differentiate between spontaneous speech and consciously planned, internally pre-formulated speech, probably because he started from the assumption that all speech is planned, i.e., automatically pre-formulated in the Formulator. However, if we do not assume the dichotomy of Formulator and Articulator, then we can define spontaneous speech as speaking without pre-articulatory monitoring (more about this issue in the next section). In spontaneous speech with unimpeded hearing, i.e., in normal everyday talking, self-monitoring is unconscious, automatic, and runs via the external loop only.
I modify Levelt’s perceptual loop model in the following way: (1) in normal spontaneous speech, formulation and articulation are not separate from each other; (2) the external and the internal feedback loop do not work in parallel, but alternately; (3) in spontaneous speech and with normal hearing, only the external feedback loop works. These assumptions are important premises for the present theory of stuttering.
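The alternation assumption can be summarized in a small toy model. The function name and the condition list are hypothetical, purely illustrative; the sketch only encodes the claim that exactly one loop is active at a time, selected by whether one’s own speech is externally audible:

```python
# Toy model of the modified perceptual loop (illustrative only):
# the two loops work alternately, never in parallel; the internal loop
# is selected only when one's own speech is externally inaudible.

def active_loop(own_speech_audible: bool) -> str:
    """Select the feedback loop according to the alternation assumption."""
    return "external" if own_speech_audible else "internal"

# Conditions named in the text in which external feedback is unavailable:
for condition, audible in [
    ("normal everyday talking", True),
    ("inner speech (silent verbal thinking)", False),
    ("mouthing (silent speech movements)", False),
    ("complete auditory masking", False),
    ("hearing loss", False),
]:
    print(f"{condition}: {active_loop(audible)} loop")
```

Note that the four conditions routed to the internal loop are exactly those in which stuttering mostly disappears, which is the phenomenon the proposed theory must explain.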
Usually, the self-monitoring of speech is regarded as a mechanism mainly serving to detect speech errors (slips of the tongue). The correctness of a spoken word, however, logically includes its completeness. Therefore, self-monitoring might play an important role in speech acquisition, namely in learning to articulate every word completely up to its end before the next word is started, and not to clip the last sounds. This function of monitoring seems to become redundant in normal everyday speaking once correct and complete articulation has been automatized. However, as will be shown later, just this partial function of monitoring possibly plays a crucial role in the causation of stuttering.
The crucial differences between this part of monitoring and the monitoring that serves the detection of speech errors are: (1) volume, pitch, speed, and articulation are adjusted online, whereas the repair of a speech error takes place offline, i.e., speech flow must be interrupted. (2) The adjustment of volume, pitch, speed, and articulation does not require understanding the words spoken. This is also true for the self-monitoring of articulation insofar as it deals only with the clarity of vowels and consonants, i.e., with not mumbling the words. Error detection, by contrast, requires understanding the spoken words, and it requires knowing what the correct word or the correct sound sequence of the word is.
I do not assume that the self-monitoring of volume, pitch, speed, or articulation plays a role in the causation of stuttering, even if some stutterers seem to have difficulty controlling volume and/or speed: they often tend to speak too rapidly and/or too quietly. The latter may be a result of stuttering (low self-confidence) and/or the result of an acoustic oversensitivity (see Section 3.3). The former – rapid speaking – has often been suspected to cause stuttering, mainly because prolonged speech mostly reduces stuttering. This effect is probably caused by the so-called audio-phonatory coupling: in prolonged speech, all syllable starts are controlled online on the basis of auditory feedback, which is impossible at a normal speech rate (read more in Section 3.4).
A high speech rate, particularly an inability to make pauses, may also be a result of stuttering in some cases: the stutterer lacks the courage and the poise to keep listeners waiting. In Section 3.4, I will recommend speaking in a steady and sonorous voice and making many pauses, which requires habituating a conscious self-monitoring of speech rate, volume, and sound. This may help to develop more self-confidence in speaking, and it is in full harmony with the chief therapeutic aim resulting from the proposed theory of stuttering: improving the processing of auditory feedback.
Levelt (1995) described the relationship between both the feedback loops as follows:
“A speaker can attend to his own speech in just the same way as he can attend to the speech of others: the same devices for understanding language are involved. In Levelt 1983, I elaborated this proposal by supposing that there is a double “perceptual loop” in the system – that a speaker can attend to his own internal speech before it is uttered and can also attend to his self-produced overt speech. In both cases, the speech is perceived and parsed by the normal language understanding system.” (p. 469)
Apparently, Levelt believed that whether the external or the internal feedback is effective depends on which of them the speaker attends to, and he probably did not believe that both feedback loops operate at the same time. Further evidence for this assumption is that Levelt refers to Lackner and Tuller (1979), who measured the time needed for error detection via the external and the internal feedback loop. They found that errors were detected more quickly via the internal loop. And how were those measurements carried out? In order to measure the time needed for internal feedback, the subjects’ hearing was masked with white noise. By contrast, the time needed for external feedback was measured with normal hearing of one’s own speech. That means both Lackner and Tuller (1979) and Levelt (1995) assumed that the internal feedback loop does not work when the speaker hears his own speech normally.
During inner speech, speaking programs run in the brain in the same way as during overt speech, with the only difference that the execution of most of the muscle movements is suppressed. The order of processing in inner speech is: (1) internal articulation controlled by speaking programs (just as in overt speech, but without phonation); (2) activation of the corresponding acoustic word forms (via direct links; however, these links were learned in childhood, and they seem to depend on consciousness and attention); (3) comprehension of one’s own verbal thoughts in the language comprehension system, just like the comprehension of someone else’s speech (see Fig. 2).
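The three steps just listed can be lined up as a simple pipeline sketch. The function names and string representations are hypothetical placeholders, not claims about actual brain modules; the sketch only fixes the order of processing:

```python
# Illustrative pipeline for inner speech (placeholder names only):
# step 1: internal articulation - the speaking program runs, phonation suppressed
# step 2: co-activation of the corresponding acoustic word forms (from memory)
# step 3: comprehension in the normal language comprehension system

def internal_articulation(message):
    return [f"motor:{word}" for word in message]                      # step 1

def activate_acoustic_forms(motor_programs):
    return [p.replace("motor:", "sound:") for p in motor_programs]    # step 2

def comprehend(acoustic_forms):
    return " ".join(f.replace("sound:", "") for f in acoustic_forms)  # step 3

thought = comprehend(activate_acoustic_forms(internal_articulation(["inner", "speech"])))
print(thought)  # -> inner speech
```

The ordering matters for the argument: the acoustic forms are activated by the (covert) motor act, not the other way around, and comprehension comes last, just as with someone else’s speech.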
Motor activity during inner speech has been demonstrated in several studies: Thorson (1925) found that the tongue is often moved. In behavioral experiments, Aleman and van’t Wout (2004), Smith, Wilson, and Reisberg (1995), as well as Wilson (2001) demonstrated that motor control is involved in inner speech; thus we can distinguish between an ‘inner voice’, in which motor control is involved, and an ‘inner ear’ without motor involvement. Obviously, the distinction between inner voice and inner ear corresponds to that between speaking programs and acoustic word forms. When the inner voice is active, the vocal folds are tensed depending on the imagined pitch (you can internally speak in an imagined higher or lower voice, and you can feel the tension in the larynx change when you internally switch between high and low pitch). Despite the tensed vocal folds, no phonation comes about because the subglottal pressure is too low: respiration shifts from basic respiration to speech respiration (short, quick inhalation; slow, curbed exhalation) in inner speech even more markedly than in overt speech (Conrad & Schönle, 1979).
Not least for that reason, inner speech is not a ‘phonetic plan’ for the articulation of overt speech. Inner speech (verbal thinking) is not the basis of overt speech – although many people believe so. On the contrary, overt speech, including articulation, is the basis of inner speech: at first, a child must learn to speak overtly by talking with other persons. Only after the child has become able to form correct sentences can speaking ‘shift inwards’ through the suppression of phonation. Normally, four-year-olds have no knowledge of inner speech (Flavell et al., 1997); their play is often accompanied by loud soliloquy.
Tian and Poeppel (2010) conducted an MEG study in which participants were asked to internally imagine the articulation of the syllable /da/. Immediately subsequent (i.e., temporally adjacent) to imagined articulation, the authors observed auditory cortex activation. They take these findings as evidence of an “internal forward model that predicts the anticipated auditory targets in covert speech production, via an auditory efference copy” (p. 14). However, I think they are not evidence of feedforward models or efference copies. As I have written above (in the first paragraph of the footnote), I rather assume that the acoustic form (in this case, the acoustic form of the syllable /da/) is co-activated together with the motor program – not as a result of modeling or computation, nor as a copy of the motor program, but simply from memory. We know from experience what it sounds like when we move our articulators in familiar ways. We have acquired this knowledge since the babbling period in early childhood, and, as mentioned above, it takes some years to develop the ability of inner speech, i.e., of the automatic co-activation of memorized acoustic forms together with imagined (or covertly executed) speech motor programs.