In an earlier post (November 2018), I wrote about pre-speech auditory modulation (PSAM). To recap: in a series of EEG studies, Ludo Max and Ayoub Daliri recorded auditory evoked potentials in response to probe tones presented immediately before speech onset and, at the corresponding time point, in two non-speech control conditions. The probe tone was presented in only 40% of the trials, so that participants could not anticipate it.
Daliri and Max (2015) found that the amplitude of the N1 component of the potential (about 100 ms after the probe tone) was smaller before speaking than before the non-speech tasks. This difference in amplitude, i.e., the suppression of the N1 before speaking, is what Max and Daliri called PSAM. They interpreted PSAM as reflecting “neural processes involved in priming and selectively biasing the auditory system for its role in monitoring auditory feedback during speech production.” (Max & Daliri, 2019, p. 3075).
My interpretation goes a little further: I think the normally fluent speakers expected to hear something (at least unconsciously) when they were about to start speaking, whereas the stutterers, in the same situation, did not expect to hear anything. Moreover, I consider PSAM to be indirect evidence for my theoretical assumption that stuttering is caused by insufficient attention to the auditory channel (see also my first video on YouTube).
Unfortunately, Max and Daliri have abandoned their original interpretation of PSAM. If I understand correctly, the initial reason was the finding of Wang, Ali, and Max (2024) that greater PSAM was associated with reduced sensitivity to formant frequency changes in auditory feedback. In my view, this finding does not contradict their original hypothesis but supports it.
Feedback-based control of physical features of speech, such as voice intensity, pitch, or timbre (acoustic color), is a subordinate function of speech monitoring. It is not wrong to speak a little louder or more quietly, or in a slightly higher or deeper voice. Human voices and manners of speaking vary enormously, and there is a large tolerance for such deviations; they don't need to be corrected.
The main function of speech monitoring is to detect linguistically relevant errors, i.e., errors that require correction. Semantic errors need to be corrected to avoid misunderstanding, and phonological and grammatical errors need to be corrected to avoid giving the impression that I have not even mastered my native language. The basis of this function is rapid phoneme categorization, and for this, high sensitivity to formant differences is not helpful. The task is, for example, to distinguish between /o/ and /u/, not between as many sound variants of /u/ as possible. What helps here is not sensitivity to subtle differences but rather a coarse sieve or grid that allows quick phoneme identification.
Therefore, I think that reduced sensitivity to differences in pitch and timbre in favor of quick phoneme identification is part of the auditory system’s adaptation to its role in monitoring auditory feedback during speech. The system’s focus on quick phoneme identification reduces the sensitivity to differences in formant frequencies.
What follows from this for the interpretation of the results obtained by Li, Daliri, Kin, and Max (2024)? They found that speakers with more PSAM produced formants that were already less variable at vowel onset. As I said in the first video, I think that the system's preparation for speech monitoring implies that sufficient attention is directed (1) to the auditory channel and (2) to auditory goals. And the more precisely the auditory goals are targeted, the smaller the articulatory variability. As sports science has found for dart throwing, the movement is most precise when attention is directed to the target, the sensory goal, not to the movement itself.
Thus, greater PSAM, that is, increased attention to the auditory channel and to auditory goals (the two are closely linked), may lead to more precise motor commands and smaller trial-to-trial variability. PSAM may reflect a process that optimizes both feedforward motor commands and, as soon as auditory feedback is available, the monitoring and integration of that feedback.
Let's take this as the basis for interpreting the results of Daliri, Honda, and Max (2025). Assume that normal speakers automatically pay sufficient attention to the auditory channel and to auditory goals when they start speaking. This is suggested by their greater PSAM and by the results of Lazzari et al. (2024), who found that normal speakers were unable to ignore their auditory feedback (see also my comment). Assume further that delayed auditory feedback (DAF), because it sounds odd, draws speakers' attention even more strongly to auditory feedback, already at speech onset if they know they will hear themselves with a delay. This unsettles their automatic articulation and increases articulatory variability.
By contrast, stutterers seem to pay little attention to the auditory channel and to auditory goals at speech onset, as suggested by their smaller or absent PSAM. That's why their articulatory variability with natural auditory feedback is greater than that of normal speakers. Here, too, DAF, because it sounds unfamiliar, draws their attention to auditory feedback if they know they will hear themselves with a delay. This increases their attention to the auditory channel (greater, normalized PSAM) and probably also to auditory goals (they anticipate the sounds they will hear delayed). As a result, their feedforward commands become more precise and their articulatory variability at speech onset decreases.
This description is consistent with the second hypothesis proposed by the authors, "suggesting that altered auditory feedback may cause AWS to use a different motor planning strategy (such as allocating more neural resources to planning)." But it is no longer necessary to assume that AWS exhibit higher error sensitivity for small errors and lower sensitivity for large errors, which is hard to imagine.
I therefore think that all these results can be explained coherently if we assume that PSAM reflects a shift of attention to the auditory channel, associated (1) with attention to (or a precise selection of) auditory goals, which improves feedforward control at speech onset, and (2) with the auditory system's adjustment for speech monitoring.
The sports science finding that an external focus of attention results in more effective performance than an internal focus (on the movements themselves) has recently also influenced stuttering research. Eichorn, Pirutinsky, and Marton (2019) and Eichorn and Pirutinsky (2022) found reduced stuttering under dual-task conditions. Bauerly and Mefferd (2023) found reduced lip movement variability when adults who stutter spoke with an external attentional focus, compared to an internal one.
In their new study, Kim Bauerly and Eric Jackson (2024) investigated the effects of an internal versus external attentional focus on the articulatory variability of adults who stutter at the sentence level. Again, the results suggest that an external focus of attention is more beneficial than an internal focus (on articulatory movements). These findings are important, as they call into question therapy methods in which stutterers are taught to focus their attention on the speech movements.
To direct participants' attention to an external target, Bauerly and Jackson asked them to focus on a ball that moved in random directions on a screen. Likewise, in the other studies mentioned above, external targets were used merely to distract attention from any internal focus. The underlying idea is that speaking works better the more automatic it is, and that attention to speech movements (and their conscious control) may disrupt automaticity. That may be true, but does an external attentional focus really serve no other function than distracting from internal perception?
Wulf and Lewthwaite (2016) define an external focus of attention as "directing performers' attention to the (environmental) effects of their movements" (p. 1401). In contrast, the random movement of a ball on a screen is not an effect of the participant's speech movements but a task-irrelevant, merely distracting stimulus. Wulf and Prinz (2001) have discussed what the optimal target of an external focus of attention is. They write:
“Given a sequence of movement effects that the performer could focus on […]—which of these effects should the performer focus on in order to optimize performance? The first principle is that the effect that the performer focuses on should be as remote as possible. The second principle, which appears to contradict the first principle, is that the effect should be related as closely as possible to the action that produced it. […] it is necessary that the movement effects and the motor commands that produced these effects can be associated” (p. 656)
What follows from this for the optimal attentional focus when speaking? Is attention to a listener’s response (suggested by Bauerly and Jackson) an optimal focus? Probably not, for it is not closely related to the speaker’s articulatory movements; it rather depends on the listener’s opinion, expectation, or interest. By contrast, the auditory feedback of speech is closely related to speech movements—and clearly distinguishable from them.
The sound of the voice and the words, audible for others but also for ourselves, is the task-relevant environmental effect of speaking. It is closely related to speech movements and provides information that the brain needs for speech control (e.g., Tourville, Cai, & Guenther, 2013). Moreover, focusing attention on auditory feedback—listening to one’s voice and one’s words—distracts from internally monitoring and controlling speech movements.
Auditory feedback has not yet been applied as a target for an external focus of attention, probably due to the widespread prejudice that listening to their speech may be harmful to stutterers. This prejudice has several sources: first, the idea that hearing one’s own stutter could exacerbate stuttering; second, the assumption that auditory feedback could be dysfunctional in some way in stutterers; and third, the observation that stuttering often disappears under auditory masking by loud noise.
For instance, the German physician Sandow (1898) called the disorder "sensory echo stuttering" and advised: "Either plug your ears with cotton wool, or speak lower! In both cases, the acoustic irritant will become weaker" (p. 67). In the late 1950s, the idea arose that auditory feedback could be abnormal in stutterers, e.g., due to inter-aural phase disparity or interference between air-conducted and bone-conducted auditory feedback (Stromsta, 1959; 1972; Webster & Lubker, 1968).
Against this background, Van Riper (1973) wrote: “Our position is that some of the stutterer’s difficulties seem to originate in the auditory processing systems. We feel that if we can get him to concentrate upon proprioceptive feedback, we can bypass these difficulties. Accordingly, we use masking noise, DAF, and other methods for facilitating motor control through proprioception. We want the stutterer to stop listening to the gaps and abnormalities in his speech when they occur and when he expects them.” (p. 211)
There actually seems to be a problem with auditory processing in stutterers (see my list of studies). However, it seems to affect auditory gating (Kikuchi et al., 2011; Saltuklaroglu et al., 2017), that is, the suppression of redundant acoustic input and thereby auditory attention. Recently, Lazzari et al. (2024) found in a finger-tapping experiment that normal speakers automatically used auditory feedback to control the rhythm of their tapping even if they were instructed to ignore it and to focus on their movements. In the participants who stuttered, by contrast, the external focus on auditory feedback was not automatic; they had no difficulty ignoring it (see also my comment on the study).
It is thus possible that stutterers ignore auditory feedback also when speaking and instead adopt an internal attentional focus, with the result that auditory information is insufficiently involved in speech control. Fiorin et al. (2021) compared the effects of delayed auditory feedback, masking by noise, and amplified auditory feedback on stuttering. The reduction of stuttering was greatest in the latter condition; the difference was significant for both moderate and severe stutterers (see my comment on the study).
The results of Fiorin et al. (2021) can hardly be explained other than as follows: the amplified auditory feedback (through headphones) attracted the participants' attention, and this external focus on auditory feedback improved auditory-motor integration, which reduced stuttering (on the effects of delayed auditory feedback and masking noise on stuttering, see Chapter 3 in the main text). So we have good reasons to assume that auditory feedback is not harmful for stutterers. On the contrary, a sufficient amount of attention must be directed to it to supply the brain with the information needed for the control of fluent speech.
In previous studies that aimed to find the best attentional focus for movement control in dart throwing, balancing, or golf, only visual targets for an external attentional focus were tested. However, Wulf and Lewthwaite (2016) write: "Experimental evidence has amassed for the benefits of adopting an external focus on the intended movement effect" (p. 1396; my italics). What is the intended movement effect of speaking? To make one's thoughts audible, so that others can hear and understand them. There may be further intended effects, e.g., to evoke a listener's response, but the immediate, movement-dependent intended effect is an audible one. Therefore, this audible effect is also the most natural target for a speaker's attention.
Lazzari et al. (2024) have shown that normal speakers automatically pay attention to auditory feedback during rhythmic finger-tapping, and we can assume that they do it even more when speaking. Stutterers did not show this automatic attentional behavior in finger-tapping, and we can assume that they tend to ignore their auditory feedback during speech as well. This could be the cause of stuttering. Future research should test this hypothesis and explicitly investigate the effect that attention to auditory feedback has on stuttering.