The production of speech and the understanding of language may be sustained for several hours without any interruptions longer than a few minutes. Within one minute of discourse as many as 10 to 15 thousand neuromuscular events occur. The facility for producing or reproducing this multitude of overlapping and closely timed activities cannot possibly have been acquired by a simple rote-learning procedure or by any other direct "stamping in" method (Miller, Galanter, and Pribram, 1960). There must be some organizing principle that underlies the perception of speech and language as well as the intricacies of timing and ordering during production. What is that general organizing principle? Lashley, perceptive in this respect as well, proposed a rhythmic phenomenon as an explanatory construct. He was, however, vague and had no empirical evidence to back up this working hypothesis. A similar solution suggests itself for reasons quite unsuspected by Lashley.


The sequence of speech sounds that constitutes a string of words is a sound pattern somewhat analogous to a mosaic; the latter is put together stone after stone, yet the picture as a whole must have come into being in the artist's mind before he began to lay down the pieces. In the course of his work, he may put down three contiguous stones, each of which may in the end contribute either to the same pictorial unit or to unrelated ones. When we talk about visual patterns we consider only spatial dimensions, disregarding the dimension of time. Under most circumstances, time does seem to be irrelevant. Yet physiological processes do have a temporal dimension, and even in the process of seeing, which strikes us as taking place instantaneously, time plays a role. The identification of such simple figures as triangles and circles requires time and, consequently, temporal integration in the central nervous system.


In sound patterns, the entire configuration is in the realm of time, and the problem then becomes: How does pattern or order in time differ from randomness or disorder in time? What is it that enables us to recognize and to reproduce a time pattern – any time pattern?


In music it is well known that melodies can be recognized from finger tapping, or even head nodding; and after listening to ten seconds or more of finger tapping, we can discriminate fairly well between random tapping and patterned tapping. Behaviorally, patterned tapping can be memorized and is recognizable and reproducible; random tapping is not. The essential nature of a time pattern is an underlying pulse or beat. In the extreme case of order in time, the simplest pattern is the unadulterated pulse, such as the tick-tock of a metronome. We can complicate this simple pattern by temporal modulation, such as skipping a beat regularly or introducing additional taps between beats, where those extra taps must occur at fractional periods of the time unit. The underlying pulse is the carrier on which the rhythmic pattern can be "fastened." It is the pattern's indispensable ingredient, in much the same way as a figure may only be recognized against a ground. Notice that of all the information contained in a melody, none is as indispensable for recognition as that concerning time. We may eliminate variations in pitch, loudness, or timbre and still recognize the melody, but if we destroy the internal temporal relationships without distorting the other variables, the melody at once becomes unrecognizable (cf. also Sachs, 1953).
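A toy sketch of the "carrier pulse plus modulation" description above, assuming onset times in seconds. A patterned sequence is built from a pulse of fixed period, with every n-th beat skipped and extra taps placed at fractions of the period; a random sequence scatters taps uniformly. The grid-fit score (`fraction_on_grid`) and its 10 ms tolerance are illustrative assumptions, not a measure taken from the text, and say nothing about how listeners actually make the discrimination.

```python
import random

def patterned_taps(period, n_beats, skip_every=0, subdivisions=1):
    """Onset times (s) of a pulse with the given period.

    skip_every: if > 0, omit every skip_every-th beat (a temporal modulation).
    subdivisions: >1 inserts extra taps at fractions k/subdivisions of the period.
    """
    onsets = []
    for beat in range(n_beats):
        if skip_every and (beat + 1) % skip_every == 0:
            continue  # regularly skipped beat
        for k in range(subdivisions):
            onsets.append(beat * period + k * period / subdivisions)
    return onsets

def random_taps(duration, n_taps, seed=0):
    """Uniformly scattered onsets with no underlying pulse."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, duration) for _ in range(n_taps))

def fraction_on_grid(onsets, period, subdivisions, tol=0.01):
    """Fraction of onsets lying within tol seconds of the pulse grid."""
    step = period / subdivisions
    hits = 0
    for t in onsets:
        r = t % step
        if min(r, step - r) <= tol:
            hits += 1
    return hits / len(onsets)

pattern = patterned_taps(0.5, 20, skip_every=4, subdivisions=2)
print(fraction_on_grid(pattern, 0.5, 2))  # 1.0: every tap sits on the grid
noise = random_taps(10.0, 200)
print(fraction_on_grid(noise, 0.5, 2))    # expected around 2*tol/step = 0.08
```

The patterned sequence keeps its fit to the grid no matter how many beats are skipped or subdivided, which is one way of saying that the modulations are "fastened" to the carrier.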


We have been talking here primarily of sound patterns. But our observations are actually applicable to all temporal patterns whether they are perceived through our ears, our eyes, our skin, or our sense of proprioception. In any medium, temporal pattern means a carrier pulse with modulations. Let us call the carrier pulse simply the rhythm.

If speech is a patterned temporal phenomenon, and if such phenomena are based on underlying rhythms, is articulation rhythmic?


(1) The Rhythmic Nature of Articulation

A rhythm may be marked by equidistant pulses or by simple oscillations. A special case of the latter is the periodic alternation between two states, say sleep and wakefulness, or facilitation and inhibition. The rhythm underlying speech seems to be based on rhythmic alternations between states, although we cannot yet say what the origin or nature of these states might be. Because of our ignorance of the true physiological basis of these states, we must be content with thinking of them as purely theoretical constructs. Let us say they are states of initiation and execution of motor patterns (or cycles of activation and inhibition).


An analogy might illustrate the point. Take once more the drumming of our fingers upon a table top. We can make the taps follow one another in rapid succession, and with practice we may learn to tap without introducing a longer pause or a louder tap at the moment we start again with the small finger. Nevertheless, the tapping is organized in terms of a single motor pattern of the hand; for every four taps we have to repeat that pattern. If we tap simultaneously with both hands, we have two such patterns going on at the same time, and the individual taps from the right and the left hand will intermingle in their temporal sequence. Ordinarily we can hear the tapping rhythm (that is, the grouping by fours), but in some cases the rhythm may no longer be recognizable. However, even in these cases there are statistical means by which the underlying rhythmicity could be demonstrated. Each such motor pattern is somewhat similar to the fundamental motor patterns that underlie speech, which probably correspond to syllables.
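The text does not say which statistical means it has in mind. One standard choice, used here purely as an illustration, is an all-order interval histogram (in effect an autocorrelation of the onset train): every pair of taps, not only neighbors, is counted by its separation. If each hand repeats its four-finger pattern, the interval of one full cycle recurs for every single tap, whereas the uneven finger-to-finger intervals do not pile up at any one value. The finger offsets, the shared 0.8 s cycle, the 2 ms timing jitter, and the 6 ms matching tolerance are all hypothetical values chosen for this sketch.

```python
import bisect
import random

# Hypothetical per-finger offsets (s) within one hand cycle; deliberately
# uneven, since real finger-to-finger intervals are not identical.
RIGHT_OFFSETS = [0.00, 0.15, 0.41, 0.58]
LEFT_OFFSETS = [0.07, 0.33, 0.47, 0.69]

def hand_taps(offsets, period, n_cycles, jitter_sd, rng):
    """Onsets for one hand repeating a four-finger motor pattern."""
    return [c * period + off + rng.gauss(0.0, jitter_sd)
            for c in range(n_cycles) for off in offsets]

def interval_counts(onsets, lags, tol):
    """All-order interval histogram: for each candidate lag, count onset
    pairs (not just neighbors) whose separation is within tol of it."""
    onsets = sorted(onsets)
    max_lag = max(lags) + tol
    diffs = []
    for i, t in enumerate(onsets):
        for u in onsets[i + 1:]:
            if u - t > max_lag:
                break  # onsets are sorted, so later ones are even farther
            diffs.append(u - t)
    diffs.sort()
    return [bisect.bisect_right(diffs, lag + tol)
            - bisect.bisect_left(diffs, lag - tol) for lag in lags]

def estimate_cycle(onsets, lo=0.3, hi=1.1, step=0.002, tol=0.006):
    """Return the candidate lag in [lo, hi] with the most matching intervals."""
    n = int(round((hi - lo) / step))
    lags = [lo + k * step for k in range(n + 1)]
    counts = interval_counts(onsets, lags, tol)
    best = max(range(len(lags)), key=counts.__getitem__)
    return lags[best]

rng = random.Random(1)
period = 0.8
# Both hands tap at once; their taps intermingle in a single sequence.
merged = (hand_taps(RIGHT_OFFSETS, period, 30, 0.002, rng)
          + hand_taps(LEFT_OFFSETS, period, 30, 0.002, rng))
print(round(estimate_cycle(merged), 3))  # expected to lie near 0.8
```

A narrow tolerance matters here: with a wide one, the merged taps, which fall roughly every 100 ms, would also produce strong peaks near multiples of that spacing.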


Let us hypothesize that there is a basic periodicity of approximately six cycles per second.* Since we are not dealing with a mechanical device, we must expect some variations within and among individuals. It is, therefore, safer to hypothesize a rate of 6 ± 1 per second. Thus, one-sixth of a second is taken to be a time unit in the programming of motor-speech patterns. With this assumption, a great variety of phenomena may be explained. The basic facts pertaining to them are well established, but so far individual, unrelated explanations have been offered for each of them. A single hypothesis concerning an underlying rhythm brings them all together. We shall discuss each under a separate heading.
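The arithmetic behind the proposed time unit, written out as a worked equation; f for the hypothesized rate and T for the time unit are symbols introduced here for convenience, not the author's notation:

```latex
T = \frac{1}{f}, \qquad f = 6 \pm 1\ \text{s}^{-1}
\;\Longrightarrow\;
T_{\text{nominal}} = \tfrac{1}{6}\ \text{s} \approx 167\ \text{ms},
\qquad
\tfrac{1}{7}\ \text{s} \approx 143\ \text{ms} \;\le\; T \;\le\; \tfrac{1}{5}\ \text{s} = 200\ \text{ms}.
```

Because T is the reciprocal of f, the symmetric tolerance of one cycle per second becomes asymmetric in time: the unit may be roughly 24 ms shorter or 33 ms longer than the nominal one-sixth of a second.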


(a) Delayed Feedback. Normally we hear our voice at practically the same time as it is produced. Speech may be seriously disrupted if a delay is artificially introduced between the time we actually speak and the time the corresponding sounds reach our ears. This phenomenon is often called the Lee effect. J. W. Black (1951) studied the relationship between the length of the delay and the degree of speech interference. He measured the latter in terms of the time it took subjects to read certain test material. In Fig. 3.17 this variable is plotted as a function of the delay time. It appears that there is a critical delay that maximizes interference. The greatest interference occurs with a delay of about 180 msec (roughly 2/11 sec). The curve found by Black is more or less what we would expect from the rhythm hypothesis.
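The comparison behind "more or less what we would expect" can be made explicit with a few lines of arithmetic. The sketch below only checks that Black's reported peak falls inside the range of unit lengths implied by the hypothesized rate of 6 ± 1 per second; it is a consistency check, not a test of the rhythm hypothesis.

```python
from fractions import Fraction

def unit_ms(rate_hz):
    """Length of one rhythmic time unit (one cycle) in milliseconds."""
    return 1000 / rate_hz

# Hypothesized rate from the text: 6 +/- 1 cycles per second.
shortest, nominal, longest = unit_ms(7), unit_ms(6), unit_ms(5)

black_peak_ms = 180                                # critical delay, Black (1951)
two_elevenths_ms = float(Fraction(2, 11) * 1000)   # the text's 2/11 sec

print(f"unit range: {shortest:.0f}-{longest:.0f} ms, nominal {nominal:.0f} ms")
print(f"2/11 s = {two_elevenths_ms:.1f} ms")
print("peak within one unit:", shortest <= black_peak_ms <= longest)
```

This prints a unit range of 143-200 ms with a nominal 167 ms, shows that 2/11 sec is about 181.8 ms, and confirms that the 180 msec peak lies within one hypothesized time unit.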
* I am indebted to A. W. F. Huggins for valuable comments regarding the following paragraphs.

