Lecture 9: The Sounds of Language

Producing and recognizing the sounds of language are highly complex motor and perceptual skills that involve considerable mental processing. Speech does not consist of a discrete series of events that correspond to the letters and words of written language; sounds are not like beads on a necklace.

Phonemes

The number of individual speech sounds you can produce is not large; the estimated number of phonemes, or sound units that form a meaningful distinction, in English is 40-odd. The number is fuzzy because it depends on accent. Very few English speakers have fewer than 40 phonemes, or more than 45. With this small number of sounds that are meaningless in isolation we can form a very large number of meaningful words; the linguist Charles Hockett termed this feature of human languages duality of patterning. Duality is the most economical way to describe the difference between human languages and animal communication systems.
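A rough count shows how economical this is. The sketch below assumes an inventory of 40 phonemes and, unrealistically, that any phoneme may follow any other; the function name is made up for illustration, and real languages permit only a fraction of these strings.

```python
def possible_sequences(inventory_size: int, max_length: int) -> int:
    """Count every phoneme string of length 1..max_length.

    Assumption: any phoneme may follow any other, which no real language allows.
    """
    return sum(inventory_size ** n for n in range(1, max_length + 1))


if __name__ == "__main__":
    # Print the count for strings of up to 1, 2, ... 5 phonemes.
    for max_length in range(1, 6):
        print(max_length, possible_sequences(40, max_length))
```

Even with strings of only three phonemes or fewer, the count under these assumptions already runs to tens of thousands.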

Speech is an overlaid function: it depends on breathing. In normal quiet breathing 40% of the time is spent inhaling and 60% exhaling, but in speaking only 20% of the time is spent inhaling. As you vocalize, a stream of air is forced through a series of chambers: the throat, various parts of the mouth, and sometimes the nose act as resonators. The size, and thus the resonant properties, of the various chambers are altered by the movements of the tongue, lips and teeth.

What is the difference between a vowel and a consonant?

The major division among speech sounds lies in how vowels and consonants are produced. In forming vowels the flow of air is not stopped anywhere along its path by the tongue, lips, etc.; in forming consonants the smooth air flow through the mouth is obstructed in some way.

How many vowels are there in English?

The different types of vowels are classified by how far forward and how high the tongue is in the mouth. 'Pure' vowels are stationary: the tongue keeps more or less the same position however long they are said. In contrast, in forming diphthongs the tongue moves from one vowel position to another while the vowel is being produced; for example, in the words "beer" and "bear" the tongue moves from the front to a more central position. English has 11 or 12 pure vowels (depending on whether the vowels in "bird" and "asleep" (schwa) are pronounced differently) and 7 or 8 diphthongs (depending on whether "poor" and "paw" are pronounced with the same vowel sound).

Consonants

Consonants are distinguished by the place where the air flow is obstructed, the nature of the stoppage, and the presence or absence of voicing. Voicing is a result of the vibration of the vocal cords, flaps in the larynx which open and close rapidly during speech to let out puffs of air. For "stop" or "plosive" consonants like /b/ and /p/ the air is stopped and then explodes out abruptly from behind the obstruction. The "fricative" consonants, such as /v/, /f/, /s/ and the beginnings of "this" and "thick", are formed by letting the air escape through a narrow gap rather than blocking it completely. These are harder sounds for young children to form, hence the common "pistgetti" for "spaghetti", where the fricative /s/ is replaced by the plosive /p/. To form "nasal" consonants the mouth is blocked and the flexible soft palate at the back of the mouth is lowered to force air out through the nose; /m/ and /n/ are formed this way.
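The three dimensions above (place, manner, voicing) can be written down as a small feature table. This is only an illustrative sketch: the class, the dictionary and the helper function are hypothetical names, the place labels are the standard phonetic ones rather than terms from this lecture, and only a handful of the consonants mentioned are included.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Consonant:
    place: str    # where the air flow is obstructed
    manner: str   # nature of the stoppage
    voiced: bool  # whether the vocal cords vibrate


# A partial inventory, limited to consonants mentioned in the text.
CONSONANTS = {
    "p": Consonant("bilabial", "plosive", False),
    "b": Consonant("bilabial", "plosive", True),
    "f": Consonant("labiodental", "fricative", False),
    "v": Consonant("labiodental", "fricative", True),
    "s": Consonant("alveolar", "fricative", False),
    "m": Consonant("bilabial", "nasal", True),
    "n": Consonant("alveolar", "nasal", True),
}


def differing_features(a: str, b: str) -> list[str]:
    """Return the names of the features on which two consonants differ."""
    first, second = CONSONANTS[a], CONSONANTS[b]
    return [f.name for f in fields(Consonant)
            if getattr(first, f.name) != getattr(second, f.name)]


if __name__ == "__main__":
    print(differing_features("p", "b"))  # the pair differs in voicing only
    print(differing_features("s", "p"))  # the child's substitution changes place and manner
```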

How do listeners categorize the utterances they hear?

The physical properties of sounds that are perceived as the same by listeners can vary quite widely. Think about the difference between a young girl and an adult male saying the same word. Listeners group different acoustical events into categories, and the way that this happens varies with the language spoken. George Miller provides the following example: think of the words "nut", "not", "neat", and "newt". All four words begin and end in the same way; the only differences are in the vowels. If a speaker could not articulate these differences, or a listener could not discriminate them, the sentence "The nut's not neat, Newt" would become "The nut's nut nut, nut." Speakers of English hear the differences clearly, yet there are only small differences in tongue position for each word.

A phoneme is the smallest unit of sound that allows us to distinguish between two words. For example, the "p" in "pan" and the "b" in "ban" differ in the onset of voicing (vibration of the vocal cords); whether you say "ba" or "pa" depends on when your vocal cords start vibrating. For both sounds the flow of air is stopped momentarily and pressure builds up behind the lips (they are both labial sounds). For /ba/ the onset of voicing coincides with the release of the air, whereas for /pa/ the onset of voicing lags behind the release of air. Using a speech synthesizer, the voice onset time can be systematically varied across a wide range. When presented with these synthesized sounds, people will generally categorize the sound as /pa/ when the voicing lags by more than 20 ms. Thus we carve up the continuum of sound into discrete categories that form the phonemes of our native language.
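The way a continuous voice onset time is carved into two categories can be sketched as a simple threshold. This is a deliberately idealized model: the function name and the sharp cutoff are assumptions made for illustration, and real listeners show a steep but not perfectly abrupt boundary.

```python
def categorize(vot_ms: float, boundary_ms: float = 20.0) -> str:
    """Label a synthesized syllable from its voice onset time (VOT).

    Assumption: a hard boundary at 20 ms, following the figure quoted above.
    """
    return "/pa/" if vot_ms > boundary_ms else "/ba/"


if __name__ == "__main__":
    # Sweep the continuum in equal physical steps; the label changes only once.
    for vot_ms in range(0, 61, 10):
        print(f"{vot_ms:2d} ms -> {categorize(vot_ms)}")
```

Equal steps along the physical continuum do not produce equal perceptual steps: in this model only the step that crosses the boundary changes what is heard.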

It is important to note that these phonemic categorizations vary across languages. A feature that is distinctive in one language is not necessarily marked in others. The dental fricatives in "this" and "third" are found in comparatively few languages: 18% for the voiced and 21% for the unvoiced. Hence these dental fricative sounds pose a problem for most students of English as a foreign language. In English, aspiration is not a distinctive feature. The two labial plosives beginning and ending the word "peep" differ in that the initial consonant is accompanied by a small outflow of air. The two different acoustic events are heard as identical by English speakers, but in Chinese aspiration is distinctive; aspirated and unaspirated forms are heard as different words. A well-known example of the differences in phonemic categorization between languages is given by Pinker when he describes a Japanese ambassador talking of Clinton's erection. The /l/ sound of English is a "lateral", made by keeping the tongue in contact with one side of the teeth ridge only. It is the only case in English where the left/right dimension matters for speech sounds. Japanese has no laterals, hence the /r/ and /l/ confusion.

Do infants use the same phonemic categories?

In the early 1970s Eimas and his associates published a study of phonemic categorization by 1- to 4-month-olds. They used the habituation paradigm: babies get bored looking at or listening to the same thing, and they cease to respond to a continuous stimulus. Eimas et al. exploited this by habituating babies to the syllable "pa" and then presenting them with either a physically different sound that was perceived by adults as "pa" or a sound that differed from the first by the same physical amount but was perceived by adults as "ba". The babies would regain interest in the "ba" sound (as measured by an increased rate of sucking), but their interest was not awakened by the different "pa". This same technique was later used extensively to show that newborns are capable of making the most subtle distinctions that serve to locate contrasts in languages. The question that arises next is whether newborns are capable of making all the speech sound distinctions used throughout the world. In fact, the capacity of month-old infants to distinguish minimal contrasts is remarkable.
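The logic of the design can be sketched as follows. The voice onset times used here are made-up illustrative values, not the stimuli Eimas et al. actually used, and the 20 ms boundary and the function names are assumptions. The point is only that two test sounds can be the same physical distance from the habituated sound and yet lead to different predictions.

```python
def category(vot_ms: float, boundary_ms: float = 20.0) -> str:
    """Idealized adult category for a syllable with the given voice onset time."""
    return "pa" if vot_ms > boundary_ms else "ba"


def predicts_renewed_interest(habituated_vot_ms: float,
                              test_vot_ms: float,
                              boundary_ms: float = 20.0) -> bool:
    """Predict a rise in sucking rate only if the test sound crosses the boundary."""
    return (category(habituated_vot_ms, boundary_ms)
            != category(test_vot_ms, boundary_ms))


if __name__ == "__main__":
    habituated = 30.0  # hypothetical "pa" stimulus
    for test in (50.0, 10.0):  # both differ from it by 20 ms
        print(test, category(test), predicts_renewed_interest(habituated, test))
```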

At what age do babies start disregarding certain contrasts?

At two months babies easily distinguish /r/ and /l/, whereas Japanese adults have a great deal of difficulty doing so. Learning a language involves a partial loss, the forgetting of contrasts irrelevant to speaking in the mother tongue. By 10 to 12 months infants no longer discriminate contrasts not used in their native language.

What does this mean for the learning of a second language?

Learners of a second language begin by perceiving the sounds of the second language according to their native language categories and then gradually shift their perceptual boundaries. The phonemic categorization system of the native language determines how the world is perceptually carved up. To return to the pa/ba example, an adult who has learned to use one set of voice onset times finds it difficult to produce another set when he or she tries to learn another language. Some researchers date this lack of adaptability to puberty, and speculate that hormonal changes are responsible for creating a critical period for acquisition of the phonemic character of a language. This idea is reinforced by the tenacity of the childhood accents of foreigners such as myself.

Rules for Forming Syllables

Which of the following ten nonsense words form permissible English words?
1. ptak    2. thole   3. hlad     4. plast   5. sram
6. mgla    7. vlas    8. flitch   9. dnom    10. rtut
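One way to think about the exercise is as a check on how each word begins. The sketch below is a rough approximation under stated assumptions: it works on spelling rather than phonemes, it looks only at the consonants before the first vowel, and the set of allowed beginnings is a partial, hand-made list, not a complete statement of English syllable rules.

```python
VOWEL_LETTERS = set("aeiou")

# Partial, hand-made list of consonant spellings that can begin an English word.
ALLOWED_ONSETS = {
    "", "p", "t", "k", "b", "d", "g", "f", "v", "s", "z", "h", "m", "n",
    "l", "r", "w", "y", "th", "sh", "ch",
    "pl", "pr", "bl", "br", "tr", "dr", "kl", "kr", "gl", "gr", "fl", "fr",
    "sl", "sm", "sn", "sp", "st", "sk", "sw", "tw", "thr", "shr",
    "spl", "spr", "str", "skr",
}


def onset(word: str) -> str:
    """Return the letters before the first vowel letter."""
    for i, letter in enumerate(word):
        if letter in VOWEL_LETTERS:
            return word[:i]
    return word


def has_permissible_onset(word: str) -> bool:
    """True if the word starts with a consonant sequence from the allowed list."""
    return onset(word.lower()) in ALLOWED_ONSETS


if __name__ == "__main__":
    words = ["ptak", "thole", "hlad", "plast", "sram",
             "mgla", "vlas", "flitch", "dnom", "rtut"]
    for word in words:
        print(word, has_permissible_onset(word))
```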

Rules for Forming Plurals

nose      lip       chin
/iz/      /s/       /z/
flitch    plast     thole
If a word ends in a phoneme that is dental-alveolar and fricative (formed by raising the tongue blade and directing the airstream at the upper teeth), add /iz/; if the preceding stipulation does not apply and the word ends in a phoneme that is unvoiced, add /s/; otherwise, add /z/.
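The rule is an ordered series of tests, which can be sketched in a few lines. The phoneme labels are ad hoc ASCII stand-ins, the two sets are partial, and the function name is hypothetical. One assumption to flag: the first set includes the final sounds of words like "flitch" and "judge", so that "flitch" patterns with "nose" as in the examples above.

```python
# Ad hoc ASCII labels for final phonemes; both sets are partial.
HISSING_FINALS = {"s", "z", "sh", "zh", "ch", "j"}  # assumed to trigger /iz/
UNVOICED_FINALS = {"p", "t", "k", "f", "th"}        # "th" as in "moth"


def plural_suffix(final_phoneme: str) -> str:
    """Apply the three tests in order: /iz/, then /s/, otherwise /z/."""
    if final_phoneme in HISSING_FINALS:
        return "/iz/"
    if final_phoneme in UNVOICED_FINALS:
        return "/s/"
    return "/z/"


# Final phoneme of each example word, entered by hand.
EXAMPLES = {
    "nose": "z", "lip": "p", "chin": "n",
    "flitch": "ch", "plast": "t", "thole": "l",
}

if __name__ == "__main__":
    for word, final in EXAMPLES.items():
        print(word, plural_suffix(final))
```

The order of the tests matters: /s/ is itself unvoiced, so checking voicing first would wrongly give words like "bus" the /s/ ending.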

Sources:

  • Miller, G. (1991) The Science of Words, Scientific American Library, Freeman: New York.
  • Cook, V. (1997) Inside Language, St. Martin's Press: New York.
  • Mehler, J. & Dupoux, E. (1990) What Infants Know, Blackwell: New York.
  • Crystal, D. (1997) The Cambridge Encyclopedia of Language, 2nd edition, Cambridge University Press: New York.
  • Trask, R.L. (1995) Language: The Basics, Routledge: New York.