Accents and other diacritical marks in English
24 June 2004
Accurate and thorough information on the diacritical marks (or diacritics) used in English can be difficult to obtain. Schools in English-speaking countries tend to ignore the subject entirely; manuals of style may discuss diacritical marks only in the context of the typesetting of foreign languages. As a result, many people do not know how to use these signs or even how to interpret them when they see them.
This article attempts to fill the void with a comprehensive treatment of the use of diacritical marks in English.
The earliest alphabets had no diacritical apparatus. Like most scripts, they failed to make some significant phonetic distinctions. Greek, for example, did not notate the phoneme [h] or tonic accent; it also neglected vowel quantity in all but two pairs of vowels. Latin, whose script is ultimately of Greek origin, likewise did not distinguish long and short vowels orthographically. On the whole, the ambiguities that resulted from these deficiencies were only a minor inconvenience and did not usually result in confusion.
As the languages changed, some of these ambiguities became more acutely felt. Greek developed three accent marks—the acute, the circumflex, and the grave—for its two varieties of tonic accent (the grave indicated suppression of accent). Originally optional and uncommon, these and other marks came to be essential to the spelling of the language.
In Latin, various devices to indicate long vowels were sporadically employed, such as doubling the vowel letter or drawing the letter I especially tall. A mark called the apex, which resembled an acute accent (´), was sometimes written over a long vowel. These devices were all very uncommon, and none of them persisted.
Writing is conservative; speech, however, is not. Greek still uses exactly the same alphabet that it used two thousand years ago, and most of the languages that use the Latin script have added only the letters J (a variant of I) and U and W (both developed from V) to the alphabet used by the Romans. The tendency was thus to retain the existing set of letters even as phonetic developments created new demands on the writing system. For example, e in French came to represent both [e] and [ə], sounds that both happen to be very common at the ends of words. To resolve the many ambiguities, the acute accent (´) was borrowed from Greek for the sound [e]. Subsequently the other two Greek accent marks were borrowed into French for other purposes. Italian and Spanish used these accents to indicate stress, a purpose more akin to their use in Greek.
Scribes used a variety of abbreviations to save paper and effort. Especially common was the writing of one letter above another, often in streamlined form. An n written above another n became the Spanish tilde (~). A z added below a c to indicate its pronunciation became the cedilla (¸) (literally ‘little z’). In German, an e above a vowel developed into the umlaut (¨). These various characters, along with the accent marks, are collectively termed diacritical marks. (English speakers often loosely refer to all diacritical marks as “accent marks”; however, the latter term properly refers only to the three Greek accent marks, which did represent accent [vocal inflection], and their Latin descendants.)
A Latin-based writing system was developed for Old English. The earliest surviving records are from the last few years of the seventh century CE. Needing some extra letters for sounds unknown in Latin, the scribes created æ (ash, a ligature of a and e) and ð (edh, a modified d) and borrowed þ (thorn) and ƿ (wynn) from the extant runic script. As in Latin, vowel quantity was not indicated, although a few manuscripts with diacritical marks for quantity exist.
This functional script served the language well. After the Norman Conquest, however, French scribes squeezed it into a more Latinate mould, replacing æ with a and ƿ with w (itself a ligature, as the name “double u” suggests). Gradually ð fell out of use, as it represented the same sounds as þ. Eventually þ too was supplanted, by th.
English never adopted the Spanish custom of marking stress, although ambiguities do exist (éntrance, entránce). Nor did it adopt the French custom (which developed after English orthography became stable) of indicating vowel quality with diacritical marks, although the spelling of vowels in English is notoriously difficult, with many spellings for each sound and many sounds for each spelling. Arguably the relative lack of diacritical assistance has contributed to the notorious complexity of English orthography. Unlike most other European languages, English can be difficult for readers as well as for writers. The most educated and sophisticated anglophone will occasionally need to consult a dictionary for the pronunciation of a word. Without dredging up the most outlandish exotica, we can cite such tricky examples as anacoluthon, bergamot, chalazion, hendiadys, and puisne; and the average reader is as like as not to mispronounce cation, emesis, geodesy, patina, presage. In each instance, a single diacritical mark to denote stress or vowel quality would clear up any doubts. English, however, does not exploit such aids to the same extent as other languages.
Yet, as is explained below, English does make productive use of diacritical marks in a small number of cases. It also uses them, of course, in words borrowed from other languages, a subject to be discussed later.
Diacritical marks used in English
By far the most important of these is the acute accent (´), in particular on the letter e. The character é by itself occurs more often in English than all other foreign accented letters combined.
The great majority of the words written with the acute accent are from French, in which it is used only on e. A few common French loans containing é are communiqué, élan, fiancé(e), and résumé; many more are listed below.
The acute accent also appears in the names of various Irish institutions, such as Dáil Éireann, Fianna Fáil, and Sinn Féin. It is surprisingly uncommon in the many Spanish loan-words; adiós, jícama, olé, zócalo, and the well-known phrase Qué será será are among the few that require it. Portuguese has contributed auto-da-fé. Occasionally an acute accent may be seen in a word from Czech, Hungarian, Icelandic, Polish, or another language.
More and more, the single accent of modern Greek is transcribed with the acute, especially on maps.
The circumflex accent (^) appears in some words of French origin, such as château, crêpe, maître d’hôtel, and ragoût. Less often it may be found in a name from Portuguese, Welsh, or another language. It is optional, but preferred, in Malaŵi.
The grave accent (`) appears over a and e in certain words of French origin, such as brassière, bric-à-brac, cortège, crèche, crème, étagère, ménage à trois, pied-à-terre, and voilà, and also over various final vowels in words borrowed from Italian, as pietà and più.
The cedilla (¸), written under a c followed by a back vowel, usually indicates the sound [s] instead of the expected [k]. The most common English words written with a cedilla are curaçao, façade, garçon, limaçon, Provençal, and soupçon. Other languages use the cedilla with other letters, either in its canonical form or in a disjoint, comma-shaped variety; words with such characters, however, seldom appear in English outside a specialized context. (Turkish paşa, for example, is respelled as pasha in English.)
The umlaut (¨) is used in several German loan-words, such as doppelgänger, fräulein, führer, gemütlichkeit, Gewürztraminer, kümmel, ländler, and Übermensch. It may also be found in words from Hungarian, Swedish, and other languages. Sometimes the substitution of a following e is acceptable: föhn or foehn, röntgen or roentgen. The umlaut’s typographic twin, the diæresis, is discussed at length below. In those foreign words (such as caïque and naïve) in which the diæresis appears, it is usually preserved. It also plays a productive rôle in English orthography.
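In digital text the kinship of the two marks is literal, since Unicode encodes the umlaut and the diæresis as one and the same combining character. The following is only a minimal sketch in Python, using the standard unicodedata module; the helper name marks is an invention for illustration, and the sample words are taken from the paragraphs above.

```python
import unicodedata

def marks(word):
    """Return the Unicode names of the combining marks found in a word."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return [unicodedata.name(ch) for ch in decomposed if unicodedata.combining(ch)]

umlaut_marks = marks("kümmel")     # a German umlaut
diaeresis_marks = marks("naïve")   # a diaeresis
print(umlaut_marks)
print(diaeresis_marks)
print(umlaut_marks == diaeresis_marks)
```

Both words should report the same mark, which is why software cannot tell an umlaut from a diæresis without knowing the language of the word.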
The tilde (~) appears primarily over n, in words of Spanish origin: for example, cañaveral, cañón, doña, El Niño, jalapeño, madroña, mañana, ñandú, piña colada, piñata, piñón, señor, señora, señorita, vicuña. It is also used in a few Portuguese loan-words, such as sertão.
Various other diacritical marks, all much less common, occur primarily in unfamiliar words and in the names of people and places. Danish, Norwegian, and Swedish use the ring (˚) over a (ångström, smörgåsbord); Czech uses it over u. The háček (ˇ) appears in words and names from Czech and various other languages of Central and Eastern Europe. In addition to these and other diacritical marks, certain special letters, such as the slashed o (ø) of Danish and Norwegian (used in øre), are found occasionally in English context.
Diacritical marks proper to English
Of the diacritical marks proper to English, the most important is the diæresis (also spelled diaeresis or, occasionally, dieresis). This symbol (¨), ultimately borrowed from Greek, is a sign of separation. It marks a vowel that is to be pronounced on its own, not incorporated into a diphthong or (in rare instances) left mute: coöp (two syllables, a coöperative) is distinct from coop (one syllable, a cage for chickens), as is Noël (two syllables, Christmas) from Noel (one syllable, a man’s name). In a sequence of vowels, the diæresis is always written on the second: coördinate, preëmpt, deëmphasize, reënlist, microörganism, orthoëpy, caïque, daïs, naïve, noösphere, zoölogy, protozoön, oöcyte.
The diæresis in most words is optional; however, it should be retained in cases of ambiguity as well as in names such as Antinoös, Boötes, Danaë, Laërtes, Laocoön, and Pasiphaë, not to mention that of Verdi’s famous princess, Aïda. It has long been more common in the United States than in most other English-speaking countries, where a hyphen (co-operate) or no symbol at all (cooperate) may be preferred. Canadian usage favours the hyphen with any prefix; British usage is mixed (hyphen or nothing) but generally favours the hyphen. Today the diæresis is usually omitted in the US, but it was very much the norm well into the twentieth century and is still retained by some authors and, notably, the magazine The New Yorker.
Those who prefer a hyphen in co-operative are forced to choose between unco-operative and un-co-operative, both of which are hideous. Much better is uncoöperative.
Although usually used to break up a diphthong, the diæresis sometimes occurs over an isolated vowel that would otherwise be taken to be silent, as in Brontë. (As is discussed below, this facility could be exploited to much greater advantage.) In the two names de Staël and Saint-Saëns, the diæresis marks a vowel that is not pronounced; this unusual spelling is a remnant of an obsolete French use of the diæresis that conflicts with current practice in all languages.
In English, the diæresis occurs principally in the sequences aë, aï, eë, oë, oö. Perhaps out of loyalty to its Greek origins (alpha with diæresis never occurs), we do not usually write it over a, although it should be written in coäx (coaxial cable) to avoid confusion. It is seldom seen with u. It occurs over y in Artaÿctes and certain Dutch and French names.
Those who consistently use the œ ligature, to be discussed later, in words of Greek (or Latin) origin need not also write a diæresis in such words as gastroënteritis, hydroëncephaly, and noësis.
The diæresis is not ordinarily used with such suffixes as -ic (archaic). Nor should it be used in poem and derivatives. It is, however, helpful in the abbreviation poët. (for poetic(ally), as in dictionaries) to avoid confusion with the word poet.
Formerly it was written in the element aër-, as in aërate and aërial. Only those who retain this old-fashioned pronunciation or wish to suggest it in verse or dialogue should use the diæresis. Likewise, we now usually write Israel and pronounce it as two syllables, but it should be Israël in those hymns in which it is sung as three.
More extensive use of the diæresis might have prevented the change in the pronunciation of caffeine, cocaine, codeine, Esau, Haiti, Ukraine. Women named Irene who give their name its full three syllables might well prefer to spell it Irenë. The Chloës and Zoës of the world should diacritically resist the reduction of their names to monosyllables.
When division across lines breaks up the sequence of vowels, the diæresis should be omitted: pre-/eminence, not pre-/ëminence. This fine point of typography is, however, not always respected.
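For those who set type by program, the rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a complete hyphenation routine: the function name divide and its position argument are hypothetical, the caller is assumed to have chosen a legitimate break point already, and only the standard unicodedata module is used.

```python
import unicodedata

DIAERESIS = "\u0308"  # the combining diaeresis

def divide(word, position):
    """Split a word before the character at `position` for a line break.

    If the second part begins with a letter bearing a diaeresis, the mark
    is dropped, because the break itself already separates the two vowels.
    """
    word = unicodedata.normalize("NFC", word)
    head, tail = word[:position], word[position:]
    if tail:
        first = unicodedata.normalize("NFD", tail[0])
        if DIAERESIS in first:
            plain = unicodedata.normalize("NFC", first.replace(DIAERESIS, ""))
            tail = plain + tail[1:]
    return head + "-", tail

print(divide("preëminence", 3))   # break between the two vowels
print(divide("coöperate", 4))     # break elsewhere: the diaeresis stays
```

A break between the vowels yields pre- and eminence, whereas a break elsewhere in the word leaves the diæresis untouched.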
Writers with a playful spirit may enjoy seeking out unusual opportunities to employ the diæresis. For example, a sentence could be so contrived as to place unionized ‘formed into a (labour) union’ close to uniönized ‘not formed into ions’ as an excuse to use a diæresis that otherwise would not be written.
The grave accent
The only other diacritic that has a regular function in English orthography is the accent grave or grave accent (`; rhymes with Slav), which is used, principally in verse, to indicate that an otherwise non-syllabic -ed should be pronounced as a separate syllable:
Thy bosom is endearèd with all hearts
Which I by lacking have supposèd dead
—Shakespeare, Sonnet 31
An alternative is to write non-syllabic -ed as -’d. Whichever practice is chosen should be followed consistently throughout a text.
Originally this -ed ending was always syllabic. Today it is regularly syllabic only after t and d, as in listed and defended. In prose, those few participles in which it is syllabic can be written with the grave as an aid to the reader: my agèd mother, but an aged cheese. Similarly a learnèd scholar, a markèd improvement, accursèd life, hallowèd be Thy name.
No grave accent is used in such endings as -edness and -edly.
The acute accent
The acute accent (´) in English is used extensively in loan-words, especially those borrowed from French. (See below for more on this subject.)
In recent decades, however, it has developed a productive orthoëpic function: indicating that an e that would otherwise be silent is pronounced. In this rôle, it is obligatory in the word maté and usual in saké; it sometimes appears in molé. Note that it also serves to differentiate these words from mate, sake, and mole.
The acute accent in these words has no etymological basis. Maté comes from the Spanish mate (maté being another word, meaning ‘I killed’); saké would be sake in romanized Japanese. Another problem is that readers may be tempted to stress the syllable that bears the accent, which in these two cases does not bear the stress. The diæresis would be less confusing and more English; it is therefore recommended.
The brand name Nestlé (a German surname borne by a Swiss company), which would otherwise look like the English word nestle, uses the acute accent in this way. So does Pokémon, a recent Japanese brand of games.
Occasionally a German or other name (Halle) is written with an acute accent (Hallé) in English context. The reader is thus alerted to pronounce the e but may not know what value to give it. Again, the diæresis would be a better choice, if we are to tamper with names at all.
This new productive use of the acute accent may become more widespread. If you see it in other words, please let me know.
Although they are not diacritical marks, ligatures deserve a brief mention. There are the strictly typographic ligatures of fi, fl, ff, ffi, ffl, ct, st, and the like, which do not concern us here.
English uses two orthographic ligatures, æ and œ. The first of these was a full-fledged letter of the alphabet, called ash, in Old English. Although it fell out of use early in the Middle English period, thanks to the interference of Norman scribes, it should still be retained in such Old English names as Ælfric and Æthelred. Œ is found in a few of the earliest Old English manuscripts, but it was very soon abandoned because the non-phonemic sounds ([œ] and [øː]) that it represented merged with those of e ([ɛ] and [eː]).
These two ligatures are optional in classical loan-words, such as archæology, pædiatrician, œstrogen, and pharmacopœia, in which they represent a diphthong. A delicate refinement is to use them only in words of Greek origin (such as the preceding), for in Latin these ligatures were not used until mediaeval times. Mediæval and fœtus are acceptable spellings of these Latin words, but mediaeval and foetus are slightly more proper.
Note that these ligatures are used only to represent diphthongs. Thus they cannot be used in aerial (aërial) and poem, in which the two vowels are in separate syllables.
Œ is required in most words borrowed from French, such as cri de cœur and hors d’œuvre. It is optional in manœuvre, for those who write the word with an o.
The macron (¯, pronounced somewhat like apron) and the breve (˘, pronounced like the first part of brevity) are used pedagogically to denote so-called long and short vowels, respectively.
The acute accent (´) and the breve are used in poesy to mark stressed and unstressed syllables, respectively.
Other signs have specialized uses that are at best tangential to orthography.
Diacritical marks used in loan-words
In addition to the authentically English diacritical marks described above, there are numerous foreign diacritical marks that are used in words borrowed into English. While some of these are optional, many are required.
Generally the diacritical marks in a loan-word must be retained if their omission would result in ambiguity:
- divorcé, divorce
- exposé, expose
- résumé, resume
- lamé, lame
- piqué, pique
- rosé, rose
- curé, cure
- chargé, charge
- nacré, nacre
- Quiché, quiche
- fillér, filler
- colón, colon
- Jesús, Jesus
- pâté, pâte, pate
- mère, mere
- cañon (but properly cañón), canon
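The collisions listed above are easy to reproduce mechanically. The sketch below, in Python with the standard unicodedata module, strips every combining mark from a word, much as careless software or an ASCII-only keyboard might; the function name strip_marks is hypothetical, and the word pairs are a selection from the list.

```python
import unicodedata

def strip_marks(word):
    """Remove all combining diacritical marks from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

pairs = [("résumé", "resume"), ("exposé", "expose"), ("colón", "colon"),
         ("mère", "mere"), ("cañon", "canon")]

for marked, plain in pairs:
    # Each marked loan-word collapses into a different English word.
    print(marked, "->", strip_marks(marked), strip_marks(marked) == plain)

# Three distinct words become one.
print({strip_marks(w) for w in ("pâté", "pâte", "pate")})
```

Once the marks are gone, nothing in the text itself can restore the distinction; only context can.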
Diacritical marks that help the reader with pronunciation should be retained. The acute accent in communiqué, for example, shows that the word is not pronounced like unique.
As a rule, names should be written in the form preferred by their owners, with any diacritical marks or other characters that are required. Thus the composer is Antonín Dvořák, but his relative of typewriter-keyboard fame is August Dvorak. This also means that a person’s decision not to use diacritical marks in her name should be respected as much as any other spelling preference. We therefore write Celine Dion and George Frideric Handel, not Céline Dion or Georg Friedrich Händel.
Usually ß in German names is changed to ss. The numerous composers named Strauß are all Strauss in English. But Weierstraß frequently retains its ß in mathematical literature. Likewise, the letters thorn (Þþ) and edh (Ðð), found in Icelandic and Old English (among other languages), should generally be changed to th for lay readers, retained for specialists. Other Latin letters not used in English should be treated similarly.
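For a lay readership, the substitutions just described amount to a small lookup table. The following Python sketch is one possible rendering, not an established standard: the table and the function name for_lay_readers are assumptions made for illustration, and a text meant for specialists would simply skip this step.

```python
# Hypothetical substitution table for lay readers, following the paragraph above.
LAY_SUBSTITUTIONS = str.maketrans({
    "ß": "ss",
    "Þ": "Th", "þ": "th",   # thorn
    "Ð": "Th", "ð": "th",   # edh
})

def for_lay_readers(name):
    """Replace ß, thorn, and edh with their customary English spellings."""
    return name.translate(LAY_SUBSTITUTIONS)

print(for_lay_readers("Strauß"))
print(for_lay_readers("Weierstraß"))
print(for_lay_readers("Hroðgar"))
```

Diacritical marks and ligatures pass through unchanged, so a name such as Æþelræd would come out as Æthelræd.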
Authors need not indulge the narcissistic whims of those whose names (Yahoo!) interfere so violently with punctuation or other conventions as to make writing and reading difficult. They should be aware, however, that what look like punctuation marks may have an important phonetic function in certain words, such as the Kalahari ethnonym !Xũ (in popular usage, !Kung), in which they must be retained.
The transliteration of names written in non-Latin scripts should take consistency, authenticity, convenience, the background of the audience, and any existing standards into account. Lay readers do not need linguistically precise representations of Arabic consonants or Chinese tones; they may be better served by a simple rendering that is readily pronounceable. On the other hand, this argument should not be taken to the barbarous extremes of yesteryear that led to Hindoostanee and the like. Established forms should generally be maintained, especially if more scientific transcriptions would be so far removed from the familiar ones as to confuse readers. Thus either Tchaikovsky or Tchaikowsky should be used, not Chajkovskij or Chaĭkovskiĭ; and Chiang Kai-Shek is almost always preferable to Jiang Jieshi.
Words felt to be foreign enough to deserve italics retain all their diacritical marks and other special characters.
The same goes for foreign words that are too obscure to be found in English dictionaries, whether they are italicized or not. Thus açaí, the name of a Brazilian plant, is so spelled even when it is not italicized.
A possible reform
Because a single final e after a consonant is ordinarily silent in English, problems arise when this important orthographic convention does not apply. In words borrowed from languages other than French, for example, a final e will usually be pronounced. Inflection can also give rise to ambiguities in both pronunciation and meaning: bases can be the plural of base or basis; ellipses can be the plural of ellipse or ellipsis; analyses can be the plural of analysis or a form of the verb analyse (except in the US, where the verb form is spelt analyzes).
As mentioned above, the acute accent has come to be used, sporadically and arbitrarily, in recent loan-words and brand names to indicate an e that, contrary to the reader’s expectations, is not silent. Unfortunately, this new convention conflicts with the usual value of é in English. Readers expect é to be stressed, if it is in the final syllable of the word, and to be pronounced like the vowel in say. Thus the accent is deceptive in words such as maté and molé, in which the stress is not final. And it cannot be used in the many words of Greek origin whose pronunciation is not evident to the uninitiated reader.
There is, however, a ready-made solution that is consistent with the English orthographic tradition: the diæresis. It can legitimately be used both in new loan-words and in words of long standing. If it came into common use to signal that a final e in a polysyllabic word is not mute, the correct pronunciation of apocopë, epitomë, and psychë would be quite evident, and the only possible hesitation in menarchë would be over the location of the stress. A positive distinction could be made between bases and basës. If this convention became established, the lack of a diæresis in hoplites, Masoretes, and Thebes would imply that the e in the final syllable is silent.
As it happens, few other languages employ the character ë, and those that do, notably Dutch and French, tend to use it in the same way as English. (Albanian is the most important exception: it uses ë for a schwa-like vowel. Thus Durrës and Tiranë might be mispronounced, although perhaps no more badly than they are today, if the use of ë proposed here became popular.) This fact makes ë much more suitable for use in loan-words than é, which has strong implications of stress and vowel quality that are not necessarily appropriate. Indeed, in the words listed above under “The acute accent”, the e to which the acute accent has been added differs from the expected pronunciation of é in stress, quality, or both. Since some of those words come from Spanish or another language that makes productive use of é in its orthography, the use of the acute accent in English is at best misleading. By contrast, ë can be used in a wide range of words with significant benefit to the reader and with little risk of confusion.
Whether this simple proposal will be adopted remains to be seen, but readers may wish to consider the merits of ë when exotic or difficult words occur in their own writing.