Language Log ([syndicated profile] languagelog_feed) wrote 2025-07-01 04:29 am

Engrish prus, part 2

Posted by Victor Mair

I haven't visited Engrish.com for several years, but it is always a source of great joy, so I thought I'd take a look today and see what turns up.  Here are six items of interest:


Photo courtesy of Brian Linek. Spotted in China.

The sign actually says:

Qǐng wù rùnèi
请勿入内
"Please do not enter"

wéizhě fákuǎn
违者罚款
"Violators will be fined"


Photo courtesy of Alice A. Found in Korea.

Sometimes you just feel that way.


Photo courtesy of Alexi Smith. Spotted in Japan.

That's exactly what the Japanese says:  animarukoron アニマルコロン


Photo courtesy of Brad T. Spotted in Japan.

o tearai
お手洗
"restroom"


fēi jǐnjí qíngkuàng qǐng zhǐbù
非紧急情况请止步
"Please stop for non-emergency situations"

And I was going to stop with that one, but the next is too good to pass up, though the English by itself is entertaining enough that I won't explain what the Chinese really says.


Photo courtesy of W. Chew. Menu spotted in China.

There are scores more, but that's enough for today.  Phew!

Selected readings

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-30 11:44 pm

Bilingualism as a bonus for the brain

Posted by Victor Mair

Is being bilingual good for your brain?
Perhaps. Learning languages offers other, more concrete benefits
Economist (6/27/25)

Yes!  I won't mince words.  At least in my case, multilingualism has been very good for my brain.

In my rural Ohio high school, I took Latin and French, which were what was on offer.  I enjoyed both of them immensely, but they were almost strictly for reading and writing, so they didn't have much effect on the way my brain worked, at least not that I could discern.

In college, I added Italian and German, both with reasonable spoken components, so my brain began to warm up.

Then I joined the Peace Corps and went to Nepal for two years.  My brain was on fire.  As I have described on Language Log (here), my group learned Nepali through total immersion and strictly on an oral-aural basis.  After three months of training in Missouri, I could already function in Nepali society without any difficulty.  When I got to my post (after a perilous trip trekking in), I had no one with whom to speak English, so I became essentially a native speaker of Nepali after one year in the country.  I had indeed opened up whole new areas of my brain.  That was really fun!  I even dreamed in Nepali.

After I came back from Nepal, I enrolled in a Sanskrit course, and that was all reading and writing, with literary appreciation a strong component.  At the same time, I took first-year Mandarin and loved it (the spoken part, that is), but I had a strong aversion to learning characters.  I have repeatedly written about that dilemma in learning Mandarin on Language Log (see the references below for some sample posts).  I also took Tibetan the same year; that was an eclectic "trip", because Tibetan is written in a Brahmic script, has an archaic phonology reflected in its spelling, and has Sino-Tibetan roots.

More new rooms of my brain had been opened, but they weren't on fire the way they were in Nepal.

Next came a summer of Classical Chinese at Middlebury, where you had to take a language pledge to attend, so my Mandarin language brain kept percolating.

Then it was off to London for Buddhist Studies and lots more Sanskrit, but no time for spoken language, which I yearned for.  So I went back to America and resumed my spoken Mandarin training.

Then came a summer of simultaneous Hindi-Urdu, easy because of my knowledge of Nepali, which has a huge amount of imported Perso-Arabic vocabulary (as does Turkic Uyghur, which I learned by going to Eastern Central Asia starting in 1993).

I'll stop the language litany here, but it has never ended, though I will draw one personal conclusion before turning the rest of this post over to the Economist.  Namely, when I learn a language through listening and speaking, it always has a deeper, transformational impact than when I'm forced to learn it through writing.  The writing makes me feel that I am at a quintessential remove from the language itself.

Reams of papers have been published on the cognitive advantages of multilingualism. Beyond the conversational doors it can open, multilingualism is supposed to improve “executive function”, a loose concept that includes the ability to ignore distractions, plan complex tasks and update beliefs as new information arrives. Most striking, numerous studies have even shown that bilinguals undergo a later onset of dementia, perhaps of around four years, on average. But some of these studies have failed to replicate, leaving experts wondering whether the effect is real, and if so, what exactly it consists of.

…Ellen Bialystok of York University in Canada, the godmother of the field [bilingualism and cognitive studies], has compared the cognitive protection bilingualism offers to that afforded by a slice of holey swiss cheese. Doing other things that are good for the brain, such as exercise, is akin to stacking the slices. Their holes occur in different places, and thus collectively offer greater cognitive protection. But all these studies take for granted the uncontroversial mental superpower that you get from language study: being able to talk to people you could not have otherwise. Even if you can’t pick your parents and be fluent from birth, that should be more than enough reason to give it a go.

"Holey swiss cheese" — nice metaphor!

 

Selected readings

[Thanks to Philip Taylor]

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-28 10:23 pm

Masochism: a bad rap from inception

Posted by Victor Mair

Long ago (half a century), I had occasion to translate the word "masochism" into Chinese.  At that stage, I wasn't even sure what "masochism" itself meant.  I supposed it was "the madness of deriving pleasure from pain", especially sexual pleasure, or something like that.

Wanting to give the most accurate possible translation into Chinese, I thought I should begin by investigating the etymology of the word, as is my bent.  So I pulled out my trusty 1960 Webster's New Collegiate Dictionary, my lexical vade mecum.  Here's what it had (has — I still keep it on my desk):

[After L. von Sacher-Masoch (1835-1895), Austrian novelist, who described it.]  Med. Abnormal sexual passion characterized by pleasure in being abused by one's associate; hence any pleasure in being abused or dominated.

My recollection is that, at the time, I couldn't readily find an English-Chinese dictionary that had the term "masochism" in it, so I may have made up this rendering for it myself, although I'm not absolutely certain that I did so:

zìnüèdài kuáng 自虐待狂 ("madness of self abuse") (129 ghits)

Be that as it may, there's no doubt that the most common translation of "masochism" in Chinese today is this:

shòunüèkuáng 受虐狂 ("madness of enduring / accepting / receiving abuse") (13,700,000 ghits)

It seems that nobody attempted to render "masochism" in such a way that it would reflect the fact that it derived from a person's surname.

Now, more than half a century later, wanting to see the latest understanding of the term, I looked it up in two current etymological reference works.

Wiktionary:

From German Masochismus, coined alongside Sadismus in 1886 by Richard von Krafft-Ebing in his book Psychopathia Sexualis. Named after Leopold von Sacher-Masoch, whose novel "Venus in Furs" explores a sadomasochistic relationship, +‎ -ism.

In more detail, Etymonline:

"sexual pleasure in being hurt or abused," 1892, from German Masochismus, coined 1883 by German neurologist Richard von Krafft-Ebing (1840-1902), from name of Leopold von Sacher-Masoch (1836-1895), Austrian utopian socialist novelist who enshrined his submissive sexuality in "Venus in Furs" (1869, German title "Venus im Pelz").

Sacher-Masoch's parents merged their name when they married; his maternal grandfather Dr. Franz Masoch (1763-1845) was born in Moldova Nouă in what is now Romania. The surname might be toponymic from a village that is now in northern Italy; or it might be a Germanized form of a Czech surname that amounts to a double-diminutive of given names with a prominent Ma- or -maš element (Tomaš, Mattej, etc.)

I wondered what Leopold von Sacher-Masoch himself thought of having this embarrassing disorder named after him.  As mentioned above,

The term masochism was coined in 1886 by the Austrian psychiatrist Richard Freiherr von Krafft-Ebing (1840–1902) in his book Psychopathia Sexualis:

…I feel justified in calling this sexual anomaly "Masochism", because the author Sacher-Masoch frequently made this perversion, which up to his time was quite unknown to the scientific world as such, the substratum of his writings. I followed thereby the scientific formation of the term "Daltonism", from Dalton, the discoverer of colour-blindness.
During recent years facts have been advanced which prove that Sacher-Masoch was not only the poet of Masochism, but that he himself was afflicted with the anomaly. Although these proofs were communicated to me without restriction, I refrain from giving them to the public. I refute the accusation that "I have coupled the name of a revered author with a perversion of the sexual instinct", which has been made against me by some admirers of the author and by some critics of my book. As a man, Sacher-Masoch cannot lose anything in the estimation of his cultured fellow-beings simply because he was afflicted with an anomaly of his sexual feelings. As an author, he suffered severe injury so far as the influence and intrinsic merit of his work is concerned, for so long and whenever he eliminated his perversion from his literary efforts he was a gifted writer, and as such would have achieved real greatness had he been actuated by normally sexual feelings. In this respect he is a remarkable example of the powerful influence exercised by the vita sexualis be it in the good or evil sense over the formation and direction of man's mind.

Sacher-Masoch was not pleased with Krafft-Ebing's assertions. Nevertheless, details of Masoch's private life were obscure until Aurora von Rümelin's memoirs, Meine Lebensbeichte (My Life Confession; 1906), were published in Berlin under the pseudonym Wanda v. Dunajew (the name of a leading character in his Venus in Furs). The following year, a French translation, Confession de ma vie (1907) by "Wanda von Sacher-Masoch", was printed in Paris by Mercure de France. An English translation of the French edition was published as The Confessions of Wanda von Sacher-Masoch (1991) by RE/Search Publications.

(Wikipedia)

Suppose your name were Plarich and somebody coined the term Plarichism for "deriving pleasure from eating insects" because you actually ate some bugs.  Wouldn't you be upset at having insect-eating named after you?  Wouldn't it be better, and more scientific, to call it entomophagy?  Mutatis mutandis, the same goes for "masochism": some Latinate term for "the madness of deriving sexual pleasure from pain" would have been preferable.

I will not attempt to sort out the similarities and differences between masochism and sadism, with which it is often linked (hence "sadomasochism"), except to say that, although "sadism" looks as though it might have a more conventional etymology (from "sad"), it too is named after an individual, the French libertine Marquis de Sade (1740–1814).

From French sadisme and German Sadismus. Named after the Marquis de Sade, famed for his libertine writings depicting the pleasure of inflicting pain to others. The word for "sadism" (sadisme) was coined or acknowledged in the 1834 posthumous reprint of French lexicographer Boiste's Dictionnaire universel de la langue française; it is reused along with "sadist" (sadique) in 1862 by French critic Sainte-Beuve in his commentary of Flaubert's novel Salammbô; it is reused (possibly independently) in 1886 by Austrian psychiatrist Krafft-Ebing in Psychopathia Sexualis which popularized it; it is directly reused in 1905 by Freud in Three Essays on the Theory of Sexuality which definitively established the word.

(Wiktionary)

Incidentally, here's a bit of trivia that may interest some Language Log readers:  "Sacher-Masoch is the great-great-uncle, through her Austrian-born mother Eva von Sacher-Masoch, Baroness Erisso, of the late English Rock star and film actress Marianne Faithfull. She passed away in January of 2025".  (source)

 

Selected readings

 

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-27 11:24 am

Computational phylogeny of Indo-European

Posted by Victor Mair

Alexei S. Kassian and George Starostin, "Do 'language trees with sampled ancestors' really support a 'hybrid model' for the origin of Indo-European? Thoughts on the most recent attempt at yet another IE phylogeny".  Humanities and Social Sciences Communications, 12, no. 682 (May 16, 2025).

Abstract

In this paper, we present a brief critical analysis of the data, methodology, and results of the most recent publication on the computational phylogeny of the Indo-European family (Heggarty et al. 2023), comparing them to previous efforts in this area carried out by (roughly) the same team of scholars (informally designated as the “New Zealand school”), as well as concurrent research by scholars belonging to the “Moscow school” of historical linguistics. We show that the general quality of the lexical data used as the basis for classification has significantly improved from earlier studies, reflecting a more careful curation process on the part of qualified historical linguists involved in the project; however, there remain serious issues when it comes to marking cognation between different characters, such as failure (in many cases) to distinguish between true cognacy and areal diffusion and the inability to take into account the influence of the so-called derivational drift (independent morphological formations from the same root in languages belonging to different branches). Considering that both the topological features of the resulting consensus tree and the established datings contradict historical evidence in several major aspects, these shortcomings may partially be responsible for the results. Our principal conclusion is that the correlation between the number of included languages and the size of the list may simply be insufficient for a guaranteed robust topology; either the list should be drastically expanded (not a realistic option for various practical reasons) or the number of compared taxa be reduced, possibly by means of using intermediate reconstructions for ancestral stages instead of multiple languages (the principle advocated by the Moscow school).

Discussion and conclusions

In the previous sections, we have tried to identify several factors that might have been responsible for the dubious topological and chronological results of Heggarty et al. 2023 experiment, not likely to be accepted by the majority of “mainstream” Indo-European linguists. Unfortunately, it is hard to give a definite answer without extensive tests, since, in many respects, the machine-processed Bayesian analysis remains a “black box”. We did, however, conclude at least that this time around, errors in input data are not a key shortcoming of the study (as was highly likely for such previous IE classifications as published by Gray and Atkinson, 2003; Bouckaert et al. 2012), although failure to identify a certain number of non-transparent areal borrowings and/or to distinguish between innovations shared through common ancestry and those arising independently of one another across different lineages (linguistic homoplasy) may have contributed to the skewed topography.

One additional hypothesis is that the number of characters (170 Swadesh concepts) is simply too low for the given number of taxa (161 lects). From the combinatorial and statistical point of view, it is a trivial consideration that more taxa require more characters for robust classification (see Rama and Wichmann, 2018 for attempts at estimation of optimal dataset size for reliable classification of language taxa). Previous IE classifications by Gray, Atkinson et al. involved fewer taxa and more characters (see Table 1 for the comparison).
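To give a rough sense of why more taxa make robust classification harder, here is a minimal sketch of a standard combinatorial fact that is not taken from the paper: the number of distinct unrooted, fully resolved tree topologies for n labelled taxa is (2n − 5)!!. It illustrates only the size of the space of candidate trees; it is not the estimation method of Rama and Wichmann (2018), and the function name is our own.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Count distinct unrooted, fully resolved (binary) tree topologies
    for n labelled taxa, using the double factorial (2n - 5)!!,
    i.e. 1 * 3 * 5 * ... * (2n - 5). Defined here for n >= 3."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # multiply the odd numbers 3, 5, ..., 2n - 5
        count *= k
    return count


# Ten taxa already allow 2,027,025 topologies; 161 lects give a number
# with several hundred digits.
for taxa in (10, 161):
    print(taxa, "taxa:", len(str(num_unrooted_binary_trees(taxa))), "digits")
```

The count grows super-exponentially with the number of taxa, whereas each added Swadesh concept contributes only a little extra evidence for choosing among those trees.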

Table 1 suggests that the approach maintained and expanded upon in Heggarty et al. 2023 project can actually be a dead-end in classifying large and diversified language families. In general, the more languages are involved in the procedure, the more characters (Swadesh concepts) are required to make the classification sufficiently robust. Such a task, in turn, requires a huge number of man-hours for wordlist compilation and is inevitably accompanied by various errors, partly due to poor lexicographic sources for some languages, and partly due to the human factor. Likewise, expanding the list of concepts would lead us to less and less stable concepts with vague semantic definitions.

Instead of such an “expansionist” approach, a “reductionist” perspective, such as the one adopted by Kassian, Zhivlov et al. (2021), may be preferable, which places more emphasis on preliminary elimination of the noise factor rather than its increase by manually producing intermediate ancestral state reconstructions (produced by means of a transparent and relatively objective procedure). Unfortunately, use of linguistic reconstructions as characters for modern phylogenetic classifications still seems to be frowned upon by many, if not most, scholars involved in such research — in our opinion, an unwarranted bias that hinders progress in this area.

Overall one could say that Heggarty et al. (2023) at the same time represents an important step forward (in its clearly improved attitude to selection and curation of input data) and, unfortunately, a surprising step back in that the resulting IE tree, in many respects, is even less plausible and less likely to find acceptance in mainstream historical linguistics than the trees previously published by Gray & Atkinson (2003) and by Bouckaert et al. (2012). Consequently, the paper enhances the already serious risk of discrediting the very idea of the usefulness of formal mathematical methods for the genealogical classification of languages; it is highly likely, for instance, that a “classically trained” historical linguist, knowledgeable in both the diachronic aspects of Indo-European languages and such adjacent disciplines as general history and archaeology, but not particularly well versed in computational methods of classification, will walk away from the paper in question with the overall impression that even the best possible linguistic data may yield radically different results depending on all sorts of “tampering” with the complex parameters of the selected methods — and that the authors have intentionally chosen that particular set of parameters which better suits their already existing pre-conceptions of the history and chronology of the spread of Indo-European languages. While we are not necessarily implying that this criticism is true, it at least seems obvious that in a situation of conflict between “classic” and “computational” models of historical linguistics, assuming that the results of the latter automatically override those of the former would be a pseudo-scientific approach; instead, such conflicts should be analyzed and resolved with much more diligence and much deeper analysis than the one presented in Heggarty et al. 2023 study.

Despite all the energetic discussions of our previous attempts, it appears that the question of IE phylogeny has not yet been put to bed.

 

Selected readings

[Thanks to Ted McClure]

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-26 05:39 pm

Animal calls are not comparable to human speech

Posted by Victor Mair

But can they still tell us something useful about language?  Here are two new papers that address that question:

I.

"What the Hidden Rhythms of Orangutan Calls Can Tell Us about Language – New Research." De Gregorio, Chiara. The Conversation, May 27, 2025.

In the dense forests of Indonesia, you can hear strange and haunting sounds. At first, these calls may seem like a random collection of noises – but my rhythmic analyses reveal a different story.

Those noises are the calls of Sumatran orangutans (Pongo abelii), used to warn others about the presence of predators. Orangutans belong to our animal family – we’re both great apes. That means we share a common ancestor – a species that lived millions of years ago, from which we both evolved.

Like us, orangutans have hands that can grasp, they use tools and can learn new things. We share about 97% of our DNA with orangutans, which means many parts of our bodies and brains work in similar ways.

That’s why studying orangutans can also help us understand more about how humans evolved, especially when it comes to things like communication, intelligence and the roots of language and rhythm.

Research on orangutan communication conducted by evolutionary psychologist Adriano Lameira and colleagues in 2024 focused on a different species of orangutan, the wild Bornean orangutan (Pongo pygmaeus wurmbii). They looked at a type of vocalisation made only by males, known as the long call, and found that long calls are organised into two levels of rhythmic hierarchy.

This was a groundbreaking discovery, showing that orangutan rhythms are structured in a recursive way. Human language is deeply recursive.

Recursion is when something is built from smaller parts that follow the same pattern. For example, in language, a sentence can contain another sentence inside it. In music, a rhythm can be made of smaller rhythms nested within each other. It’s a way of organising information in layers, where the same structure repeats at different levels.
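The "sentence inside a sentence" idea can be made concrete with a toy recursive function. This is a minimal sketch under our own assumptions: the function name `embed` and the "X said that ..." frame are hypothetical illustrations of linguistic recursion, not a model of orangutan calls or of the authors' rhythmic analysis.

```python
def embed(speakers: list[str], core: str) -> str:
    """Build a sentence by nesting a clause inside repeated
    'X said that ...' frames; the same pattern applies at every level."""
    if not speakers:              # base case: the innermost, non-embedded clause
        return core
    head, *rest = speakers        # outermost speaker first, then recurse inward
    return f"{head} said that {embed(rest, core)}"


print(embed(["Ann", "Bob"], "it rained"))
# Ann said that Bob said that it rained
```

Each call wraps the result of a smaller call of the same shape, which is exactly the "same structure repeating at different levels" described above.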

The article has wonderful videos.  The orangutans sound like they're saying something.  Listen.

Discussing "Third-Order Self-Embedded Vocal Motifs in Wild Orangutans, and the Selective Evolution of Recursion." De Gregorio, Chiara et al. Annals of the New York Academy of Sciences (May 16, 2.

Abstract

Recursion, the neuro-computational operation of nesting a signal or pattern within itself, lies at the structural basis of language. Classically considered absent in the vocal repertoires of nonhuman animals, whether recursion evolved step-by-step or saltationally in humans is among the most fervent debates in cognitive science since Chomsky's seminal work on syntax in the 1950s. The recent discovery of self-embedded vocal motifs in wild (nonhuman) great apes—Bornean male orangutans’ long calls—lends initial but important support to the notion that recursion, or at least temporal recursion, is not uniquely human among hominids and that its evolution was based on shared ancestry. Building on these findings, we test four necessary predictions for a gradual evolutionary scenario in wild Sumatran female orangutans’ alarm calls, the longest known combinations of consonant-like and vowel-like calls among great apes (excepting humans). From the data, we propose third-order self-embedded isochrony: three hierarchical levels of nested isochronous combinatoric units, with each level exhibiting unique variation dynamics and information content relative to context. Our findings confirm that recursive operations underpin great ape call combinatorics, operations that likely evolved gradually in the human lineage as vocal sequences became longer and more intricate.

II.

"Animals Can't Talk like Humans Do – Here's Why the Hunt for Their Languages Has Left Us Empty-Handed." Jon-And, Anna et al. The Conversation, June 9, 2025.

Why do humans have language and other animals apparently don’t? It’s one of the most enduring questions in the study of mind and communication. Across all cultures, humans use richly expressive languages built on complex structures, which let us talk about the past, the future, imaginary worlds, moral dilemmas and mathematical truths. No other species does this.

Yet we are fascinated by the idea that animals might be more similar to us than it seems. We delight in the possibility that dolphins tell stories or that apes can ponder the future. We are social and thinking creatures, and we love to see our reflection in others. That deep desire may have influenced the study of animal cognition.

Over the past two decades, studies of thinking and language in animals, especially those highlighting similarities with human abilities, have flourished in academia and attracted extensive media coverage. A wave of recent studies reflects a growing momentum.

Two recent papers, both in top-tier journals, focus on our closest relatives: chimpanzees and bonobos. They claim these apes combine vocalisations in ways that suggest a capacity for compositionality, a key feature of human language.

In simple terms, compositionality is the capacity to combine words and phrases into complex expressions, where the overall meaning derives from the meanings of the parts and their order. It is what allows a finite set of words to generate an infinite range of meanings. The idea that great apes might do something similar has been presented as a potential breakthrough, hinting that the roots of language may lie deeper in our evolutionary past than we thought.

But there is a catch: combining elements is not enough. A fundamental aspect of compositionality in human language is that it is productive. We do not just reuse a fixed set of combinations; we generate new ones, effortlessly. A child who learns the word “wug” can instantly say “wugs” without having heard it before, applying rules to unfamiliar elements.

That flexible creativity gives language its vast expressive power. Yet while animal calls can be combined, nobody has observed animals doing this to create new meanings in an open-ended productive manner. They don’t scale into the layered meanings that human language achieves. In short: there are no wugs in the wild.
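Productivity means a rule applies to forms never encountered before. As a minimal sketch, here is a deliberately simplified written-English plural rule; the function name is hypothetical, it works on spelling rather than on the spoken plural endings, and it ignores irregulars such as "mouse/mice". The point is only that a general rule, unlike a fixed list of memorized combinations, handles "wug" as easily as "dog".

```python
def pluralize(noun: str) -> str:
    """Apply a simplified English spelling rule for plurals to any noun,
    including novel ones like 'wug'. Irregular plurals are not handled."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"                      # box -> boxes, church -> churches
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"                # city -> cities (but day -> days)
    return noun + "s"                           # default: dog -> dogs, wug -> wugs


print(pluralize("wug"))  # wugs
```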

Taken together, these papers mark significant progress in conceptualizing what is humanlike about animal calls: recursion and compositionality.

 

Selected readings

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-25 06:20 pm

Taiwanese Twosome: tea and Sino-Korean

Posted by Victor Mair

Even if you can't understand spoken Taiwanese, you can learn a lot from these two videos because of the excellent visuals, plus it is nice just to hear the clearly spoken Taigi and compare terms in Taigi with their parallels in Sino-Korean.

The first is a video from Taiwan's public TV (公視台語台) on the interesting distribution of the names of tea in the world:

The second video presents the similarities between (literary) Taiwanese and Sino-Korean pronunciations:

It packs in a lot of information about the circulation of sinographs, topolects, and texts in East Asia, together with the history of the individuals who were responsible for these transformational movements, not to mention the phonology that explains them.

 

Selected readings

[Thanks to Chau Wu]

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-25 11:10 am

Unit utility

Posted by Mark Liberman

Today's xkcd:

The mouseover title: "'This HAZMAT container contains radioactive material with activity of one becquerel.' 'So, like, a single banana slice?'"

explainxkcd currently fails to explain the strip's implicit reference to the entry for bogosity in the Jargon File:

1. [orig. CMU, now very common] The degree to which something is bogus. Bogosity is measured with a bogometer; in a seminar, when a speaker says something bogus, a listener might raise his hand and say “My bogometer just triggered”. More extremely, “You just pinned my bogometer” means you just said or did something so outrageously bogus that it is off the scale, pinning the bogometer needle at the highest possible reading (one might also say “You just redlined my bogometer”). The agreed-upon unit of bogosity is the microLenat.

2. The potential field generated by a bogon flux; see quantum bogodynamics. See also bogon flux, bogon filter, bogus.

The Jargon File gives this explanation of "microLenat":

The unit of bogosity. Abbreviated µL or mL in ASCII. Consensus is that this is the largest unit practical for everyday use. The microLenat, originally invented by David Jefferson, was promulgated as an attack against noted computer scientist Doug Lenat by a tenured graduate student at CMU. Doug had failed the student on an important exam because the student gave only “AI is bogus” as his answer to the questions. The slur is generally considered unmerited, but it has become a running gag nevertheless. Some of Doug's friends argue that of course a microLenat is bogus, since it is only one millionth of a Lenat. Others have suggested that the unit should be redesignated after the grad student, as the microReid.

More of the (complex and contested) background can be found in the 8/29/2006 LLOG post.  Wikipedia's only coverage (I think) is an entry in a List of Humorous Units of Measurement, although the dimensional analysis issues are well explained in the entry on the FFF system.

Update — My favorite unit is the scruple, which Wikipedia defines as

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-25 12:17 am

Linguistics vs. archeology and (physical) anthropology

Posted by Victor Mair

Subtitle:  "A cautionary note on the application of limited linguistics studies to whole populations"

A prefatory note on "anthropology".  In the early 90s, I was deeply involved in the first ancient DNA studies on the Tarim mummies* with Paolo Francalacci, an anthropologist at the University of Sassari, Sardinia.  Paolo was deputed to work with me by the eminent population geneticist Luigi Luca Cavalli-Sforza of the Stanford medical school genetics department, who was unable to endure the rigors of the expedition to Eastern Central Asia.

[*Wikipedia article now strangely distorted for political reasons.  Be skeptical of its claims, especially those based on recent DNA studies.] 

After we had collected the tissue samples in the field, Paolo took them back to Sassari to extract and analyze the attenuated DNA.  This involved amplification through PCR (polymerase chain reaction), a process that later gained great fame during the years of the coronavirus pandemic, inasmuch as it is an essential step in the detection and quantification of messenger RNA (mRNA).  Indeed, two Penn scientists, Drew Weissman and Katalin Karikó, were awarded the 2023 Nobel Prize in Physiology or Medicine for their work on mRNA technology, which was crucial in the development of COVID-19 injections.

Paolo's analysis extended over several years.  About halfway through, I flew to Sardinia and visited Paolo in his "anthropology" lab.  That was a revelation, because his whole department seemed to belong more to the hard sciences than to the social sciences, where I had become accustomed to finding anthropology departments in the United States.  Indeed, the lab for Paolo's own specialty, evolutionary biology, was full of zoological and botanical specimens, chemical reagents, and apparatus, but showed little evidence of the cultural and social investigations I was familiar with in American departments of anthropology.

I told Paolo how surprised I was by the difference between the anthropology I knew of in America and what I was seeing in Sardinia.  He smiled at me benignly and said, "We do physical anthropology," with a tone of voice and attitude that led me to believe that he considered physical anthropology to be real anthropology.

Enough by way of methodological preface.

Last week I posted "A cautionary note on the application of limited genetics studies to whole populations" (6/21/25) in which I decried overemphasis on genetics at the expense of archeology, linguistics, and many other disciplines that could be applied to the study of ancient populations.  In this post, I will come at the juxtaposition between genetics and linguistics from the opposite angle, with history, archeology, art history, climate studies, and other relevant disciplines looking on as interested bystanders.

Once again, a claim has been made that the Xiōngnú and the Huns spoke a Paleo-Siberian language:

Svenja Bonmann and Simon Fries, "Linguistic Evidence Suggests That Xiōng-nú and Huns Spoke the Same Paleo-Siberian Language", Transactions of the Philological Society (June 16, 2025).

Abstract

The Xiōng-nú were a tribal confederation who dominated Inner Asia from the third century BC to the second century AD. Xiōng-nú descendants later constituted the ethnic core of the European Huns. It has been argued that the Xiōng-nú spoke an Iranian, Turkic, Mongolic or Yeniseian language, but the linguistic affiliation of the Xiōng-nú and the Huns is still debated. Here, we show that linguistic evidence from four independent domains does indeed suggest that the Xiōng-nú and the Huns spoke the same Paleo-Siberian language and that this was an early form of Arin, a member of the Yeniseian language family. This identification augments and confirms genetic and archaeological studies and inspires new interdisciplinary research on Eurasian population history.

Here are the sections of the Bonmann and Fries paper:

1 Introduction
2 Earlier hypotheses on the linguistic origins of the Xiōng-nú and the Huns
3 Loanwords in Turkic and Mongolic (and how to detect them)
4 The Jié couplet and Xiōng-nú glosses
5 Hunnish anthroponymy
6 Toponymic and hydronymic evidence

In general, the appearance of the new Bonmann and Fries paper has been met with enthusiasm.  Wolfgang Behr, who posted notices about the paper on X and Bluesky, has this to say about it:

There is an exciting new paper on the language of the Xiongnu out in TPS (attached), arguing, with fresh evidence, that it was indeed Yeniseian, as first surmised by Lajos Ligeti (1902-1987) in 1950, more specifically a variety related to the Proto-Arin branch.

In passing, it also contains good arguments against the dubious ārya-, 'Aryan' *[ɢ,g]ˤraʔ > xià 夏 equation proposed by Beckwith via hypothetical "East Scythian" (for internal etymologies of the name, cf. Behr, Asiatische Studien, LXI.3, 2008, 727–754), and plausible ideas about a Yeniseian background of the notorious Eurasian Wanderwort for 'silver' (on which cf. Anton Antonov & Guillaume Jacques, "Turkic kümüš 'silver' and the lambdaism vs sigmatism debate", Turkic Languages, 2011, 15 (2), pp. 151–170. halshs-00655014).

For the reasons alluded to above, among others, I have reservations about the findings of this paper.  The tentativeness of the enterprise is evident in the hypothetical language in which it is couched:  "probable / probably", "seem(s)", "(un)likely", "suggest(s) / suggestive", and so forth.

I would concede that, just as Southeast and South Sinitic languages may embody substratal Austronesian and Austroasiatic elements, Paleo-Siberian / Yeniseian / Arin may constitute a substratal component of the languages of the Xiōngnú / Huns.  Nevertheless, we should be wary of jumping to the conclusion that Southeast and South Sinitic languages were ipso facto Austronesian and Austroasiatic, or that Xiōngnú / Hunnic were Paleo-Siberian / Yeniseian / Arin languages.

When all is said and done, the baseline of our research on ancient civilizations should be their physical remains:  textiles, metals, pottery, basketry, structures, associated animals and plants, middens, pits, bones, coprolites, etc.

Specifically, with regard to the identity of the Xiōngnú / Huns, we cannot ignore the Iranian inputs into the confederation, for which the late Elling Eide, who worked on this problem for decades, assembled mountains of supporting evidence.  I believe that his records may still exist at his magnificent research library in Sarasota, Florida.

Finally, as Étienne de la Vaissière has demonstrated in his authoritative article on "Xiongnu" in Encyclopædia Iranica, the Xiōngnú were basically mounted warriors and nomads with steppe affinities to the west.  

XIONGNU (Hsiung-nu), the great nomadic empire to the north of China in the 2nd and 1st centuries BCE, which extended to Iranian-speaking Central Asia and perhaps gave rise to the Huns of the Central Asian Iranian sources.

Origins. The Xiongnu are known mainly from archaeological data and from chapter 110 of the Shiji (Historical Records) of Sima Qian, written around 100 BCE, which is devoted to them. Comparison of the textual and archaeological data makes it possible to show that the Xiongnu were part of a wider phenomenon—the appearance in the 4th century BCE of elite mounted soldiers, the Hu (Di Cosmo, 2002), on the frontiers of the Chinese states which were expanding to the north. The first mention of the Xiongnu in Chinese sources dates to 318 BCE. Archaeologically, these Hu cavalrymen seem to be the heirs of a long development (the Early Nomadic period, from the end of the 7th to the middle of the 4th century BCE), during which the passage from an agro-pastoral economy to one dominated at times exclusively by equestrian pastoralism had taken place. Among these peoples, in the 4th and 3rd centuries BCE the Xiongnu occupied the steppe region of the northern Ordos as well as the regions to the northwest of the great bend of the Yellow River. Numerous archaeological finds in Inner Mongolia and in Ningxia demonstrate the existence of a nomadic culture that was socially differentiated and very rich, in which both iron and gold were in common use and which was in constant contact, militarily as well as diplomatically and commercially, with the Chinese states (in particular Zhao to the southeast).

The Xiōngnú were not hunter-gatherers and fishermen of the Yenisei Valley.  I am amazed and dismayed that the linguists who locate the origins of the Xiōngnú language in Ket, Yeniseian, or another Paleo-Siberian language are oblivious to this basic reality of existence and ecology.

Selected readings

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-24 12:10 am

Mi, mi, mi

Posted by Victor Mair

[first draft written June 9-10, 2025 in Bemidji, Minnesota, where the famous giant statues of Paul Bunyan and Babe the Blue Ox stand next to beautiful Lake Bemidji*]

During my peregrinations in the upper Midwest, I noticed a proliferation of place names beginning with "mi-".  Because there are 10,000 big and little glacial lakes up here, I suspected that "mi-" might be a prefix signifying "water".  I had come to Minneapolis to explore the headwaters of the Mississippi in northern Minnesota.  That alone was an emphatic enough prompt to set me off on a linguistic "mi-" quest.

My main intention on this trip is to follow the Mississippi from Lake Itasca in northern Minnesota, whence it emerges as a small stream about ten feet wide that you can walk across on a line of stones, to where it debouches into the Gulf in the south.  European-American settlers named the Mighty Mississippi after the Ojibwe word ᒥᓯ-ᓰᐱ misi-ziibi ("great river"). (source)  Misi zipi is the French rendering of the Anishinaabe (Ojibwe or Algonquin) name for the river. (source)

So I had one strike against me on the first "mi-".

The second "mi-" place name, Minneapolis, gave me more hope, but that Greek suffix ensured that the name as a whole was at best half Native American in origin.  As a matter of fact, though,

Nicknamed the "City of Lakes", Minneapolis is abundant in water, with thirteen lakes, wetlands, the Mississippi River, creeks, and waterfalls.

In the Dakota language, the city's name is Bde Óta Othúŋwe ('Many Lakes Town'). Residents had divergent ideas on names for their community. Charles Hoag proposed combining the Dakota word for 'water' (mni[h]) with the Greek word for 'city' (polis), yielding Minneapolis.

(Wikipedia)

Well, it's still pure "mi-" (actually "mni-") at the beginning.

With bated breath, I turned to Minnesota.  Bingo, a clear hit:

The word Minnesota comes from the Dakota name for the Minnesota River, which got its name from one of two words in Dakota: "mní sóta", which means "clear blue water", or "Mníssota", which means "cloudy water".  Early explorers interpreted the Dakota name for the Minnesota River in different ways, and four spellings of the state's name were considered before settling on "Minnesota" in 1849, when the Territory of Minnesota was formed. Dakota people demonstrated the name to early settlers by dropping milk into water and calling it mní sóta.

Many places in the state have similar Dakota names, such as Minnehaha Falls ("curling water" or waterfall), Minneiska ("white water"), Minneota ("much water"), Minnetonka ("big water"), Minnetrista ("crooked water"), and Minneapolis, a hybrid word combining Dakota mní ("water") and polis (Greek for "city"). The state seal features the phrase Mni Sóta Makoce ("the land where the water reflects the skies"), the Dakota name for the larger region.

(Wikipedia)

The initial "Mi-" of Missouri, the longest river in America, which flows into the Mississippi from the west just above St. Louis and is named after the Siouan-speaking Missouria people, is completely unrelated to the Dakota names mentioned above.  The main linguistic problems with Missouri are not with its etymology, but with how to pronounce it:

The state is named for the Missouri River, which was named after the indigenous Missouria, a Siouan-language tribe. French colonists adapted a form of the Illinois language-name for the people: Wimihsoorita. Their name means 'one who has dugout canoes'.

The name Missouri has several different pronunciations even among its present-day inhabitants, the two most common being /mɪˈzɜːri/ mih-ZUR-ee and /mɪˈzɜːrə/ mih-ZUR-ə. Further pronunciations also exist in Missouri or elsewhere in the United States, involving the realization of the medial consonant as either /z/ or /s/; the vowel in the second syllable as either /ɜːr/ or /ʊər/; and the third syllable as /i/ or /ə/. Any combination of these phonetic realizations may be observed coming from speakers of American English. In British Received Pronunciation, the preferred variant is /mɪˈzʊəri/, with /mɪˈsʊəri/ being a possible alternative.

Donald M. Lance, a professor of English at the University of Missouri, stated that no pronunciation could be declared correct, nor could any be clearly defined as native or outsider, rural or urban, southern or northern, educated or otherwise. Politicians often employ multiple pronunciations, even during a single speech, to appeal to a greater number of listeners. In informal contexts respellings of the state's name, such as "Missour-ee" or "Missour-uh", are occasionally used to distinguish pronunciations phonetically.

(Wikipedia)

Water, water everywhere, and plenty of drops to drink

— finis —

———-

=====

*Lake Bemidji got its name because "Bemidji" refers to the Mississippi River and the way it flows across the lake from west to east. The word Bemidji means "lake with crossing waters"; in its native Ojibwe, the name is Bemidjigamaag. (source)

It is odd that, when the Mississippi exits Lake Itasca, it flows northward for about 35 miles.  I stood at the exact spot where the Mississippi enters Lake Bemidji.  You can see the river current, just a few feet wide at this point, flowing into the lake, and it remains visible all the way across until the river leaves the lake and finally heads south.  This phenomenon of the river channel being visible in the expanse of the lake gives rise to some of the lake's names in Indian languages.  Stranger still, when winter comes and the deep freeze sets in, ice forms over the entire lake to a thickness of 3.5–4 feet, enough that you can drive vehicles over it, build ice-fishing houses on it, and so forth (this winter was particularly severe, so the ice was said to be thicker than usual); nonetheless, one can still see the river channel of water flowing out into the lake.

Afterword

After all this talk about toponymic prefixes, I am reminded of the famous case of the name of the large city (pop. 7,495,000), Wúxī 無錫/无锡 (“Wuxi City, southern Jiangsu Province”) that lies in the southern Yangtze delta and borders Lake Tai.  Superficially / ostensibly, the sinographically transcribed name means "no tin", but according to critical scholarship, both syllables are misinterpretations.  The first syllable is not a negative, but is a prefix found in other place names of the region.  As for xī 錫, it has nothing to do with tin but is likely derived from the Old Yue language or old Kra–Dai languages spoken in southern China and northern Vietnam circa 700 BC and later.

 

Selected readings

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-23 01:16 pm

"… and its launch it got."

Posted by Mark Liberman

There are several different types of "fronting" or "preposing" in English, sometimes categorized in syntactic terms (e.g. wh-movement) and sometimes in pragmatic terms (e.g. topicalization). Here's a recent example of a familiar type, for which I don't know a standard name:

The stage was set for Tesla to get its launch, and its launch it got.

That example seems a bit awkward to me, but definitely still possible. Examples where the preposed item is a simpler noun phrase seem to go down a bit easier — for example, substituting "a launch" for "its launch".

The preposed item can be a verb phrase:

He threatened to leave the meeting, and leave the meeting he did.
She said he'd be writing a letter, and writing a letter he was.

Or an adjective:

I expected them to be angry, and angry they were.

The adverbial version of so is often used in a similar way, with the background either assumed or expressed across a conversational turn boundary:

So it seems.
So they said.
So we will.

However, scanning various grammars and articles turns up examples but no terminology. Can anyone point us to a standard term? It would be surprising if none exists.

 

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-22 01:05 pm

The importance of rhythm for memorization

Posted by Victor Mair

My wonderful 2nd grade teacher taught me how to spell Mississippi with a special sing-song rhythm, and I've never forgotten it since.  Her jingle makes spelling "Mississippi" (whose shape is as contorted as its riverine course, and which scared me the first few times I tried to spell it myself, before she taught me the secret / knack) as easy as falling off a log.

Unfortunately, I never learned how to spell "Cincinnati" that way, so I always have to proceed carefully and cautiously when I spell the name of that awesome city in the southwest corner of my home state.

I use a similar technique for remembering my social security number, phone number, lock combinations, and so forth.  But I have not been able to apply it to recalling computer passwords, which are a terrible trial for me (ask the department staff and IT guys at Penn how awful I am with passwords and the like).  Maybe the reason rhythmic memorization doesn't work for passwords is that we have many of them for different purposes, plus they require weird combinations of upper and lower case letters, an arbitrary number of numbers, and a set number of nonalphanumeric symbols.

Rhythm also plays a role in helping me to remember how many days there are in each month:

Thirty days has September — April, June, and November,

All the rest have 31,

Except February, which has 28,

Though it has 29 in a leap year.

Lots of variations in the last two lines, but February never worried me anyway, because it's a special case.  It's the number of days in the other eleven months that plagued me before I learned how to rhythmize them.

From a very young age, we use rhythmic melody to help us understand tricky parts of the alphabet — h i jk lmnop.  Some of these we make up ourselves, others we inherit from family, friends, elders, and those we trust.

And so on and so forth.

 

Selected readings

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-21 09:11 am

A cautionary note on the application of limited genetics studies to whole populations

Posted by Victor Mair

"Unraveling the origins of the sogdians: Evidence of genetic admixture between ancient central and East Asians", Jiashuo Zhang, Yongdi Wang, Naifan Zhang, Jiawei Li, Youyang Qu, Cunshi Zhu, Fan Zhang, Dawei Cai, and Chao Ning, Journal of Archaeological Science: Reports (Volume 61, February 2025, 104957)

Highlights:

  • Genome-wide data was generated for two individuals from a joint burial in the Guyuan cemetery dating to the Tang Dynasty.
  • The female individual exhibits local ancestry, while the male individual carries both local ancestry and additional genetic components.
  • The integration of genomic data with archaeological evidence suggests that the two individuals were likely husband and wife.
  • The Sogdians, who travelled to China and intermarried with local populations, played a significant role in the Silk Road trade.
 

Fair enough, but:

Abstract

The Silk Road, an ancient trade route connecting China with the West, facilitated the exchanges of goods, ideas, and cultural practices among diverse civilizations. The Sogdians were prominent merchants along the Silk Road, renowned for their roles as traders, artisans, and entertainers. They migrated to China, forming enduring communities that produced multiple generations of descendants. Despite their historical importance, primary written records detailing the origins of the Sogdians and their interactions with local populations are limited. In this study, we generated genome-wide data for two ancient individuals from a joint burial (M1401) in the Guyuan cemetery dating to the Tang Dynasty (618–907 CE). To our knowledge, this represents the first ancient genomic data obtained from the Sogdian population. Our results reveal that the female individual exhibits local ancestry, while the male carries both local ancestry and additional genetic components linked to the Bactria-Margiana Archaeological Complex (BMAC) in Central Asia. This was introduced into the local gene pool approximately 18 generations ago. Combining historical, archaeological, and genetic analyses, we conclude that the two individuals were likely husband and wife. Our findings suggest that Sogdians, who initially traveled to China for trade, settled, intermarried with local populations, and played a significant role as intermediaries in Silk Road commerce. This study highlights the importance of Sogdiana at the end of the first millennium BCE in fostering connections between the Hellenistic world and the Qin/Han dynasties, emphasizing early Sogdian identity traits that preceded their later prominence as key merchants of the Silk Road.

Again, the bulk of these observations are sound and safe, but the last sentence is garbled and overreaching, hence caution is advised.

The Introduction of the paper consists of three paragraphs giving basic information about the history of the Silk Road, who the Sogdians were, and how the Sogdians settled in China.  The main sections of the paper are:

Archaeological context of Guyuan Tang dynasty tomb (M1401)
Ancient genome data overview and ancient DNA authentication
Discussion and conclusion

As one would expect from a paper on ancient DNA, the overwhelming emphasis is on the description of the human remains in Tomb M1401, together with the extraction and analysis of their DNA.  These findings have caused quite a sensation among scholars and laypersons from various fields.  However, if one does a Google search on — guyuan tomb M1401 — (no quotation marks or dashes) one will get a very different picture of the male occupant of the tomb from the one offered in the paper under discussion here, a picture that stresses the European aspects of his physical remains.  In contrast, the current study emphasizes his local affinities.

Strictly speaking, this study applies only to the two individuals whose ancient DNA remains were the subject of the analysis.  Similar interpretations have been applied to ancient DNA studies of specimens from the Tarim Basin, Mongolia, and elsewhere in Central and Inner Asia.  From these limited data, large claims about entire populations are made, giving precedence and weight to genetics and highlighting the "local admixture" of available specimens.

I believe that the balance has swung too far in favor of genetic material, which, after all, requires extensive chemical, mathematical, and statistical manipulation to make sense of.  In my estimation, we should pay more attention to the larger panorama provided by history, archeology, language, and art history, e.g., "Sogdians on the Silk Road" (5/22/25).  Indeed, we need to take a very close look at the Guyuan Sarcophagus itself, including the massive volumes of Rosalind E. Bradford (2009), whose research has uncovered motifs from across Asia and even North Africa, while not overlooking the Chinese facets of this extraordinary coffin.

 

Selected readings

[h.t. Hiroshi Kimamoto]

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-20 04:27 pm

"AI" == "vehicle"?

Posted by Mark Liberman

Back in March, the AAAI ("Association for the Advancement of Artificial Intelligence") published an "AAAI Presidential Panel Report on the Future of AI Research":

The AAAI 2025 presidential panel on the future of AI research aims to help all AI stakeholders navigate the recent significant transformations in AI capabilities, as well as AI research methodologies, environments, and communities. It includes 17 chapters, each covering one topic related to AI research, and sketching its history, current trends and open challenges. The study has been conducted by 25 AI researchers and supported by 15 additional contributors and 475 respondents to a community survey.

You can read the whole thing here — and you should, if you're interested in the topic.

The chapter on "AI Perception vs. Reality", written by Rodney Brooks, asks "How should we challenge exaggerated claims about AI’s capabilities and set realistic expectations?" It sets the stage with an especially relevant lexicographical point:

One of the problems is that AI is actually a wide-reaching term that can be used in many different ways. But now in common parlance it is used as if it refers to a single thing. In their 2024 book [5] Narayanan and Kapoor likened it to the language of transport having only one noun, ‘vehicle’, say, to refer to bicycles, skate boards, nuclear submarines, rockets, automobiles, 18 wheeled trucks, container ships, etc. It is impossible to say almost anything about ‘vehicles’ and their capabilities in those circumstances, as anything one says will be true for only a small fraction of all ‘vehicles’. This lack of distinction compounds the problem of hype, as particular statements get overgeneralized.

(The cited book is AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference.)

I'm used to making this point by noting that "AI" now just means something like "complicated computer program", but the vehicle analogy is better and clearer.

The Brooks chapter starts with this three-point summary:

  • Over the last 70 years, against a background of constant delivery of new and important technologies, many AI innovations have generated excessive hype.
  • Like other technologies these hype trends have followed the general Gartner Hype Cycle characterization.
  • The current Generative AI Hype Cycle is the first introduction to AI for perhaps the majority of people in the world and they do not have the tools to gauge the validity of many claims.

Here's a picture of the "Gartner Hype Cycle", from the Wikipedia article:

A more elaborately annotated graph is here.

Wikipedia explains that "The hype cycle framework was introduced in 1995 by Gartner analyst Jackie Fenn to provide a graphical and conceptual presentation of the maturity of emerging technologies through five phases."

Jackie Fenn doesn't have a Wikipedia page — a gap someone should fix! — but her LinkedIn page provides relevant details.

 

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-20 11:43 am

Incredulous, incredible, whatever. . .

Posted by Mark Liberman

I thought this use of incredulous in a recent Forbes article was a malapropism for incredible:

If you thought that my May 23 report, confirming the leak of login data totaling an astonishing 184 million compromised credentials, was frightening, I hope you are sitting down now. Researchers have just confirmed what is also certainly the largest data breach ever, with an almost incredulous 16 billion login credentials, including passwords, exposed. As part of an ongoing investigation that started at the beginning of the year, the researchers have postulated that the massive password leak is the work of multiple infostealers. [emphasis added]

And maybe it was.

But the OED glosses this usage as obsolete a1616-1750, tracing it back to Shakespeare:

Still, quick searches for "incredulous number", "incredulous amount", "incredulous price", etc., show that the usage is Out There today.

Wiktionary agrees with the OED, glossing this sense as "Difficult to believe; incredible", and flagging it as "largely obsolete, now only nonstandard".

Merriam-Webster also gives this meaning as sense 3, and offers this Usage Guide:

Can incredulous mean 'incredible'?:

Sense 3 was revived in the 20th century after a couple of centuries of disuse. Although it is a sense with good literary precedent—among others Shakespeare used it—it is widely regarded as an error resulting from confusion with incredible, and its occurrence in published writing is rare.

…with a longer discussion here.

And Merriam-Webster's Concise Dictionary of English Usage also goes into more detail:

Language Log ([syndicated profile] languagelog_feed) wrote 2025-06-20 03:34 am

Bopomofo Cafe

Posted by Victor Mair

Chris Button saw this bubble tea place at 3:45 PM today in Hollywood:

From the cafe's website:

BOPOMOFO CAFE draws its name from the phonetic Traditional Chinese Alphabets. ㄅ, ㄆ, ㄇ, and ㄈ [bo, po, mo, and fo] are the “ABCs” of the Mandarin Chinese alphabet symbolizing nostalgia and strength as the building blocks of Mandarin language mastery. Co-founders Eric and Philip, both "American Born Chinese" (ABC), chose the name to reflect their heritage and shared pride in their culture.

Chris ended up going inside. The branding on the cups is clever. They've made the shapes of the b, p, m, f look like the shapes of the zhuyin.

Selected readings

[Thanks to Ben Zimmer]