Today’s AI feeds on the underpaid linguistic labor. By this I mean the work of crowdsourced workers: they provide the tech companies with vast, clean datasets — and, at the same time, hardly ever get paid even a minimal wage. They spend their days labeling descriptions of violence and sexual abuse — so that ChatGPT becomes less toxic. Bored and traumatized, they sell their linguistic capacity: monotonous tasks alienate their language, and turn it into a stream of data. Dehumanized, they labor so that AI can appear more human.
Pavel Polshchikov, Pathological vectors №1
It is hard to overstate how ruthlessly crowdsourcing exploits human cognitive resources. As Phil Jones has argued in his recent monograph, crowdsourcing is the last resort for the most vulnerable workers. In the second half of the 20th century, the growing service sector served as such a refuge; today, however, more and more jobs are being automated, leaving the least privileged with no choice but to turn to Amazon Mechanical Turk. And the platforms make sure to pay these workers as little as possible, given the abundance of workers and the lack of labor law protections for crowdsourced labor.
Linguistic capacity is, arguably, one of the few traits that differentiate us from other species. Animals do communicate, but they are unable to produce utterances with complex syntax. Human natural language, by contrast, is marked by recursive structures, which allow us to express an infinite number of thoughts. The language faculty is thus creative by definition. Crowdsourced linguistic labor, conversely, is extremely monotonous: it reduces language to the bureaucratic procedures of data labeling. In other words, it turns workers into targets.
Or, to put it in more general terms: cognitive capitalism reduces human intelligence to a stimulus-response model. But the notion of intelligence subsumes complex choreographies of both thought and body. We are not ghosts in a machine, as Cartesian dualism would have it: quite the contrary, our rhythms of thinking hinge upon our bodily rhythms. Poetry, for instance, is not just words on paper: it is as much a score of breath, of inhales and exhales. And yet crowdsourced jobs treat workers in much the same way as a behavioral scientist would treat a specimen.
The epistemology behind crowdsourced linguistic labor did not emerge out of nowhere. It has its roots in 20th-century theories of language. Twentieth-century linguistics, like the humanities in general, was influenced by cybernetics: one could call it a cybernetic rendition of earlier structuralist theories. However, as Bernard Geoghegan has shown in his recent book, “cybernetic” does not equal “neutral”. In Margaret Mead’s anthropological investigations, for instance, information theory is inseparable from colonial ethnography.
Imperial science’s raison d’être is to control the population of the colonies. To control the colonized, one has to understand their culture and language. But, crucially, colonial science also rests on imperial control and takes it as its premise. Colonial power transforms a living social organism into an ethnographic museum with living figures. A person is reduced to a specimen, or an informant. And the ethnographer, like the linguist, does not have to pay informants for the knowledge they provide. They are living sources of data.
The unpaid colonial labor of language informants is the first predecessor of today’s linguistic data labeling. The second comes from mid-20th-century theoretical linguistics, inspired by Chomsky. The empirical basis of formal linguistics is the grammaticality judgements made by native speakers. Speakers are presented with a query consisting of isolated sentences (or stimuli), which they have to rate as acceptable or unacceptable. This is a simple experimental setting, which reduces language to the ability to discriminate between “right” and “wrong”. This bureaucratic aspect mirrors the monotonous tasks faced by modern linguistic consultants and crowdsourced workers. Anyone who has served as a linguistic consultant even once knows how quickly the linguistic queries become boring and monotonous.
Thus, just as colonial science exploited the cognitive resources of its language informants, today’s crowdsourcing platforms exploit those of their workers. And crowdsourced jobs, much like the work of linguistic consultants in today’s cybernetics-inspired linguistics, embody a bureaucratic attitude toward language, further alienating it from the body.