Stop Using MTurk

15 September 2020

Why now? My thought process was simple: My research output thus far has been based on underpaid labour. Do I want that to continue?

For me, the answer is no, which is why I’m going to stop using MTurk. Perhaps there are alternatives that are fair to workers, that lead to good wages and job security. Linguistic data is important, and more so when it comes from a diverse population. But I don’t think paying minuscule amounts of money to obtain it is something to be proud of. We can do better.


I’ve been thinking more about this topic after a research seminar discussion – would ensuring that workers get paid at least $10 per hour fix the problem? It certainly ensures that they get paid a decent wage by you. But a few requesters deciding to pay more (and advertising their decision) isn’t going to make the median wage go up, nor does it do much to fix the numerous other issues with how the platform treats workers. It does help in a small way, and it’s the least evil thing you can do on the platform.

Update 2

Fair Work is a cool resource from Stanford’s HCI group that runs a server-side script to ensure your crowd workers are paid a minimum wage of $15 per hour. I found it while reading this excellent opinion piece by Glenn Davis & Klint Kanopka at The Stanford Daily. I agree with their conclusion: it’s up to Amazon to change the incentive structure – requesters paying workers more and being transparent helps, but from the workers’ perspective, it will always be a race to the bottom.

Ongoing Genocide

15 August 2020

Howard Zinn in A People’s History of the United States (emphasis mine):

One can lie outright about the past. Or one can omit facts which might lead to unacceptable conclusions. Morison does neither. He refuses to lie about Columbus. He does not omit the story of mass murder; indeed he describes it with the harshest word one can use: genocide.

This passage, and especially the last line, made an unusual impression on me when I first read it, and it still does. Perhaps it’s the fact that one word needs to do so much work. How can one word possibly convey the magnitude of death, destruction, and suffering? On the other hand, genocide, along with words like racist, does seem to work. Those in power (and the privileged) seem more upset over being labelled racist, or being accused of genocide, than over the crimes themselves.

I was reminded of the passage when reading a report describing the forced labour of Uyghurs in China, which used the term ‘cultural genocide’ (or ethnocide) to describe the Chinese government’s practices. Words matter, and the crimes against Kashmiris and Dalits need to be called what they are, regularly and loudly.

Can the decades of military occupation of Kashmir, and the torture and killing of Kashmiris, be captured by these words? Or the centuries of discrimination and persecution directed at the Dalit community? Calling these crimes what they truly are, ongoing genocide, may not be adequate to describe the extent of the horrors that the people have experienced, nor the generational trauma, but it is a good start.

Nuance Fucks

3 August 2020

I recently re-read Kieran Healy’s 2015 paper deriding the abundance of ‘nuance’ in U.S. sociological research. I love this paper because it forced me to think about how I may fall into some of the ‘nuance traps’ that he identifies, especially the nuance of the fine-grain:

First is the ever more detailed, merely empirical description of the world. This is the nuance of the fine-grain.

Is there a similar phenomenon of nuance rising in Computational Linguistics? Anecdotally, I do see the words ‘nuance’ and ‘fine-grain’ more often in new papers and talks. There is a way to verify this: the ACL Anthology provides a bibliography of articles with their titles and abstracts dating back to the 1960s. Searching the bibliography using some simple regular expressions, we see that the proportion of articles that mention the words ‘nuance’ or ‘fine-grain’ in their titles or abstracts has indeed risen in the last 10 years (there were few mentions before 2000, so I’ve excluded them).
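A search of this kind could be sketched roughly as below. This is a minimal illustration, not the script behind the numbers in this post: the regular expression, the function name nuance_proportion_by_year, the require_abstract option, and the toy entries are all assumptions, and the entries are taken to be already parsed into dictionaries with year, title, and (optionally) abstract fields.

```python
import re
from collections import defaultdict

# Assumed pattern: matches "nuance", "nuanced", "fine-grain", "fine grained",
# "finegrained", and similar forms, ignoring case.
PATTERN = re.compile(r"\bnuance|\bfine[- ]?grain", re.IGNORECASE)


def nuance_proportion_by_year(entries, require_abstract=False):
    """Return {year: share of entries whose title or abstract matches PATTERN}.

    If require_abstract is True, entries without an abstract are skipped, so
    they count in neither the numerator nor the denominator.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for entry in entries:
        abstract = entry.get("abstract", "")
        if require_abstract and not abstract:
            continue
        year = entry["year"]
        totals[year] += 1
        text = entry.get("title", "") + " " + abstract
        if PATTERN.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical toy entries standing in for a parsed bibliography.
entries = [
    {"year": 2010, "title": "A Parser for English", "abstract": "We parse."},
    {"year": 2010, "title": "Fine-grained Sentiment Analysis"},
    {"year": 2020, "title": "Nuanced Metrics", "abstract": "We study nuance."},
    {"year": 2020, "title": "Translation", "abstract": "A fine-grained analysis."},
]

print(nuance_proportion_by_year(entries))
print(nuance_proportion_by_year(entries, require_abstract=True))
```

The require_abstract switch matters whenever some entries carry only a title: an entry with no abstract has less text to match against, so including it can pull the proportion for its year down.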

However, the bib file has some issues – older articles are generally missing their abstracts, so we only have their titles to search. Top people are trying to fix the missing abstract field issue, but until then, we can recalculate the proportion of ‘nuanced’ articles by only considering entries that have an abstract field in the bibliography. We observe the same trend:

I don’t think that all of these papers fall into the nuance traps that Healy describes. Nor do I think that this trend of exploring the subtleties and richness of language is bad for computational models and theories of language – my first paper is about exploring nuance within the theory of generics! However, I do fear that I regularly fall into the trap of valuing some research more because it is more nuanced.