
MTurk Research by UK’s Gatton College Faculty Making Waves


LEXINGTON, Ky. (Oct. 15, 2018) University of Kentucky Gatton College of Business and Economics Assistant Professor of Accountancy Sean A. Dennis and doctoral student Chris Pearson, along with Brian M. Goodson of Clemson University, have released cutting-edge research that investigates the recent data contamination issue on Amazon’s Mechanical Turk platform. Commonly called "MTurk," the platform allows individuals to become on-demand workers who are paid a nominal fee for completing simple tasks, such as academic or marketing surveys.

The online "MTurk panic," as it has been widely referred to, began when University of Minnesota psychology graduate student Max Hui Bai observed a "quality drop" on the platform. He posted in a Facebook group for psychology researchers, which prompted an investigation and speculation that bots were the culprit. Coincidentally, Dennis and Pearson had been looking into the issue weeks before Bai posted on the social networking site.

According to Dennis, he and his colleagues stumbled upon the problem accidentally. “We've been running experiments on Turk for several years. But, with our most recent data collection, we noticed more noise in our data than we've ever seen before,” he explained.

For example, after running an experiment, the team noticed right away that many answers to their open-response questions were poor. “A lot of responses to the open response questions we'd asked were just bad. Some were one-word answers, some were incoherent English, some of them were pretty clearly copied and pasted from other parts of the internet.”

To identify the issue, the team used reverse geocoding, the process of converting latitude and longitude coordinates back into a readable address or place. They analyzed large sets of unique IP addresses associated with duplicate GPS coordinates and found that each of these IP addresses traced back to a "server farm," or data center, and a virtual private server (VPS) provider.
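The pattern described above, many distinct IP addresses reporting the same GPS coordinates, can be illustrated with a short Python sketch. This is not the researchers' own code; the field names (`id`, `ip`, `lat`, `lon`), the rounding precision, the threshold, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def flag_shared_coordinates(responses, min_ips=2, precision=4):
    """Return ids of responses whose coordinates are reported by
    at least `min_ips` distinct IP addresses.

    `responses` is a list of dicts with hypothetical keys
    "id", "ip", "lat", and "lon".
    """
    # Collect the distinct IP addresses seen at each rounded coordinate.
    ips_by_coord = defaultdict(set)
    for r in responses:
        coord = (round(r["lat"], precision), round(r["lon"], precision))
        ips_by_coord[coord].add(r["ip"])

    # Flag every response sitting on a coordinate shared by many IPs.
    flagged = set()
    for r in responses:
        coord = (round(r["lat"], precision), round(r["lon"], precision))
        if len(ips_by_coord[coord]) >= min_ips:
            flagged.add(r["id"])
    return flagged


# Made-up sample data; the IPs come from reserved documentation ranges.
sample = [
    {"id": "r1", "ip": "192.0.2.10", "lat": 42.0, "lon": -78.0},
    {"id": "r2", "ip": "192.0.2.11", "lat": 42.0, "lon": -78.0},
    {"id": "r3", "ip": "198.51.100.7", "lat": 38.0406, "lon": -84.5037},
]

print(sorted(flag_shared_coordinates(sample)))  # ['r1', 'r2']
```

A flagged response is only a candidate for closer review. Shared coordinates can have innocent explanations, so a check like this would normally be paired with a look at the responses themselves.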

Dennis explained that when Pearson ran the IP addresses through various tracker websites, they were showing up as having strange internet service providers and strange organizations associated with them. “That was odd because normally the ISP comes through as something that we'd recognize like Verizon or AT&T or something you'd see on an internet bill. And we were seeing things like 'Joe's Data Center' and 'ColoCrossing.' We didn't know exactly what that meant, but when we looked into it, we figured out that they were actually server farms,” he said.
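The ISP check Dennis describes can be approximated in a few lines of Python once each response has been annotated with the ISP or organization name returned by an IP lookup. The sketch below assumes that lookup has already been done and stored in a hypothetical `isp` field; the keyword list is an illustrative guess, not a list taken from the paper.

```python
# Illustrative keywords only; a real screen would need a vetted list.
DATA_CENTER_KEYWORDS = ("data center", "colocrossing", "hosting", "vps")


def looks_like_server_farm(isp_name, keywords=DATA_CENTER_KEYWORDS):
    """Return True if the ISP name contains a data-center keyword."""
    name = isp_name.lower()
    return any(keyword in name for keyword in keywords)


def split_by_isp(responses):
    """Partition responses into (kept, suspect) lists by ISP name.

    Each response is a dict with a hypothetical "isp" key.
    """
    kept, suspect = [], []
    for r in responses:
        target = suspect if looks_like_server_farm(r["isp"]) else kept
        target.append(r)
    return kept, suspect


# Made-up sample records using the ISP names quoted in the article.
sample_isps = [
    {"id": "r1", "isp": "Verizon"},
    {"id": "r2", "isp": "Joe's Data Center"},
    {"id": "r3", "isp": "ColoCrossing"},
    {"id": "r4", "isp": "AT&T"},
]

kept, suspect = split_by_isp(sample_isps)
print([r["id"] for r in suspect])  # ['r2', 'r3']
```

Keyword matching is a blunt instrument, since provider names vary widely, so a list like this would need regular maintenance.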

The group then found that these server farm responses were significantly lower in quality than other responses. Removing these "contaminated" responses has enabled several researchers to draw more accurate conclusions from otherwise unusable data sets.

Dennis and Pearson's paper, "MTurk Workers' Use of Low-Cost 'Virtual Private Servers' to Circumvent Screening Methods: A Research Note," was released on the Social Science Research Network (SSRN) on Aug. 17, 2018. The paper analyzes survey metadata, examines MTurk participants’ open responses, and develops techniques to determine the reliability of those responses.

According to the paper's abstract, they found "alarming proportions of participants who circumvented several accepted sample screening methods. These ‘imposters’ provided responses through VPS that concealed their physical locations, thereby rendering conventional location screening methods useless."

Since its release on SSRN, the paper has received over 1,000 downloads and 5,000 abstract views, and it is being cited and linked on message boards across the internet as an effective fix for the issue.

This cutting-edge research is linking UK and the Gatton College with the important issue of automated programs mimicking human behavior. This topic has garnered significant attention among the academic community and the press.

“We really meant for this to be part of a conversation about ongoing efforts that will be necessary to preserve the integrity of the Turk platform because it will inevitably be abused,” Dennis said. “There is no silver bullet solution that we're going to be able to identify today that will work in perpetuity. We view this as part of an ongoing conversation between the academic community and developers who actually have the tools to fix this.”