Qualitative Methods & Big Data
SayWhat recently visited with Justin McCrary, Director of the UC Berkeley Social Sciences Data Lab, aka the D-Lab, to learn more about the lab's work. One of its goals is to build networks through which Berkeley researchers can connect with users of social science data in the off-campus world. The D-Lab is a sister organization to the Social Science Matrix, a flagship center for cross-disciplinary research at UC Berkeley. We jumped at the opportunity to visit the D-Lab because market researchers have few chances to talk shop with academic researchers.
The kinds of questions social science scholars focus on are quite different from the concerns clients bring to us. For example, one Berkeley PhD student is currently working on the question: “What is the meaning of slavery within different religious denominations based on 100 years of archival data from church and other publications?” Associate Professor Nikki Jones is conducting a systematic analysis of video records that document routine encounters between police and civilians, including young Black men’s frequent encounters with the police. If she can identify the key variables associated with encounters that go wrong, solutions can be developed. The slavery study relies on textual content analysis, while the police encounter study involves coding video data. Both text and video data are also highly relevant to our work with companies. Obviously, the stakes are very different for commercial research. But we are dedicated to finding answers for our clients, and we recognize the value in learning about the state-of-the-art methods currently in use among academic social scientists.
I asked about the viability of using social media data as a sample pool because companies are increasingly turning to social platforms to understand their customers. But harvesting data from social media presents several methodological obstacles.
One is irony. None of the algorithms currently used to ascertain sentiment are successful at detecting sarcasm. “Oh yeah, I’m SO going to rush right out and try [brand x or product y]” would be coded as positive sentiment and possibly even counted as intent to purchase, when the speaker could very well be sarcastic and mean the exact opposite.
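To make the sarcasm problem concrete, here is a minimal Python sketch of a word-list sentiment scorer. It is a toy under stated assumptions, not the algorithm used by any particular vendor or by the D-Lab: the word lists, the intent pattern, and the function names are all invented for illustration.

```python
import re

# Hypothetical toy word lists, invented for this illustration only.
POSITIVE_WORDS = {"yeah", "love", "great", "awesome"}
NEGATIVE_WORDS = {"hate", "awful", "terrible", "worst"}

# Hypothetical pattern treating "going to ... try/buy" as purchase intent.
INTENT_PATTERN = re.compile(r"\bgoing to\b.*\b(try|buy)\b")


def score_sentiment(text):
    """Label text by counting positive and negative words, ignoring tone."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE_WORDS for t in tokens)
    score -= sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def has_purchase_intent(text):
    """Flag purchase intent purely from surface wording."""
    return bool(INTENT_PATTERN.search(text.lower()))


post = "Oh yeah, I'm SO going to rush right out and try brand x"
print(score_sentiment(post), has_purchase_intent(post))
```

Because the scorer only counts words, the sarcastic post is labeled positive and flagged as intent to purchase. A human coder would read the capitalized "SO" and the exaggerated phrasing as cues that the speaker means the opposite.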
Another obstacle is noise. Bloggers who are paid to endorse certain brands and products, along with bots that churn out automated social media posts, distort the data. The algorithms that allow us to scale the tasks associated with content analysis of large amounts of textual social media data are imperfect, but they are still valuable to the extent that they can mimic human judgment.
The conversation turned to the increasing amount of visual data piling up on social media and what tools, if any, are being developed to analyze those data at scale. Justin indicated that few people on campus are currently addressing this problem, and most of them are outside the social sciences. “The people who are actually aware of the ways to scale aren’t sure of what they would do with all the video if they had it….So that’s why I say this is part of the data future. It’s something that people have pretty much nailed down with respect to text. So [now you] say, I’ve got all this video, what are people doing [to analyze it]? That’s topic modelling in video. [My] guess is that’s the nut that’s being cracked in the coming years.”
I asked Justin if there are video versions of the algorithms that can filter text-based data and he said: “One of the harder things is bringing together the computer scientists [who are thinking on an abstract level] with the social scientists or the marketing folks who are the ones with the questions and the applications. They’re building their Rube Goldberg machine or whatever but they don’t really know who’s gonna use it. That’s part of why you have these interdisciplinary organizations [like D-Lab and Matrix] trying to facilitate that [interaction]. Everybody can tell there’s a lot of interesting stuff [such as video data] that’s being left on the table because we haven’t quite gotten the marriage between the tools and the questions. And it’s that marriage that we’re really trying to [facilitate].”
For more information on the evolution of qualitative methods in Big Data, please contact us. We’d love to hear from you.