Novel Methodological Approach Combines Data Donation and Surveys 

Nordicom has recently published, in collaboration with DATALAB – Center for Digital Social Research at Aarhus University, an in-depth report on a novel methodological approach that combines data donation and surveys. The publication demonstrates how data donation can be scaled to the national population level to understand trends in digital media usage and potential challenges for digital democracies. Kristin Clay, manuscript editor at Nordicom, spoke with co-author Anja Bechmann about the study's main findings and their broader relevance.

Kristin Clay (KC): We were pleased at Nordicom that we were able to collaborate and get this valuable work out to the community. How did you get started with this new methodological design? What was your motivation – were there certain issues with other methodologies that you sought to solve?

Anja Bechmann (AB): In 2014, I collected Facebook engagement and content exposure data from a similar national sample using the application programming interface (API) available at the time (see, e.g., an article on posting behaviour in Nordicom Review, one on exposure and filter bubbles in Digital Journalism, and one on cross-country interaction patterns in private, closed, and open groups in Social Media + Society).

However, this access to exposure and private data was terminated in 2015, which has made it much more challenging for researchers to study behaviour on social media. Legislation also changed: in 2018, new possibilities for data access opened up when the General Data Protection Regulation (GDPR) made it mandatory for Big Tech platforms to provide data portability log files to users on request. Since then, I have wanted to scale this kind of data donation to the national population level, building on experiences from the Facebook study while using available resources to study other types of data. For a better understanding of exposure patterns, focusing on YouTube, the video-based social platform with a very large reach, was the next logical step. In particular, combining trace data with more extensive survey data improves our understanding of the user's context and its impact on the behaviour documented in the donated data.

I am very grateful that Independent Research Fund Denmark could also see the potential and make this happen so that we can contribute to further developing this method in the field of digital media and communication research and beyond. 

KC: What makes this methodological approach unique?

AB: Data donation is unique in the sense that researchers do not depend on Big Tech platforms for access: With legal and ethical clearance, data donation makes it possible to study user behaviour with direct consent from the participating user. This provides fertile ground for the freedom of science, as the research does not have to operate within the limits of the proprietary data access offered through APIs and datasets from Big Tech platforms. Additionally, the restrictions on research scope (studies of systemic risks) that apply to access requests under Article 40 of the Digital Services Act (DSA) do not apply to data donations that build on the GDPR's data portability rule. However, researchers do of course depend on Big Tech and politicians to maintain the current interpretations of the GDPR that allow complete logfiles of user data to be delivered, a limitation our publication discusses in more depth.

Another unique characteristic is our implementation of the method with a donation-first setup and a very strict data validation process, which gave the research team both a high number of donations and high-quality data that we look forward to investigating further and publishing on. Finally, we collaborated with a survey institute and used quotas for gender, age, and level of education to approximate a sample that is representative of the adult population of Danish YouTube users. This means that the study can yield more generalisable insights.

KC: Did you encounter any surprises along the way?

AB: Based on my experience with the earlier Facebook study in 2014, I expected an even larger drop-off due to the recent focus on GDPR, profiling, and privacy protection. Our study did show a large drop-off rate, but I was actually positively surprised by Danes' willingness to donate data and by the fairly short fieldwork period and completion time. We expected the older age group to be the most difficult to recruit due to potentially lower technical skills, which proved true, but it was surprising that education level did not seem to play a role in the technically demanding phase of the data donation.

With regard to the donated data, we were negatively surprised to find a systematic cap of 200 comments per participant logfile, and that the watch logfiles did not contain a "stop watching" timestamp, only a timestamp for when people start viewing a video. We can make calculated estimates, but they would have been more robust and accurate if the files had contained this information as well. These gaps in the provided information set unexpected boundaries for what we can investigate. We also had to request some of the metadata (e.g., whether a video is a Shorts video) through the DSA, so additional time and resources are needed to supplement the study.

KC: One of the main conclusions is that data donation can be scaled to the national population level. What are the implications of this for future research? What are the potential challenges?

AB: The positive implication of being able to scale data donations to the population level is that, when the method is combined with larger surveys, we can better understand which groups of people (e.g., in relation to media diets, psychology, socioeconomics, political dimensions, demographics) are especially vulnerable to challenges posed by Big Tech platforms. Looking beyond our own networks of like-minded people, we can also gain a better understanding of the dynamics and mechanisms of Big Tech platforms, both as infrastructure in themselves and as infrastructure for associated actors, and of how these actors use such dynamics to gain attention and try to persuade people.

Although we approximated a population sample on gender, age, and education, the biggest challenge lies in the significant drop-off, which means the study can never be fully representative, because people who choose to take part may differ significantly from those who choose not to. Although the data can be cross-referenced with survey responses and may be the best data on platform user behaviour currently available, trace data will, like any other method, only ever provide a glimpse of actual usage, and any conclusions drawn from such a study need to take this into account.

KC: How would you like to see this approach utilised in the future? What potential value does this have for understanding the democratic implications of digital and social media use, data infrastructures, and platform influence?

AB: As tech platforms deliver the same data in all EU countries, I find it an extremely useful data source to combine with surveys for cross-country analysis in future studies. Such studies could shed light on differences in how people of different nationalities use Big Tech platforms, but also on shared challenges to democracies. In the publication, we discuss the high level of trust in the Nordic countries as a potential reason why we succeeded in collecting a high number of donations. Future studies using a similar method in other countries would also shed light on national differences in willingness to donate.

Another trajectory for future studies, which colleagues in Germany are already experimenting with, is cross-platform donation, in which users donate logfiles from several platforms rather than just one. This provides a more holistic, user-centric picture of platform usage, but at the same time, challenges arise because the GDPR does not specify in detail which format the logfiles should be delivered in. So there is analytical development to be done here as well, and each platform's implementation of the GDPR's data portability rule comes with its own data limitations.

Lastly, it would be interesting for future studies to follow how platforms (and politicians) interpret users' interactions with AI chatbots in terms of data portability. If such logfiles are included, researchers could conduct extensive population-level studies of biases, learning loops, persuasion patterns, and other potentials and challenges for democracies.

Read the publication Open Access here: https://www.nordicom.gu.se/sv/publikationer/data-donation-method-investigating-trends-and-challenges-digital-media-landscapes