UX, Data & Research
How can we know if what we know is true?
Access the full thesis document.
My MA thesis in Digital Humanities explored the question of scientific truth by examining reproducibility challenges through the lens of Open Science. I investigated the presence of open data and open code in scholarly research published in the field of the Digital Humanities, adding to the body of evidence that shows gaps between Open Science advocacy and actual practice. Despite growing awareness of the importance of transparency, many studies remain inaccessible or lack the documentation necessary for replication. This raises concerns about the reliability of published findings, particularly in light of what has been called the Reproducibility Crisis in science (demonstrated by the two Reproducibility Projects conducted by the Center for Open Science).
Besides quantifying openness indicators, I also conducted a replication of Peter Meinderstma’s research on changes in the lyrics of Billboard Top 100 songs, with the goal of understanding the challenges involved in this type of project, especially those of particular concern to researchers in the Digital Humanities.
The research highlights key barriers to openness, including unclear guidelines, lack of incentives, and concerns over data privacy. By quantifying trends in research transparency, my work provides empirical insights into the state of reproducibility in the Digital Humanities and the systemic changes needed to foster a more open scientific ecosystem. This study underscores the crucial role of Open Science in ensuring that knowledge is not only created by research but also verifiable—helping us move closer to answering a fundamental epistemological question: how can we know if what we know is true?
See below for the thesis abstract and a few key figures.
Abstract
Valid scientific claims are based on replicable observations, and in this regard the replication of published research is an important form of scientific validation. Although the broader scientific community is aware of this, and the philosophy of science has deemed replication a cornerstone of the scientific method, little incentive exists to promote and facilitate the practice of replicating scientific studies or to share replication results. This has led to what has been called a “Reproducibility Crisis” in science, and efforts are now underway to understand and remedy it.
The goal of this thesis is to bring current work in metaresearch (the study of scientific research itself) to the field of the Digital Humanities (DH), to understand whether DH is also affected by issues observed in other fields, and to explore topics and issues related to replication in the context of DH.
Methods:
Two studies were conducted to achieve this goal: a survey of papers published in DH and literary criticism journals in 2021 and the replication of a published DH research project that analyzed changes in the lexicon and sentiment of popular US songs.
Results:
The results of the survey indicate that DH exhibits transparency indicators similar to those observed in other disciplines: half of the 110 papers that relied on empirical data were available as open access, roughly a third shared the code used for data analysis, and roughly two-thirds shared the data used in the study.
Conclusion:
Better transparency indicators and journal policies that encourage authors to adhere to a culture of replication would facilitate replication efforts such as the one conducted for this thesis, by reducing the time and energy needed to recreate the data and code used to validate and extend results.
(left) Transparency indicators of the 526 papers evaluated in the study, and (right) transparency indicators of the 110 papers that relied on quantitative data broken down by the journals they were published in:


Original results (top) and replication results (bottom) of five measurements originally published by Peter Meinderstma in his research on changes in lyrical diversity in Billboard hit songs:




