|Applications of Machine Learning
- Generating a Pairwise Dataset for Click-through Rate Prediction of News Articles Considering Positions and Contents - Shotaro Ishihara (Nikkei, Inc.) and Yasufumi Nakama [Paper]
On online news websites, the headlines and thumbnail images of articles are displayed in a list, and they are important navigation links to individual article pages. If we can predict the click-through rate (CTR) of readers to the article pages, we can assist the editors in creating article headlines and setting thumbnail images. However, the CTR that can be observed in the access log is heavily affected by the display position, and it is difficult to predict the CTR by machine learning using data on single articles. This paper proposes a method to construct a pairwise dataset based on information such as the similarity of display positions and contents, and to create a model that predicts the CTR in the framework of pairwise learning-to-rank. In the experiment, we verify the usefulness of the proposed method using actual access log data and discuss the potential for practical use of CTR prediction as editing support.
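The idea of pairing articles shown in similar positions can be illustrated with a minimal sketch. It is not the authors' implementation: the `Article` fields, the `build_pairs` helper, and the `max_position_gap` and `min_ctr_gap` thresholds are all hypothetical, and the content-similarity criterion mentioned in the abstract is left out for brevity.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Article:
    article_id: str
    position: int  # display slot in the list (1 = top); hypothetical field
    ctr: float     # observed click-through rate from the access log


def build_pairs(articles, max_position_gap=1, min_ctr_gap=0.0):
    """Return (higher_ctr_id, lower_ctr_id) pairs for articles shown in similar positions."""
    pairs = []
    for a, b in combinations(articles, 2):
        if abs(a.position - b.position) > max_position_gap:
            continue  # positions differ too much, so a CTR gap may only reflect placement
        if abs(a.ctr - b.ctr) <= min_ctr_gap:
            continue  # no clear preference between the two articles
        winner, loser = (a, b) if a.ctr > b.ctr else (b, a)
        pairs.append((winner.article_id, loser.article_id))
    return pairs


# Made-up example data, not taken from the paper.
articles = [
    Article("a1", 1, 0.12),
    Article("a2", 2, 0.09),
    Article("a3", 5, 0.03),
    Article("a4", 6, 0.04),
]
pairs = build_pairs(articles)
print(pairs)
```

Each resulting pair could then serve as one training example for a pairwise learning-to-rank model, with the first article labeled as preferred.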
- Detecting Stance of Tweets Toward Truthfulness of Factual Claims - Zhengyuan Zhu, Zeyu Zhang, Foram Patel and Chengkai Li (University of Texas at Arlington) [Paper]
Journalists aim to understand misinformation on social media, especially in discerning the public’s opinions toward the veracity of misinformation. For that, an algorithmic tool for truthfulness stance detection can be particularly useful. This paper introduces a deep learning model we developed for detecting the stance of tweets toward the truthfulness of factual claims. The models were constructed using a dataset curated and annotated in-house. While both the models and datasets warrant further development and refinement, preliminary experiments demonstrated promising results. The model is available through both an Application Programming Interface (API) and a demonstration website.
- What’s the Fairest of Them All? Aesthetic Assessment of Visuals - Marc Willhaus, Daniel Vera Nieto, Clara Fernandez and Severin Klingler (Media Technology Center ETH Zurich) [Paper]
Attractive images and videos are the visual backbone of journalism and social media. From trailers to teaser images to image galleries, appealing visuals have only grown in importance over the past years. Especially online, eye-catching visual content can significantly impact user engagement. However, selecting the best shots from a long video or selecting the perfect image from a vast image collection is a challenging and time-consuming task. This paper presents a system to automatically assess image and video content from the perspective of aesthetics. While this is a highly subjective task, we find that it is possible to combine expert knowledge with data-driven information to perform such an assessment. In order to do so, we identify relevant aesthetic features together with experts from the media industry and implement machine learning algorithms to infer them from the visual content. We combine the features under a single aesthetics retrieval system that allows users to sort uploaded visuals according to an aesthetic score and interact with additional photographic, cinematic, and person-specific features. The system is built into a containerized application to guarantee reproducibility. A demo video of our tool is available.
|Online Communities and Local News
- Comparing open-ended community dialogue with local news - Hope Schroeder, Doug Beeferman and Deb Roy (MIT Center for Constructive Communication) [Paper]
In the lead-up to Boston’s local elections in November 2021, the Real Talk For Change project hosted small group conversations in which local residents shared their experiences of living in Boston in response to the prompt “What is your question about the future of Boston and your place in it? What experience led you to that question?”. Over 370 people from 21 of the 23 neighborhoods of Boston participated, often sharing deeply personal stories. The conversations were recorded, analyzed, and used as a basis for sharing themes and stories with the Boston community via a public portal (https://portal.realtalkforchange.org/), and as an input into public dialogues with the mayoral candidates. In this study, we apply topic modeling to a subset of the RTFC corpus of transcribed conversations to surface the inferred agenda emerging from conversations, here expressed as a distribution over topics. We compare this to the distribution of topics covered in a time-matched sample of news stories published in "The Boston Globe." We apply a semi-supervised keyword extraction method to enable quantitative analysis across the conversation and news corpora. Significant differences in the topic distributions of the two corpora reflect a mismatch between how much attention the city's largest news source gives to historically underheard residents and their expressed needs and concerns. The methodology points towards a systematic way for local news organizations to consider community experiences as an input for which topics they cover and how to cover them.
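Comparing two corpora as distributions over topics can be sketched as follows. This is an illustration under stated assumptions, not the study's method: the `TOPIC_KEYWORDS` lists and the example documents are made up, and Jensen-Shannon divergence is one possible comparison measure chosen here, since the abstract does not name the statistic used.

```python
import math

# Hypothetical keyword lists; the study's semi-supervised keywords are not reproduced here.
TOPIC_KEYWORDS = {
    "housing": {"rent", "housing", "eviction"},
    "transit": {"bus", "train", "transit"},
    "schools": {"school", "teacher", "students"},
}


def topic_distribution(docs):
    """Share of topic mentions per topic, counting each topic at most once per document."""
    counts = {topic: 0 for topic in TOPIC_KEYWORDS}
    for doc in docs:
        tokens = set(doc.lower().split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            if tokens & keywords:
                counts[topic] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no topic keywords found in the corpus")
    return {topic: n / total for topic, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): 0 for identical distributions, at most 1."""
    def kl(a, m):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    mixture = {t: 0.5 * (p[t] + q[t]) for t in p}
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Made-up example texts standing in for the conversation and news corpora.
conversations = ["my rent went up again", "the bus is always late", "rent and eviction worries"]
news = ["new school budget approved", "train delays downtown", "teacher contract talks"]

p = topic_distribution(conversations)
q = topic_distribution(news)
print(p, q, js_divergence(p, q))
```

A larger divergence would indicate a bigger gap between what residents discuss and what the news sample covers; a per-topic comparison of `p` and `q` shows where the gap lies.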
- Local, Social, and Online: Comparing the Perceptions and Impact of Local Online Groups and Local Media Pages on Facebook - Marianne Aubin Le Quéré (Cornell Tech), Mor Naaman (Cornell Tech) and Jenna Fields (Cornell University) [Paper]
With the steady closure of local newspapers, many communities have been left without reliable news and information. Technology platforms are attempting to fill the void by providing community forums or neighborhood apps where users read and share local information. Today, Facebook groups (which include buy-and-sell, local interest, or community discussion groups) are one popular form of digital local information sharing. This study investigates how local online groups are perceived compared with more traditional local news outlets, and compares the pro-community benefits provided by each. Based on prior theoretical contributions, we developed a framework for measuring the benefits of local information presence on individual-level pro-community attitudes. In our experiment (N=170), we asked frequent Facebook users living in four U.S. cities (Boston, Columbus, Nashville, Seattle) to start following local news pages or local online groups on Facebook, and compared their perceptions of quality and changes in pro-community attitudes. We find that while posts from local news pages are perceived to be of significantly higher quality than posts from local online groups, neither led to significant changes in pro-community attitudes during our study period. We discuss implications for the future study of local news in a changing media ecology.
- Storytelling Structures in Data Journalism: Introducing the Water Tower structure - Bahareh Heravi (University of Surrey) [Paper]
This paper reviews existing, long-established storytelling structures and examines how they are used in data storytelling, specifically in the context of data journalism. For this, a large set of data stories from a variety of news outlets was collected, tagged and analysed. Reflecting on the results, the paper proposes a new structure for data storytelling, called the Water Tower structure, which addresses the unique requirements of this emerging area of study and practice. The proposed structure is an addition to the existing storytelling structures, and is specifically designed for and targeted at storytelling with data, with a particular focus on data journalism. While this paper is primarily focused on data storytelling in journalism, the contributions are believed to be of use and value to other domains such as business.
|Data and/as the News
- Cataloging Algorithmic Decision Making in the U.S. Government - Grace Lee, Jasmine Sinchai, Daniel Trielli and Nicholas Diakopoulos (Northwestern University) [Paper]
Government use of algorithmic decision-making (ADM) systems is widespread and diverse, and holding these increasingly high-impact, often opaque government algorithms accountable presents a number of challenges. Some European governments have launched registries of ADM systems used in public services, and some transparency initiatives exist for algorithms in specific areas of the United States government; however, the U.S. lacks an overarching registry that catalogs algorithms in use for public-service delivery throughout the government. This paper conducts an inductive thematic analysis of over 700 government ADM systems cataloged by the Algorithm Tips database in an effort to describe the various ways government algorithms might be understood and inform downstream uses of such an algorithmic catalog. We describe the challenge of government algorithm accountability, the Algorithm Tips database and method for conducting a thematic analysis, and the themes of topics and issues, levels of sophistication, interfaces, and utilities of U.S. government algorithms that emerge. Through these themes, we contribute several different descriptions of government algorithm use across the U.S. and at federal, state, and local levels which can inform stakeholders such as journalists, members of civil society, or government policymakers.
- News as Data for Activists: a case study in feminicide counterdata production - Rahul Bhargava (Northeastern University), Harini Suresh (Data + Feminism Lab & CSAIL, MIT), Amelia Lee Doğan (Data + Feminism Lab, MIT), Wonyoung So (Data + Feminism Lab & DUSP, MIT), Helena Suárez Val (Feminicidio Uruguay & CIM Warwick), Silvana Fumega (ILDA) and Catherine D'Ignazio (Data + Feminism Lab & DUSP, MIT) [Paper]
News articles are an important source of data for recording and aggregating a range of social phenomena. In this paper, we ask if and how technology can support civil society activists who challenge asymmetrical power relations by producing counterdata, that is, datasets missing from mainstream counting institutions. We consider a case study centered on activists who monitor feminicide, or the lethal outcome of gender-related violence, often using news as a main source to identify and compile databases of incidents. We describe a system that we collaboratively built with activists, aimed at relieving some of the emotional and time-intensive labor this work entails. The system discovers relevant news stories on multiple systems, classifies them based on machine learning models, clusters them into groups of stories about the same incident, and delivers regular email alerts to users. Currently, 26 groups across different geographical regions are using the system, and groups who broadly monitor feminicide report that they are regularly discovering new cases. We also reflect on the shortcomings of the pilot system for groups with more specific, intersectional monitoring focuses, and the implications of biased narratives or under-reporting on the system’s design. This case study contributes a grounded example of computational journalism built in collaboration with, and in service of, activists working on critical human rights issues.
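The step of grouping stories about the same incident can be sketched with a simple greedy clustering over headline word overlap. This is an assumption-laden illustration only: the abstract does not say how the system clusters stories, and the `cluster_stories` helper, the Jaccard measure, the `threshold` value, and the headlines are all hypothetical.

```python
def tokenize(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_stories(headlines, threshold=0.5):
    """Greedily assign each headline to the first cluster whose first headline is similar enough."""
    clusters = []  # list of (token set of the cluster's first headline, member headlines)
    for headline in headlines:
        tokens = tokenize(headline)
        for representative, members in clusters:
            if jaccard(tokens, representative) >= threshold:
                members.append(headline)
                break
        else:
            clusters.append((tokens, [headline]))
    return [members for _, members in clusters]


# Made-up headlines for illustration.
headlines = [
    "Woman found dead in Montevideo apartment",
    "Woman found dead in apartment in Montevideo",
    "City council approves new park budget",
]
groups = cluster_stories(headlines)
print(groups)
```

In a pipeline like the one described, each group rather than each article could then be summarized in an email alert, so that users review one entry per incident.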
- Characterizing Social Movement Narratives in Online Communities: The 2021 Cuban Protests on Reddit - Brian Keith (Virginia Tech), Tanushree Mitra (University of Washington) and Chris North [Paper]
Social movements are dominated by storytelling, as narratives play a key role in how communities involved in these movements shape their identities. Thus, recognizing the accepted narratives of different communities is central to understanding social movements. In this context, journalists face the challenge of making sense of these emerging narratives in social media when they seek to report on social protests. They would therefore benefit from support tools that allow them to identify and explore such narratives. In this work, we propose an algorithm for extracting narratives from social media that incorporates the concept of community acceptance. Using our method, we study the 2021 Cuban protests and characterize five relevant communities. The extracted narratives differ in both structure and content across communities. Our work has implications for the study of social movements, intelligence analysis, computational journalism, and misinformation research.