How Might We Detect Who Is and Who Is Not Doing Journalism Online?

Exploring the question of whether journalism ethics data is helpful in boundary detection for journalistic behavior on US websites 

The Markkula Center for Applied Ethics will use NewsQ signals data to explore whether journalism ethics data can inform behavioral signals on journalism for platforms as well as the news industry. Many platform firms use the term “inauthentic behavior” as one way to identify bad actors. This project examines whether journalism ethics could be a proxy for authentic journalistic behavior and, if so, to what degree it can be used to classify boundaries.

One of the hardest problems for platforms when sorting news and news-like content is identifying whether a publisher or author is, in fact, a journalistic actor, or is acting as a journalist does or should. Compounding the challenge, classifying online news content is complex because the boundaries of journalism are not well defined and are continually contested in a democracy. Some of the debates are old (for example, objective news versus advocacy), whereas others, about what is and is not journalism, are new and result from the Internet and the influence of social media. In particular, the problem of boundaries has complicated and influenced the algorithmic approach to news quality scoring and distribution.

As a research partner for the NewsQ signals initiative, we at the Markkula Center for Applied Ethics are proposing to test a hypothesis that ethics in journalism could be a helpful marker for identifying and delineating journalistic boundaries. Between January and June 2020, the Markkula Center plans to use a journalism ethics vocabulary on sourcing, diversity, and inclusion to curate new datasets for news-like U.S. websites, and research the following question: 

How might we detect the boundaries of journalistic (and hence non-journalistic) behavior of websites? Can journalism ethics-lensed data retrieved from websites help and if so, how reliably?

What are journalism ethics and how are these ethics connected to research on behavior? 

Journalism ethics concerns itself with the right and wrong of journalistic work and decisions, usually undertaken by writers, reporters, photographers, editors, columnists, producers, publishers and so forth. In the United States, many news organizations and journalists acknowledge allegiance to a code of ethics built around seeking the truth, minimizing harm, acting independently and being accountable and transparent. More recently, however, a debate on new journalistic norms has emerged, shifting attention to questions of diversity and inclusion (or the lack of it), bias, amplification, false equivalence, and more.

When journalists and news organizations strive to consistently apply ethical considerations to their work, evidence of this application is likely to manifest in their work online. Can such behavior be identified using computational and hybrid approaches? If so, is it a marker of journalism itself? There are implications for journalism’s boundaries here because ethical routines usually sit deeper in the system, so their manifestations may be harder to game. As a result, an absence of specific journalism ethics routines in non-journalistic actors producing news-like content may be discernible through a corresponding lack of manifestation. Alternatively, the degree and pattern of manifestation may differ enough from those of journalistic organizations to be detectable.

Exploring the patterns in the data

As a starting point for detecting the boundaries of journalistic (and non-journalistic) behavior, the Markkula team has selected a vocabulary around sourcing, one of the most critical routines in journalism, for which new data could be retrieved from websites. These data would represent our initial “ethics features.” We will do two types of pattern exploration on the ethics features:

  1. Are there any clustering or other patterns in our ethics feature computations for the websites in the NewsQ signals dataset? If so, what are they, and do they map to whether the organizations are journalism entities or not? 
  2. In conjunction with NewsQ’s existing signals data on around 12,500 US sites, does the enriched data show signs that some features depend on other features? In particular, do the ethics features predict any of the existing NewsQ signals better, or conversely, do any of the NewsQ signals correlate with or predict the ethics features? 
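
As a rough illustration of the two explorations above, the following is a minimal Python sketch, not the project's actual method. It assumes each site has already been reduced to two hypothetical sourcing features (the feature names, site names, and values are invented for illustration and are not NewsQ data). It groups sites with a small k-means routine that uses a deterministic farthest-first initialization, and it computes a Pearson correlation that could be used to compare an ethics feature with an existing signal.

```python
import math

# Hypothetical ethics features per site (illustrative values, not NewsQ data):
# (share of articles with named sources, share of quotes with attribution)
SITES = {
    "site_a": (0.90, 0.80),
    "site_b": (0.80, 0.95),
    "site_c": (0.85, 0.85),
    "site_d": (0.10, 0.20),
    "site_e": (0.25, 0.10),
    "site_f": (0.15, 0.15),
}


def farthest_first_centroids(points, k):
    """Pick k starting centroids deterministically: the first point, then
    repeatedly the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members. Returns one label per point."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.
    Returns 0.0 when either sequence has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


points = list(SITES.values())
labels = kmeans(points, k=2)
cluster_by_site = dict(zip(SITES, labels))
```

In practice, the cluster labels would be compared against independent judgments of which sites are journalism entities, and `pearson` would be applied between each ethics feature and each existing signal column. Real feature sets would likely need scaling and a more careful choice of the number of clusters than this toy example suggests.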

Next, in our analysis of (1) and (2), we will look at whether our approach leads to reliable signals of journalistic or non-journalistic behavior at all. In the process, we want to learn whether this initial approach to behavior-based signal data may inform the “What is News” research interest at the NewsQ initiative.

Use for the news industry and the public 

Independent of the boundary detection problem of separating non-journalistic from journalistic behavior, a by-product of the Markkula Center’s work with NewsQ will be a new dataset for US-based news organizations, available for further research into signals. This dataset will include ethics features that may be useful even for comparisons among journalism organizations, both mainstream media and local news. In line with this, we expect the following areas may also receive input and generate momentum:

  1. We will be able to review whether the ethics features would be useful for norms-setting conversations at US news industry convenings. 
  2. Likewise, we will be able to review whether convened publics from diverse backgrounds find it informative to draw connections between journalism quality and ethics.

Exploring limitations

By the end of this initial project, we hope to have a sense of whether journalism ethics-based computational approaches are less applicable to unorganized or less organized (long-tail) journalistic actors than to mainline organizations, or vice versa.

Next Steps

After our research concludes, we will issue a report answering the question of whether journalism ethics data is helpful in boundary detection for journalistic behavior on American websites.

If there are promising indications, we will explain what they are and what the next steps will be. We will also discuss opportunities to study other types of ethical routines in journalism beyond source diversity, and whether those routines may afford a data-curation approach for further boundary analysis.

Subramaniam Vincent is Director of Journalism and Media Ethics at the Markkula Center for Applied Ethics. His interests are in developing tools and frameworks to help advance new norms in journalism practice, ethical news product design and new vocabulary and signals to help the public process and demand ethical media.  The Markkula Center for Applied Ethics is one of the leading applied ethics centers in the world. It provides applied ethics resources and frameworks for ethical decision-making in nine different disciplines, including journalism and media ethics, internet ethics, and technology ethics.