How effective are blacklists or whitelists in guiding readers to reliable sources of information? What criteria and methods should we use to construct and share these lists? And how do we ensure that the lists themselves remain reliable over time?
As concern about the spread of misinformation and disinformation online has continued to increase, so too have efforts to combat it. While there are many approaches to the problem, whitelists and blacklists have emerged as a key tactic.
In this context, and due to their design and ubiquity, blacklists and whitelists have become one area of focus for NewsQ’s research. The NewsQ team has pursued three strategies to approach this area of research:
- We’ve developed a set of potential signals of credibility based on existing whitelists and blacklists of news and academic sources.
- We’re reviewing research on the efficacy of these lists.
- We’ve brought together journalists and scholars to discuss, in person, what the burgeoning black- and whitelist industry should look like, and what it should consider, as it grows.
At the November 2019 “WikiCredCon” conference held at MIT, NewsQ organized a panel of researchers with firsthand experience in building and exposing such lists. The goal of the panel was to understand the state of research on the efficacy of black- and whitelists. The group included Melissa Zimdars, assistant professor of communication and media at Merrimack College; Benjamin T. Decker, lead analyst at the Global Disinformation Index (GDI); and Ivonne Lujano, Latin America ambassador for the Directory of Open Access Journals.
Decker and Zimdars were quick to point out how easily these black- and whitelists can be weaponized. One possible answer to this challenge is to build these lists for private, internal use by those well equipped to understand their purpose and limitations, an approach that may, in the long run, be safer. But the suggestion that lists shaping the public web should be built for private use raises its own set of concerns.
Is there a middle road?
What Are ‘Whitelists’ and ‘Blacklists’ in the Context of News Reliability?
Definitions of ‘whitelists’ and ‘blacklists’ can vary. So, for the purposes of the NewsQ project, lists that are compiled by humans or AI and that mainly address the overall reliability of news sources are considered to be whitelists and blacklists.
So, a ‘whitelist’ is a shortlist of sources deemed reliable or credible, while a ‘blacklist’ is a shortlist of sources deemed unreliable.
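In software terms, these lists often reduce to simple set membership. Below is a minimal sketch of how such a lookup might work; the domain names are invented for illustration and carry no real assessment. Note that most sources fall on neither list.

```python
from enum import Enum

class Verdict(Enum):
    WHITELISTED = "deemed reliable"
    BLACKLISTED = "deemed unreliable"
    UNLISTED = "no judgment available"

# Hypothetical domains for illustration only, not real assessments.
WHITELIST = {"example-news.com", "trusted-daily.org"}
BLACKLIST = {"hoax-herald.net"}

def classify(domain: str) -> Verdict:
    """Return the list-based verdict for a source's domain, if any."""
    if domain in WHITELIST:
        return Verdict.WHITELISTED
    if domain in BLACKLIST:
        return Verdict.BLACKLISTED
    return Verdict.UNLISTED

print(classify("hoax-herald.net"))   # Verdict.BLACKLISTED
print(classify("unknown-blog.net"))  # Verdict.UNLISTED: most sources are on neither list
```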
Can We Label a Source as ‘Good’ or ‘Bad’?
In response to the proliferation of misinformation and disinformation online, it is understandable that both public and private actors have created lists of credible or unreliable information sources. These lists attempt to quantify or qualify what makes a source usable, or to label a source, story, or factual assertion as either good or bad. A related effort, for example, is the attempt to create “nutrition labels” for news.
While considerable effort has been made to assess the credibility of information, the following questions have largely remained unexplored:
- How effective are blacklists or whitelists in guiding readers to reliable sources of information?
- What criteria and methods should we use to construct and share these lists?
- How do we ensure that the lists themselves remain reliable over time?
At the heart of these questions lies the fundamental challenge of how to classify sources as credible or unreliable—of where and how to draw that line.
Little Research About How Well White- and Blacklists Work
As the NewsQ team moved forward, we evaluated relevant research produced by academic sources such as PeerJ, as well as by journalistic sources such as the Global Disinformation Index [1]. This helped us frame our thinking about the challenges and benefits of using lists to classify sources. What became clear through both the selection process and the initial review of the literature is that in-depth research on black- and whitelists in the news realm seems to be lacking. However, analogous research on spam, botnets, and academic publishing suggests that there are a number of challenges to consider [2].
While making lists has been a common response to recent incarnations of problematic information online, we need a greater empirical understanding of their efficacy when it comes to news recommendations, let alone of how to build them in a responsible manner (e.g., guarding against “false positives,” such as a credible outlet mistakenly blacklisted) and according to a sound methodology.
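One way to ground that empirical understanding is to test a list against independently labeled sources. The sketch below is a hypothetical illustration of measuring a blacklist’s false positive rate; the list contents, domains, and labels are all invented.

```python
# Hypothetical sketch: measuring a blacklist's false positive rate against
# a small hand-labeled sample. All domains and labels are invented.
BLACKLIST = {"hoax-herald.net", "example-news.com"}

# Ground-truth labels a researcher might assemble: True = actually unreliable.
LABELED_SAMPLE = {
    "hoax-herald.net": True,
    "example-news.com": False,  # a credible outlet mistakenly blacklisted
    "trusted-daily.org": False,
}

flagged = [d for d in LABELED_SAMPLE if d in BLACKLIST]
false_positives = [d for d in flagged if not LABELED_SAMPLE[d]]
rate = len(false_positives) / len(flagged)
print(f"False positive rate among flagged sources: {rate:.0%}")
# -> 50%: half the blacklisted sources in this tiny sample were credible.
```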
Values Are Inherent When Creating a List
Implicit in the criteria used to create these lists is, always, a set of values. For instance, a list creator might decide that political slant is the most important criterion in determining a source’s quality. Other criteria might include the separation of news from opinion, or the emotional tone of the writing on the site. How an outlet fares against these criteria will determine whether it is whitelisted or blacklisted.
The creator, in this instance, is making many decisions through the choice of criteria alone. List creators are, first and foremost, making a subjective determination of what an appropriate context for an outlet is, and, depending on their approach, that determination may incorporate more bias rather than less.
Based on these subjective measurements, an outlet may be whitelisted and given a stamp of approval as credible, or blacklisted and marked as untrustworthy, which may guide individual readers away from it or, at far greater scale, suppress it if taken into account by news algorithms.
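To make the subjectivity concrete, here is a minimal sketch of how criterion-based placement might work. The criteria, weights, and threshold are all hypothetical assumptions, and each one encodes a value judgment by the list creator.

```python
# Hypothetical sketch of criterion-based list placement. Every criterion,
# weight, and threshold below is an assumption chosen for illustration.
CRITERIA_WEIGHTS = {
    "separates_news_from_opinion": 0.4,  # weighting this highest is itself a value choice
    "neutral_emotional_tone": 0.3,
    "limited_political_slant": 0.3,
}

def list_placement(scores: dict, threshold: float = 0.6) -> str:
    """Aggregate per-criterion scores (0-1, as a human rater might assign) into a placement."""
    total = sum(w * scores.get(c, 0.0) for c, w in CRITERIA_WEIGHTS.items())
    return "whitelist" if total >= threshold else "blacklist"

outlet_scores = {
    "separates_news_from_opinion": 0.9,
    "neutral_emotional_tone": 0.5,
    "limited_political_slant": 0.4,
}
print(list_placement(outlet_scores))                 # whitelist (total = 0.63)
print(list_placement(outlet_scores, threshold=0.7))  # blacklist: the threshold alone flips it
```

Note how changing the threshold alone flips the same outlet from whitelist to blacklist; that sensitivity is precisely the subjectivity described above.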
Lists Have Limits, and Need Context
The benefits of these lists—if built and shared well—could be tremendous. Black- and whitelists have the potential to guide a greater number of people towards credible sources, while also encouraging them to become more literate in news and media.
But the potential downsides of these lists are just as great:
- Biases of all kinds can easily be built into the criteria
- Creators with bad intentions can create blacklists of what are in fact good sources and vice versa
- Public lists can easily be weaponized to attack the legitimacy of their creators
- Misunderstandings about the original purpose of a list can lead to misuse, such as censorship
And, fundamentally, the very nature of these lists, however carefully curated, tends to put the power of determining what is good or bad news in the hands of a few people.
This puts creators of black- and whitelists in a potentially powerful position. And while the impulse to stop the spread of inaccurate information as soon as possible is in some ways a good one, it is also critical for researchers, journalists and scholars who care about quality information to consider the values and methods that should guide the construction and use of such lists.
What’s Next?
At NewsQ, while we think about credibility in news from many angles, we are especially focused on what units of information can function as signals of credibility that, combined with a great variety of other signals, may indicate to an algorithm what is and is not “quality news.” The inclusion of a news outlet on a whitelist or blacklist is one such potential signal.
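As a rough illustration of that framing, and only as an assumption about how such a system might be wired, list membership could enter a ranking model as one feature among many. The feature names and weights below are invented for this sketch, not NewsQ’s actual model.

```python
# Hypothetical sketch: list membership as one credibility signal among many,
# combined by a simple linear scoring function. Features and weights are
# assumptions for illustration.
FEATURE_WEIGHTS = {
    "on_whitelist": 0.25,        # 1.0 if the outlet appears on a vetted whitelist
    "on_blacklist": -0.40,       # 1.0 if it appears on a blacklist (a penalty)
    "corrections_policy": 0.15,  # examples of other potential signals
    "bylines_present": 0.10,
}

def credibility_score(features: dict) -> float:
    """Combine signals into a single score an algorithm might rank on."""
    return sum(w * features.get(f, 0.0) for f, w in FEATURE_WEIGHTS.items())

score = credibility_score({"on_whitelist": 1.0, "corrections_policy": 1.0})
print(f"{score:.2f}")  # 0.40: list membership moves the score but does not decide it
```

The weighting reflects the point above: inclusion on any one list is a signal, not a verdict.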
Coming away from this, we have several considerations that will continue to guide our work. We’ll continue to explore how helpful black- and whitelists are in the context of news and information. And we need to develop guidance for using such lists, grounded in an understanding of each list’s context, such as its purpose and its creators.
With this in mind, we’ve started with a few points for consideration below.
Things to Consider When Using Black-/Whitelists for News
- Who are the curators of the list?
- When was it made?
- What is its intended purpose and who is the audience?
- What are the criteria that the list uses to determine the inclusion of a source? To what degree are aspects of the criteria subjective, and how do we account for that?
- Through what mechanisms do the creators build and update the list? Is it created through original research by the creators? Is it fact checked? Does it get updated?
- What are the strengths of the list? What are its limitations?
Further reading
[1] Strinzel, Michaela, Anna Severin, Katrin Milzow, and Matthias Egger. “‘Blacklists’ and ‘Whitelists’ to Tackle Predatory Publishing: A Cross-Sectional Comparison and Thematic Analysis.” PeerJ Inc., February 13, 2019. https://doi.org/10.7287/peerj.preprints.27532v1; Mousavizadeh, Alexandra, and Santhosh Srinivasan. “Designing the Index: A Review of Good Practices.” Global Disinformation Index, 2019. https://disinformationindex.org/wp-content/uploads/2019/05/GDI_Designing-The-Index-Report_Screen_AW.pdf.
[2] See for example: Erickson, David, Martin Casado, and Nick McKeown. 2008. “The Effectiveness of Whitelisting: A User-Study.” In CEAS 2008 – The Fifth Conference on Email and Anti-Spam. Mountain View, CA; Ma, Justin, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker. 2009. “Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs.” In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1245–1254. KDD ’09. New York, NY, USA: ACM. https://doi.org/10.1145/1557019.1557153; Melis, Luca, Apostolos Pyrgelis, and Emiliano De Cristofaro. 2019. “On Collaborative Predictive Blacklisting.” SIGCOMM Comput. Commun. Rev. 48 (5): 9–20. https://doi.org/10.1145/3310165.3310168; West, Andrew G., and Insup Lee. 2011. “Towards the Effective Temporal Association Mining of Spam Blacklists.” In Proceedings of the 8th Annual Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, 73–82. CEAS ’11. New York, NY, USA: ACM. https://doi.org/10.1145/2030376.2030385; Wu, Jian, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Prasenjit Mitra, Shuyi Zheng, and C. Lee Giles. 2012. “The Evolution of a Crawling Strategy for an Academic Document Search Engine: Whitelists and Blacklists.” In Proceedings of the 4th Annual ACM Web Science Conference, 340–343. WebSci ’12. New York, NY, USA: ACM. https://doi.org/10.1145/2380718.2380762.