Building news ranking and recommendation systems can be tough. Their results make their way into a number of products, but are most easily seen in News tabs or applications such as Google News, Bing News, and Apple News and their lists of recommended news articles.
What factors determine which articles get top billing?
Many news and information organizations would love to understand this. Some factors seem clear: at times, the location from which you search seems to matter. At other times, it’s the media outlet itself that seems to be important, though which websites actually count as more worthy can be controversial.
Part of the reason for this controversy has to do with the role of news and media for societies, especially democratic ones.
For example, we now know about the social media campaigns run by the Internet Research Agency to sow disinformation and disgust among Americans in advance of the 2016 United States elections, which amplified certain tweets and articles over others. This manipulation is concerning for a number of reasons, ranging from one’s Access to Electoral Information to whether a Pluralistic and Balanced Media exists (see, for example, Obligations related to Media in The Carter Center’s Election Observation and Standards Database). Not having access to correct and balanced information can affect the ability of voters to make independent, unbiased, and uncoerced decisions. Yet according to Freedom House, “domestic actors online interfered in 26 of 30 countries that held elections or referendums” last year.
There is, however, so much news and information that ranking and recommendation systems have become important. They offer rule-bound ways of surfacing information from the thousands and thousands of items being generated constantly, and in theory they work well when they do so in ways that are relevant and informative.
So given the needs of democracies and of machines at scale, what should those rules and considerations be?
Parameters and Scoring
Because of the Social Good aspect of this hackathon, successful projects are likely to come from interdisciplinary teams. Each team will choose a country that is not the United States, and define a list of news sites or article ranks for a specific scope (see below).
This should result in:
1. A list of ranks: a spreadsheet of 50 to 100 articles or sites and their ordinal ranks.
2. A design document: an accompanying narrative (approximately two to five pages) that explains your algorithm design. This includes the factors that informed the ranking and their weights, such as:
a. a definition of news, and
b. when and how social and democratic aspects were taken into consideration.
You should also choose a specific scope for news ranking and recommendation, such as Breaking news, Science news, or Politics, to apply your design to. The design document also needs to define and describe the relationship of that scope to “Social Good.”
3. Work product: results from addressing either the front-end or back-end challenge
If your team chooses the front-end challenge, you will submit design modules (e.g., UX considerations such as personas and wireframe treatments) that show how to present the news content in ways that are sensitive to the ranking and the factors defined above.
If your team chooses the back-end challenge, you will submit the results of your generated code (which reflect the design factors defined above) to demonstrate how they can inform news ranking and recommendation. This might include scraping news sites or articles for their content, querying related databases such as Datacommons.org Fact Checks (http://datacommons.org/factcheck) or expert-related sites when relevant (e.g., CDC), applying Natural Language Processing, creating labels for unsupervised machine learning processes, and more.
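As one way to picture how weighted factors from a design document could turn into the list of ordinal ranks, here is a minimal Python sketch. It is an illustration only, not a required or recommended method: the factor names (`relevance`, `source_transparency`, `fact_checked`), their weights, and the sample articles are all hypothetical, and each factor value is assumed to be already normalized to the range 0 to 1.

```python
import csv
import io

# Hypothetical factor weights; each team's design document would define
# and justify its own factors and weights.
WEIGHTS = {"relevance": 0.5, "source_transparency": 0.3, "fact_checked": 0.2}


def score(factors):
    """Weighted sum of factor values; missing factors count as 0.0."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())


def rank_articles(articles):
    """Return (rank, title, score) tuples, highest score first.

    sorted() is stable, so articles with equal scores keep their input order.
    """
    ordered = sorted(articles, key=lambda a: score(a["factors"]), reverse=True)
    return [
        (rank, a["title"], round(score(a["factors"]), 3))
        for rank, a in enumerate(ordered, start=1)
    ]


def to_csv(ranked):
    """Serialize the ranked list as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["rank", "title", "score"])
    writer.writerows(ranked)
    return buffer.getvalue()


# Made-up sample data for illustration only.
ARTICLES = [
    {"title": "Article A",
     "factors": {"relevance": 0.9, "source_transparency": 0.4, "fact_checked": 0.0}},
    {"title": "Article B",
     "factors": {"relevance": 0.6, "source_transparency": 0.9, "fact_checked": 1.0}},
    {"title": "Article C",
     "factors": {"relevance": 0.2, "source_transparency": 0.5, "fact_checked": 0.0}},
]

if __name__ == "__main__":
    print(to_csv(rank_articles(ARTICLES)))
```

A weighted sum is only one possible design. Its main appeal here is that every weight is visible and can be explained in the accompanying narrative.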
Overall, whether tackling the front-end or the back-end challenge, all teams should consider the challenges of translating their factors for machine readability, such as whether to use a specific label or marker, or to rely on existing annotations on the Internet that are provided in structured formats.
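To make the idea of existing structured annotations more concrete, the sketch below reads a fact-check annotation written as JSON-LD and turns it into a simple label that a ranking process could use. It assumes the annotation follows the schema.org `ClaimReview` type. The sample record, the URL, and the output field names (`claim`, `verdict`, `reviewer`, `source_url`) are made up for illustration, and real pages may structure their markup differently.

```python
import json

# Hypothetical JSON-LD record shaped like schema.org ClaimReview markup.
SAMPLE_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "ClaimReview",
  "url": "https://example.org/fact-check/123",
  "claimReviewed": "An example claim under review.",
  "author": {"@type": "Organization", "name": "Example Fact Checker"},
  "reviewRating": {"@type": "Rating", "alternateName": "False"}
}
"""


def extract_claim_reviews(json_ld_text):
    """Return one label dict per ClaimReview item found in the JSON-LD text."""
    data = json.loads(json_ld_text)
    # JSON-LD may hold a single object or a list of objects.
    items = data if isinstance(data, list) else [data]
    labels = []
    for item in items:
        if not isinstance(item, dict) or item.get("@type") != "ClaimReview":
            continue  # skip anything that is not a fact-check annotation
        rating = item.get("reviewRating") or {}
        author = item.get("author") or {}
        labels.append({
            "claim": item.get("claimReviewed"),
            "verdict": rating.get("alternateName"),
            "reviewer": author.get("name"),
            "source_url": item.get("url"),
        })
    return labels


if __name__ == "__main__":
    for label in extract_claim_reviews(SAMPLE_JSON_LD):
        print(label)
```

Relying on an existing annotation like this avoids inventing a new label, but it also means the factor exists only where publishers or fact-checkers have supplied the markup.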
At the kickoff on March 27, teams will form and choose their countries. They will also be introduced to additional resources, such as news definitions that may be helpful to their approach.
A workspace for uploading the first two documents will be made available on GitHub. If teams need scratch server space to do their work, we’ll set that up during the kickoff as well.
The scoring rubric will be made available before the event on this site.
All attendees and participants are expected to adhere to relevant campus policies, such as those defining appropriate student conduct. This includes the content submitted for the project. For example, in March 2020, Georgia Institute of Technology’s campus policies and student conduct will apply: http://www.policylibrary.gatech.edu/student-life/student-conduct.
As with other hackathons, intellectual property for each project belongs to those who produce it. Note, however, that because this is a hackathon, this does not extend to general idea sharing and pitching. You should also submit your own work, and you should not introduce malware or include inappropriate content. See these examples of practices to get a sense of expectations:
- March 2020, Atlanta – Judges and Event Details