DaCENA (Data Context for News Articles) is a web application conceived, designed and developed by the Department of Informatics, Systems and Communication (DISCO) of the University of Milano – Bicocca and the DensityDesign Research Lab.
The web application showcases a new approach to reading online news articles with the support of a data context built from interlinked facts available in the Web of Data. Given a source article, a set of facts estimated to be the most interesting for readers is extracted from the Web and presented using tailored information visualization methods and an interactive user interface. By looking at these facts, the reader accesses background factual knowledge, which supports the interpretation of the news content and suggests connections to related topics to explore further.

Interlinked facts in the Web of Data

The Web has transformed into a large-scale repository of facts. Facts represent features of specific entities. For example, the facts <Artur Mas I Gavarro presidentOf Convergence and Union>, <Convergence and Union type Political Party> and <Artur Mas I Gavarro birthplace Catalonia>, available in DBpedia (a data-based representation of Wikipedia content), tell us that Artur Mas I Gavarro is president of the political party Convergence and Union and was born in Catalonia. Semantic associations are chains of facts that connect two entities. For example, “Catalonia is birth place of Artur Mas I Gavarro, who is the president of Convergence and Union (a party), which has Catalan nationalism as ideology” connects the two entities Catalonia and Catalan nationalism. Several semantic associations that mention a common entity can be interlinked, forming a network of interlinked associations or, in other words, of interlinked facts.
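To make the notion of a semantic association more concrete, here is a minimal sketch in Python. It is an illustration only, not DaCENA's actual implementation: facts are stored as (subject, predicate, object) triples, and associations are found as chains of facts linking two entities, ignoring the direction of each fact. The function name find_associations, the max_length parameter and the predicate name "ideology" are assumptions made for this example.

```python
from collections import defaultdict

# Facts as (subject, predicate, object) triples, based on the example above.
# The predicate name "ideology" is an assumption made for this sketch.
FACTS = [
    ("Artur Mas I Gavarro", "presidentOf", "Convergence and Union"),
    ("Convergence and Union", "type", "Political Party"),
    ("Artur Mas I Gavarro", "birthplace", "Catalonia"),
    ("Convergence and Union", "ideology", "Catalan nationalism"),
]


def find_associations(facts, source, target, max_length=3):
    """Return every chain of at most max_length facts linking source to target.

    Facts are traversed in both directions, and no entity is visited twice
    within the same chain.
    """
    neighbours = defaultdict(list)
    for fact in facts:
        subject, _, obj = fact
        neighbours[subject].append((obj, fact))
        neighbours[obj].append((subject, fact))

    associations = []

    def extend(entity, visited, chain):
        if entity == target:
            associations.append(list(chain))
            return
        if len(chain) == max_length:
            return
        for next_entity, fact in neighbours[entity]:
            if next_entity not in visited:
                visited.add(next_entity)
                chain.append(fact)
                extend(next_entity, visited, chain)
                chain.pop()
                visited.discard(next_entity)

    extend(source, {source}, [])
    return associations


# Print each association between the two entities as a list of facts.
for association in find_associations(FACTS, "Catalonia", "Catalan nationalism"):
    print(association)
```

On these four facts, the search finds the single three-fact chain described in the example, going from Catalonia through Artur Mas I Gavarro and Convergence and Union to Catalan nationalism.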

Data context for a news article as a network of interlinked facts

DaCENA uses a novel interactive user interface to present to the reader a set of interlinked facts that describe semantic associations between the main topic of an article, represented by an entity, and several other entities that are mentioned in the article. The interlinked associations represent context data that readers can navigate, thus exploring additional background knowledge relevant to the article. DaCENA makes this exploration interactive and strives to show only those semantic associations that are expected to be the most interesting for readers. To this end, DaCENA orders the associations by serendipity, which is estimated as a combination of two factors: relevance, which captures how related an association is to the content of the article, and unexpectedness, which captures how likely an association is to be unknown to the reader.
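The ordering by serendipity can be sketched as follows. The text only states that serendipity combines relevance and unexpectedness; the weighted sum used here, the weight alpha, the top_k cut-off and the scores themselves are assumptions made for illustration and should not be read as DaCENA's actual formula or data.

```python
from dataclasses import dataclass


@dataclass
class ScoredAssociation:
    label: str
    relevance: float       # assumed to lie in [0, 1]
    unexpectedness: float  # assumed to lie in [0, 1]


def serendipity(association, alpha=0.5):
    """Weighted sum of the two factors (an assumed form of the combination).

    alpha = 1 ranks by relevance only, alpha = 0 by unexpectedness only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * association.relevance + (1 - alpha) * association.unexpectedness


def rank_by_serendipity(associations, alpha=0.5, top_k=None):
    """Sort associations by decreasing serendipity and keep the first top_k."""
    ranked = sorted(associations, key=lambda a: serendipity(a, alpha), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scores, invented for this example.
EXAMPLES = [
    ScoredAssociation("A", relevance=0.9, unexpectedness=0.2),
    ScoredAssociation("B", relevance=0.4, unexpectedness=0.8),
    ScoredAssociation("C", relevance=0.3, unexpectedness=0.3),
]

# Favouring relevance and favouring unexpectedness give different orderings.
print([a.label for a in rank_by_serendipity(EXAMPLES, alpha=0.8)])
print([a.label for a in rank_by_serendipity(EXAMPLES, alpha=0.2)])
```

A single weight keeps the trade-off easy to expose as one control: moving alpha towards 1 favours associations closely related to the article, while moving it towards 0 favours associations the reader is less likely to know.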

Advantages for the reader

DaCENA aims to be a first step towards data journalism based on the analysis of relational data. A news reader can leverage the explored data to better understand the context of the story told in the article. Consider the example of the article “A Threat to Spanish Democracy”, published in the New York Times on 7 November 2014, which discusses the issue of nationalism in the Catalonia region. By looking at the association “Catalonia is birth place of Artur Mas I Gavarro, who is the president of Convergence and Union (a party), which has Catalan nationalism as ideology”, the reader discovers the name of a prominent politician (currently leading the Catalan government) and of his party, neither of which is mentioned in the article. In addition, he/she learns that Convergence and Union is a nationalist party. A journalist may use DaCENA to find inspiration for a new story. For example, in other associations found by DaCENA, e.g. the ones between Catalonia and Autonomous countries of Spain, he/she can find out about other autonomous communities. These findings may inspire a new story about the political landscape behind nationalist ideologies in Spain and a comparative analysis of different political movements in favour of autonomy.

The user interface

The user interface of DaCENA has been designed to offer the user an interactive environment in which to read the news article and visually explore the semantic associations at the same time. The aim is to offer an innovative reading experience based on an exploration process that follows the “overview first, zoom and filter, details on demand” browsing model typical of information visualization interfaces. On the left side, the interface presents the news article, formatted to guarantee good readability of the text. The named entities found in the article are highlighted in yellow, as is the main entity, which is presented at the beginning of the section. On the right side is the interactive graph generated by the associations between the main entity (the big yellow node), the other entities mentioned in the article (the small yellow nodes) and the entities that occur in the associations but not in the article (the grey nodes). By clicking on an entity, either in the visualization or in the text on the left, the user can filter the network and explore in detail all the paths from the main entity to the selected one. In the upper part of the interface, the user can access different parameters to filter or expand the graph.

The Demo

The demonstration showcases the concept and the novel data exploration interface of DaCENA. Users can read several articles from within the interactive interface and personalize the data context view. They can dynamically change the number of displayed associations, filter out associations based on their length, and look in more detail at the associations between the main topic and one entity of interest. In addition, users can dynamically tune serendipity to favor unexpectedness or relevance; their preferences result in a quick adjustment of the displayed graph.
Although we developed DaCENA for the journalism domain, it can be applied to virtually any domain where it is useful to explore a (relational) data context related to a text of interest (for example, the forensic or the music domain). By exploring linked data with DaCENA in the context of reading tasks, we also encountered some limitations of DBpedia when this knowledge base is asked to provide interesting data for end users. We believe that DaCENA can stimulate interesting discussions on topics such as data quality, data value, and content specialization in socio-political and economic domains.

DaCENA is an ongoing project, and we welcome feedback and suggestions on how to improve it.
We are particularly interested in involving test users in the project and in possible collaborations with journalists and other institutions. Feel free to contact us.

Team

The project has been conceived, designed and developed by the Department of Informatics, Systems and Communication (DISCO) of the University of Milano – Bicocca and the DensityDesign Research Lab.

Other former members and students involved in the project
Francesco Gariboldi
Davide Redaelli
Elena Nava
Valeria Gennari
Alessio Polidoro