How much do the different editions of Wikipedia diverge in their definition of a sensitive issue?

Can a cross-linguistic analysis reveal the cross-cultural transposition of a social phenomenon?

To what extent can a semi-automated analysis of different language versions of an article capture the cultural perceptions of a controversy?

Abstract

Wikipedia is one of the main sources people turn to when they need information quickly (alexa.com). The encyclopedia is now available in 288 language editions (as of May 2015), written and maintained by more than thirty-nine million registered users worldwide.
The collaborative creation of Wikipedia’s content not only guarantees broad coverage of topics but also allows entries to be updated quickly as their subjects develop. At the same time, the differing backgrounds of the users who edit and the sensitivity of some topics can make the creation process more difficult. To address the problem, the Wikipedia communities drew up a series of guidelines, including an invitation to maintain a neutral point of view (NPOV). Users are therefore required to produce articles that represent fairly, proportionately and, as far as possible, without bias, all significant views gathered from reliable sources.

But what happens when a topic takes on a different meaning depending on cultural background? Culture strongly affects the construction of knowledge and plays an important role in how critical issues are framed. Neutrality itself is culturally subjective and cannot be reduced to a universal definition: cultural differences influence the structure of articles and their content, and produce an uneven distribution of editing activity across a page’s sections, driven by the interests of the community. The goal of the project is to verify how much of the cultural perception of a controversial issue can emerge from a semi-automated analysis of several language versions of the same article. The final output of the investigation is a panoramic view of the ideologies, priorities and behaviors characteristic of the users of different editions, as they emerge in each edition’s definition of a particular social controversy.

Methodology

The goal of the research is to show how the controversy linked to a specific theme is reflected in the corresponding Wikipedia article, and how much the cultural background of the participating users influences the production of content and the editing activity. To test this hypothesis, I chose the theme of homosexuality because it produces a controversy that crosses national borders, undergoes continuous cultural development and negotiation, and receives high media coverage. The selection of language editions is based on two main factors: the quality rating that users have given to the Wikipedia article, and the degree of equal rights for LGBT people provided for by the constitutions of the European countries where those languages are spoken. This led to the collection of eight language versions of the page, hypothetically belonging to cultural communities with uneven levels of acceptance of the phenomenon, among them English, Italian, French, Portuguese, Bulgarian, Hungarian and Russian. The research is structured in five points that depict the various facets of the language communities’ behaviors in handling the article, through a semi-automatic analysis of content and information. The result is a sequential overview of the global relevance of homosexuality and of the relationship the communities have with it, built from the study of the development of the dispute over time, the characteristics of the users, the semantic compatibility of discussions, the self or global focus, and the specific interests of the language communities.
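As an illustration of what one step of such a semi-automatic analysis might look like, the following minimal sketch counts edits per year and distinct editors for one language version of an article. It is not the tooling used in the thesis. It assumes that revision data is fetched from the public MediaWiki Action API (`action=query`, `prop=revisions`, `formatversion=2`), and the function names, the example title and the sample response are hypothetical, introduced only for this example.

```python
from collections import Counter
from urllib.parse import urlencode


def revisions_url(lang, title, limit=500):
    """Build a MediaWiki API URL asking for revision timestamps and user names.

    `lang` is a Wikipedia language code (e.g. "it"); `title` is the article
    title in that edition. Both are supplied by the caller (assumption: the
    article title differs per edition and must be looked up beforehand).
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urlencode(params)


def edits_per_year(response):
    """Count revisions per calendar year in one parsed API response."""
    counts = Counter()
    for page in response.get("query", {}).get("pages", []):
        for rev in page.get("revisions", []):
            # Timestamps are ISO 8601 strings; the first four characters are the year.
            counts[rev["timestamp"][:4]] += 1
    return dict(counts)


def distinct_users(response):
    """Return the set of user names that appear in the revisions."""
    return {
        rev["user"]
        for page in response.get("query", {}).get("pages", [])
        for rev in page.get("revisions", [])
        if "user" in rev
    }


# Made-up sample in the shape of a formatversion=2 response (not real data).
sample_response = {
    "query": {
        "pages": [
            {
                "title": "Example",
                "revisions": [
                    {"user": "UserA", "timestamp": "2015-03-01T10:00:00Z"},
                    {"user": "UserB", "timestamp": "2015-01-15T08:30:00Z"},
                    {"user": "UserA", "timestamp": "2014-11-02T19:45:00Z"},
                ],
            }
        ]
    }
}

yearly = edits_per_year(sample_response)
editors = distinct_users(sample_response)
```

Running the same two functions over the response for each language edition would give comparable per-year activity counts and editor sets, which could then feed a visualization of how the dispute develops over time.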

Results

The thesis project determines how effective a semi-automatic analysis can be in uncovering a possible connection between the various cultural approaches behind a dispute and its expression on Wikipedia. The platform is a functional and effective context for this kind of research thanks to its division into language editions and to the structured data it collects and provides for each page. The agnostic collection of the information to be compared, together with the condition of linguistic leveling recreated by the visualizations, makes it possible to provide an overview of the disputes on a topic that is free of prejudice and easily accessible to users without specific language skills. The role of the designer is therefore indispensable in enabling a simultaneous cross-linguistic comparison of a topic’s controversy from a complex set of data. The visualizations bring out the differences in the communities’ approaches through a structured and accessible presentation of several complementary aspects that together define the controversy. The investigation reveals that the handling of information about homosexuality is shaped by language and is not equivalent across editions, and that user activity on the pages clearly lacks homogeneity. Because this is a cross-linguistic analysis, it is not correct to say that the language profiles obtained coincide with the positions of eight specific cultures. However, in some cases the analysis revealed a clear connection between the communities’ behaviors and cultural bias, especially for languages that are spoken in just one country. A concentration of activity on certain specific aspects indicates an affinity of interests and a presumed cultural proximity among those who write the content.