Reliability of User-Generated Data: the Case of Biographical Data in Wikipedia

Robert Viseur, Reliability of User-Generated Data: the Case of Biographical Data in Wikipedia, OpenSym’14, August 27-29, 2014, Berlin, Germany

Date: 27 August 2014

Publication: Scientific communication

Expertise:

Data science

About the project: CE-IQS

Abstract

Wikipedia is a collaborative multilingual encyclopedia launched in 2001. We previously conducted an initial study on the extraction of biographical data about Belgian personalities, with the aim of building a large biographical database. However, this raises the question of the reliability of the data: in Wikipedia, the data are generated by users and may therefore contain errors. Consequently, we set out to answer the following question: are the data entered in Wikipedia articles reliable? Our research is organized in four sections. The first section provides a brief state of the art on the reliability of user-generated data. The second section presents the methodology of our research. The third section presents the results: the error rate measured for birthdates is low (0.75%), although it is higher than the 0.21% observed for the baseline (reference sources). The fourth section discusses the results.