Advancing the Use of Information Compression Distances in Authorship Attribution

Muñoz, S.P., Oliva, C., Lago-Fernández, L.F., Arroyo, D.
Spezzano, F., Amaral, A., Ceolin, D., Fazio, L., Serra, E. (eds) Disinformation in Open Online Media. MISDOOM 2022. Lecture Notes in Computer Science, vol 13545 . Springer, Cham.

Detecting unreliable information in social media is an open challenge, in part as a result of the difficulty to associate a piece of information to known and trustworthy actors. The identification of the origin of sources can help society deal with unverified, incomplete, or even false information. In this work we tackle the problem of associating a piece of information to a certain politician. The use of inaccurate information is of great relevance in the case of politicians, since it affects social perception and voting behavior. Moreover, misquotation can be weaponized to hinder adversary reputation. We consider the task of applying a compression-based metric to conduct authorship attribution in social media, namely in Twitter. In specific, we leverage the Normalized Compression Distance (NCD) to compare an author’s text with other authors’ texts. We show that this methodology performs well, obtaining 80.3% accuracy in a scenario with 6 different politicians.


This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), from Grant PLEC2021-007681 (project XAI-DisInfodemics) funded by MCIN/AEI/ 10.13039/501100011033 and by European Union NextGeneration EU/PRTR, from Comunidad de Madrid (Spain) under the project CYNAMON (no. P2018/TCS-4566), cofunded with FSE and FEDER EU funds, and from Spanish projects MINECO/FEDER TIN2017-84452-R and PID2020-114867RB-I00