Improving LSTMs' under-performance in authorship attribution for short texts

Oliva, Christian; Palmero Muñoz, Santiago; Lago-Fernández, Luis F.; Arroyo Guardeño, David
Proceedings of the 2022 European Interdisciplinary Cybersecurity Conference

We present a novel approach to authorship attribution on tweets using Long Short-Term Memory networks (LSTMs). Vanilla LSTMs use only the last hidden state for prediction. Our strategy introduces a mechanism based on Max Pooling that processes all the hidden states simultaneously, which helps the model better detect authors’ stylometry. We obtain a 4% accuracy improvement with respect to vanilla LSTMs.
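
The difference between the two readouts can be illustrated with a minimal sketch. It assumes the pooling is an element-wise maximum over time steps, which is the usual meaning of max pooling over a sequence; the abstract does not give the exact formulation. The function names and the hidden-state values below are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def last_state(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Vanilla readout: keep only the final hidden state."""
    return list(hidden_states[-1])


def max_pool_over_time(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Assumed pooled readout: element-wise maximum over all time steps,
    giving one value per hidden unit."""
    return [max(unit_values) for unit_values in zip(*hidden_states)]


# Hypothetical hidden states: 4 time steps (rows) x 3 hidden units (columns).
H = [
    [0.1, -0.3, 0.7],
    [0.9, -0.1, 0.2],
    [0.4, 0.5, -0.6],
    [0.0, 0.2, 0.1],
]

print(last_state(H))          # [0.0, 0.2, 0.1]
print(max_pool_over_time(H))  # [0.9, 0.5, 0.7]
```

In this toy case the last state discards the strong activations seen at earlier steps, while the pooled vector keeps the peak response of each unit regardless of where in the text it occurred. Either vector would then be passed to the classification layer.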


This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), as well as from Comunidad de Madrid (Spain) under the project CYNAMON (no. P2018/TCS-4566), cofunded with FSE and FEDER EU funds, Spanish Government under project MINECO/FEDER PID2020-114867RB-I0, and Grant PLEC2021-007681 (project XAI-DisInfodemics) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGeneration EU/PRTR.