Improving LSTMs' under-performance in authorship attribution for short texts

Oliva, Christian; Palmero Muñoz, Santiago; Lago-Fernández, Luis F.; Arroyo Guardeño, David

Proceedings of the 2022 European Interdisciplinary Cybersecurity Conference

http://hdl.handle.net/10261/268091

We present a novel approach to authorship attribution over tweets using Long Short-Term Memory (LSTM) networks. Vanilla LSTMs use only the last hidden state for prediction. Our strategy introduces a mechanism based on Max Pooling that processes all the hidden states simultaneously, which helps the model better detect authors’ stylometry. We obtain a 4% accuracy improvement with respect to vanilla LSTMs.
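The difference between predicting from the last hidden state and predicting from a max-pooled summary of all hidden states can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `last_state` and `max_pool_over_time`, the toy values, and the choice of element-wise max over the time axis are assumptions made for illustration only. The hidden states are represented as a plain list of vectors, one per token.

```python
from typing import List, Sequence

Vector = List[float]


def last_state(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Vanilla strategy: keep only the hidden state of the final time step."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one time step")
    return list(hidden_states[-1])


def max_pool_over_time(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Assumed pooling strategy: element-wise maximum across all time steps."""
    if not hidden_states:
        raise ValueError("hidden_states must contain at least one time step")
    # zip(*...) groups the values of each hidden unit across time steps.
    return [max(unit_values) for unit_values in zip(*hidden_states)]


# Toy hidden states for a 3-token text and a 3-unit LSTM (hypothetical values).
hidden_states = [
    [0.9, -0.2, 0.1],   # t = 1
    [0.1, 0.7, -0.3],   # t = 2
    [-0.4, 0.0, 0.2],   # t = 3
]

summary_last = last_state(hidden_states)
summary_pooled = max_pool_over_time(hidden_states)
print(summary_last)    # [-0.4, 0.0, 0.2]
print(summary_pooled)  # [0.9, 0.7, 0.2]
```

In this toy case the strong activations at the first and second time steps are lost when only the last state is kept, whereas the pooled vector retains them. Either vector would then be fed to the classification layer.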

ACKNOWLEDGEMENTS

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 872855 (TRESCA project), as well as from Comunidad de Madrid (Spain) under the project CYNAMON (no. P2018/TCS-4566), cofunded with FSE and FEDER EU funds, Spanish Government under project MINECO/FEDER PID2020-114867RB-I0, and Grant PLEC2021-007681 (project XAI-DisInfodemics) funded by MCIN/AEI/10.13039/501100011033 and by European Union NextGeneration EU/PRTR.

GiCSI
