In cybersecurity, there is a call for adaptive, accurate and efficient procedures to identifying performance shortcomings and security breaches. The increasing complexity of both Internet services and traffic determines a scenario that in many cases impedes the proper deployment of intrusion detection and prevention systems. Although it is a common practice to monitor network and applications activity, there is not a general methodology to codify and interpret the recorded events. Moreover, this lack of methodology somehow erodes the possibility of diagnosing whether event detection and recording is adequately performed. As a result, there is an urge to construct general codification and classification procedures to be applied on any type of security event in any activity log. This work is focused on defining such a method using the so-called normalized compression distance (NCD). NCD is parameter-free and can be applied to determine the distance between events expressed using strings. As a first step in the concretion of a methodology for the integral interpretation of security events, this work is devoted to the characterization of web logs. On the grounds of the NCD, we propose an anomaly-based procedure for identifying web attacks from web logs. Given a web query as stored in a security log, a NCD-based feature vector is created and classified using a support vector machine. The method is tested using the CSIC-2010 data set, and the results are analyzed with respect to similar proposals.
Gonzalo de la Torre-Abaitua, Luis F Lago-Fernández, David Arroyo
Logic Journal of the IGPL, Volume 28, Issue 4, August 2020, Pages 546–557