Skip to main content

Main navigation

  • About ITEFI
  • Research
  • Formación y empleo
  • OpenLab
  • Servicios científico técnicos
  • Staff Directory

An Improved Bytewise Approximate Matching Algorithm Suitable for Files of Dissimilar Sizes

approximate matching
context-triggered piecewise hashing
edit distance
fuzzy hashing
LZJD
multi-thread programming
sdhash
signatures
similarity detection
ssdeep
Víctor Gayoso Martínez, Fernando Hernández-Álvarez, Luis Hernández Encinas
Mathematics 2020, 8(4), 503
https://www.mdpi.com/2227-7390/8/4/503

The goal of digital forensics is to recover and investigate pieces of data found on digital devices, analysing in the process their relationship with other fragments of data from the same device or from different ones. Approximate matching functions, also called similarity preserving or fuzzy hashing functions, try to achieve that goal by comparing files and determining their resemblance. In this regard, ssdeep, sdhash, and LZJD are nowadays some of the best-known functions dealing with this problem. However, even though those applications are useful and trustworthy, they also have important limitations (mainly, the inability to compare files of very different sizes in the case of ssdeep and LZJD, the excessive size of sdhash and LZJD signatures, and the occasional scarce relationship between the comparison score obtained and the actual content of the files when using the three applications). In this article, we propose a new signature generation procedure and an algorithm for comparing two files through their digital signatures. Although our design is based on ssdeep, it improves some of its limitations and satisfies the requirements that approximate matching applications should fulfil. Through a set of ad-hoc and standard tests based on the FRASH framework, it is possible to state that the proposed algorithm presents remarkable overall detection strengths and is suitable for comparing files of very different sizes. A full description of the multi-thread implementation of the algorithm is included, along with all the tests employed for comparing this proposal with ssdeep, sdhash, and LZJD.

Funding

This work was supported in part by the Ministerio de Economía, Industria y Competitividad (MINECO), in part by the Agencia Estatal de Investigación (AEI), in part by the Fondo Europeo de Desarrollo Regional (FEDER, UE) under Project COPCIS, Grant TIN2017-84844-C2-1-R, and in part by the Comunidad de Madrid (Spain) under Project reference P2018/TCS-4566-CM (CYNAMON), also cofunded by European Union FEDER funds. Víctor Gayoso Martínez would like to thank CSIC Project CASP2/201850E114 for its support.

GiCSI

proyecto/s relacionado/s

  • Criptosistemas Avanzados y Seguros para la Protección de la Privacidad. CASP2
    Proyectos intramurales (CSIC)
Acoustics and Non Destructive Evaluation (DAEND)
  • Environmental Acoustics (GAA)
  • G Carma: Materials Characterization by Non Destructive Evaluation
  • ULAB, Ultrasounds for Liquid Analysis and Bioengineering
Information and Communication Technologies (TIC)
  • Cybersecurity and Privacy Protection Research Group (GiCP)
  • Research group on Cryptology and Information Security (GiCSI)
    • Quantum Communications Laboratory (LCQE)
  • Multichannel Ultrasonic Signal Processing Group (MUSP)
Sensors and Ultrasonic Systems (DSSU)
  • Ultrasonic Systems and Technologies (USTG)
  • Nanosensors and Smart Systems (NoySi)
  • Ultrasonic Resonators for cavitation and micromanipulation (RESULT)
  • Advanced Sensor Technology (SENSAVAN)
  • Quantum Electronics (QE)
Laboratorios
  • Laboratorio de Acústica
  • Laboratorio de Metrología Ultrasónica Médica (LMUM)
  • Laboratorio de Comunicaciones Cuánticas
  • Laboratory for International Collaboration in Advanced Biophotonics Imaging

Instituto de Tecnologías Físicas y de la Información Leonardo Torres Quevedo  - ITEFI
C/ Serrano, 144. 28006 - Madrid • Tel.: (+34) 91 561 88 06  Contacto  •  Intranet
EDIFICIO PARCIALMENTE ACCESIBLE POR PERSONAS CON MOVILIDAD REDUCIDA