In recent years, the spread of deep neural networks has accelerated the development of applications capable of recognizing different categories of images and videos with great reliability.
These mathematical models have their roots in studies published more than half a century ago, but only recently, thanks to high-performance, low-cost computers, have they developed rapidly, allowing their application in real-world contexts such as face and speech recognition.
An application that is enjoying some success in the entertainment world, but which poses several problems for security, is the creation of deepfakes, a combination of 'deep learning' and 'fake'. The term refers to content obtained through techniques capable of superimposing the face of one person (the target) onto that of another person (the source) in a video in which the source says or does certain things. The result is a realistic fake video in which, for example, an actor plays a part, but his face is convincingly replaced by that of a famous person who never uttered the phrases proclaimed by the actor and was never present in the scene. More generally, the term deepfake refers to any synthetic content produced with artificial intelligence tools.
Recently, several articles have offered an overview of the techniques for creating deepfakes and for detecting them. Detection techniques are essential for identifying images and videos created specifically to misinform or, more generally, to deceive. In "The Creation and Detection of Deepfakes: A Survey"1 and "Deep Learning for Deepfakes Creation and Detection: A Survey"2, the authors review recent trends in deepfake production technologies and their possible uses, both legitimate and malicious.
Alongside lawful uses, for example in film production or machine translation, there is a whole series of illicit uses, particularly the production of pornographic videos and the falsification of speeches, for purposes such as making easy money, influencing public opinion and elections, creating panic, or generating false trial evidence.
The first attempt at creating a deepfake is traced back to a Reddit user who developed an application called 'FakeApp', using a model based on coupled autoencoders to extract the main features from one facial image and reproduce them on another.
An autoencoder is a neural network consisting of an encoder and a decoder, designed to extract the main characteristics of a set of unlabeled data (the encoder) and to reconstruct the input data (the decoder) from the compact representation thus created. A deepfake can be created by using an encoder trained on one person's face and feeding the resulting encoding to a decoder trained on another person's face.
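The encode-then-reconstruct idea can be illustrated with a minimal sketch: a linear autoencoder that compresses 8-dimensional toy vectors into a 2-dimensional code and reconstructs them, trained by gradient descent. The data, dimensions and learning rate here are arbitrary illustrative choices; real deepfake pipelines use deep convolutional networks trained on face images, not linear layers on random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples that actually lie on a 2-D subspace of R^8,
# so a 2-dimensional code can represent them well.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 8))
X = latent @ basis

W_enc = rng.normal(scale=0.1, size=(8, 2))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 8))  # decoder weights
lr = 0.01

def mse(X, W_enc, W_dec):
    code = X @ W_enc          # encode: compact representation
    recon = code @ W_dec      # decode: reconstruction of the input
    return np.mean((recon - X) ** 2)

initial = mse(X, W_enc, W_dec)
for _ in range(1000):
    code = X @ W_enc
    recon = code @ W_dec
    err = recon - X           # reconstruction error
    # Gradients of the mean-squared-error loss w.r.t. both weight sets
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = mse(X, W_enc, W_dec)

print(initial, final)  # reconstruction error drops during training
```

The face-swap trick described above amounts to training two such decoders (one per face) on top of a shared encoder, then pairing the encoder's output with the "wrong" decoder.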
A second technology for producing deepfakes is the use of 'Generative Adversarial Networks' (GANs). These too are neural networks, whose purpose is to create realistic images of faces that do not correspond to real people3.
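The adversarial game behind a GAN can be sketched in one dimension: a one-parameter "generator" tries to produce samples resembling real data drawn from N(3, 1), while a logistic-regression "discriminator" tries to tell real from fake, each updated against the other. All the numbers here (target distribution, learning rate, step counts) are illustrative assumptions; real GANs use deep networks on images.

```python
import numpy as np

rng = np.random.default_rng(1)

g_shift = 0.0          # generator: x_fake = z + g_shift
d_w, d_b = 0.1, 0.0    # discriminator: sigmoid(d_w * x + d_b)
lr = 0.05

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

for _ in range(2000):
    real = rng.normal(3.0, 1.0, size=64)   # "real" data
    z = rng.normal(0.0, 1.0, size=64)      # generator noise input
    fake = z + g_shift

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    p_real = sigmoid(d_w * real + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * np.mean((1 - p_real) * real - p_fake * fake)
    d_b += lr * np.mean((1 - p_real) - p_fake)

    # Generator step: push D(fake) toward 1, i.e. fool the discriminator
    p_fake = sigmoid(d_w * (z + g_shift) + d_b)
    g_shift += lr * np.mean((1 - p_fake) * d_w)

print(g_shift)  # the generator's output mean drifts toward the real mean, 3
```

At equilibrium neither side can improve: the generator's samples match the real distribution, which is exactly why GAN-generated faces are hard to tell apart from photographs.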
The use of these technologies makes it increasingly difficult to distinguish between a real image, video or speech and a manipulated one, creating serious problems in the fields of privacy, democratic life and national security.
These publications also mention some cases of particular interest for the military world, in which satellite images were manipulated for military purposes, adding details not present in the original.
While it is true that it is increasingly difficult to distinguish deepfakes from reality, it is also true that technology comes to our aid. In their works, the authors survey the main techniques used for detecting deepfakes, techniques that often rely, once again, on deep learning. Unfortunately, these detection techniques are themselves quite vulnerable: a moderate effort is enough to modify the deepfake creation process so that its products are no longer recognizable as such.
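This fragility can be shown with a toy example: a simple logistic-regression "detector" that separates real from fake feature vectors is defeated by a modest, targeted nudge to a fake sample along the detector's own weight vector. The features and data are synthetic stand-ins for whatever a real detector would extract, and this illustrates the general adversarial-perturbation idea rather than any specific published attack.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 10-dimensional "features": real content clusters near +1,
# fake content clusters near -1.
real = rng.normal(+1.0, 1.0, size=(200, 10))
fake = rng.normal(-1.0, 1.0, size=(200, 10))
X = np.vstack([real, fake])
y = np.concatenate([np.ones(200), np.zeros(200)])  # 1 = real, 0 = fake

# Train a logistic-regression detector by gradient ascent
w = np.zeros(10)
b = 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w += 0.1 * (y - p) @ X / len(X)
    b += 0.1 * np.mean(y - p)

sample = fake[0]
score = sample @ w + b            # negative score: flagged as fake

# Adversarial nudge: the smallest step along w that pushes the score
# just past the decision boundary, so the detector now says "real".
adv = sample + (0.1 - score) * w / (w @ w)
adv_score = adv @ w + b

print(score, adv_score)  # adv_score is positive: detector is fooled
```

A deepfake creator with access to (or good guesses about) the detector can fold such a perturbation directly into the generation process, which is why detectors tend to lag behind generators.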
Nowadays video footage, certified by forensic experts, is used as evidence in many criminal trials.
But how much can you trust what you see or hear in a video?
Less and less... That is why it will be increasingly necessary to flank forensic experts with IT specialists (digital forensics) capable of employing deep learning and of recognizing its use. Unfortunately, even then, the result may in some cases not be sufficient to ascertain the truth, as it is not always easy, or even possible, to explain how a deep learning technology produces or detects a deepfake.
It is therefore necessary to combine intelligence techniques and methodologies for comparative verification of the context with the technological tools of image and video analysis.
Alessandro Rugolo, Giorgio Giacinto, Maurizio d'Amato
To learn more:
2 Nguyen, T. T., Nguyen, Q. V. H., Nguyen, C. M., Nguyen, D., Thanh Nguyen, D., and Nahavandi, S., "Deep Learning for Deepfakes Creation and Detection: A Survey", arXiv e-prints, 2021, https://arxiv.org/abs/1909.11573v3