DeepFakes — Production & Detection using various Deep Learning Methodologies.

Shiv Kumar Ganesh
Apr 18, 2021

The availability of large amounts of data and easy access to technology has created a revolution in the field of Machine Learning and AI. In this blog, we discuss a paper that surveys face manipulation technology and its detection. We talk about various Generative Adversarial Networks (GANs) and their application in manipulating and swapping people's faces. We also talk about the potential uses of such techniques, as well as ways of detecting the resulting images and videos, and we showcase some of the results available in the paper. The four major areas of discussion in this post are as follows:

  1. Entire Face Synthesis
  2. Identity Swap
  3. Attribute Manipulation
  4. Expression Swap

Entire Face Synthesis
StyleGAN is a powerful GAN-based approach for generating entirely new face images. It produces realistic images of people and can be used in industries such as video games, 3-D modeling, and fashion. On the other hand, it can be used to create fake personas on social networks, which can mislead people and be involved in criminal activity that goes undetected.

An example of Entire Face Synthesis

Manipulation Techniques and Datasets
For this technique, the paper considers four different databases, all of them based on one of two GAN architectures, ProGAN or StyleGAN. The fake images in these datasets can also be classified by the GAN that generated them, much as a photograph can be traced to the device that took it. GANs leave a mark, or fingerprint, on the images they generate, and each type of GAN has its own unique one. The datasets analyzed are described below:

The 100K-Generated-Images dataset contains images generated with the StyleGAN architecture, which is an improved version of ProGAN. From the paper, we can infer that StyleGAN is the most popular approach for generating fake images.

The paper also covers a newer technique called GANprintR, which is applied on top of StyleGAN output to remove the GAN fingerprint from the generated image. iFakeFaceDB, a database produced using GANprintR, poses a serious challenge for advanced fake detectors. One example presented in the paper is illustrated below:

A fake image created by StyleGAN and its improved version after applying GANprintR.

Manipulation Detection Methodologies

There are several approaches to identifying such content, and the studies report their results with different evaluation metrics. For example, some use AUC (Area Under the Curve) and others use EER (Equal Error Rate).
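
Since AUC and EER recur in every results table, it may help to see how they are computed. The following is a minimal sketch in plain Python, not code from the paper. It assumes each image has a detector score where higher means "more likely fake", that labels use 1 for fake and 0 for real, and that both classes are present. The function names and the sample scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)              # number of fake samples (label 1)
    neg = len(labels) - pos        # number of real samples (label 0)
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def eer(points):
    """Error rate at the ROC point where FPR is closest to FNR (1 - TPR)."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1 - p[1])))
    return (fpr + (1 - tpr)) / 2


# Hypothetical detector scores: four fake images, then four real ones.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
points = roc_points(labels, scores)
print(auc(points))  # 0.9375
print(eer(points))  # 0.25
```

A higher AUC is better, while a lower EER is better, so the two numbers should not be compared directly across studies.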

One of the proposed solutions analyzes the internal GAN pipeline to tell real and fake images apart, since the colors produced by a camera and those of a generated image are mostly different. Using color as a feature with a linear Support Vector Machine (SVM), one can classify such images, and the implementers achieved 70% AUC as their best result.

Another approach is called FakeSpotter. It monitors neuron behavior layer by layer to detect fake faces. The layer-by-layer neuron activation patterns capture the most minute features that matter for facial manipulation, and FakeSpotter detects such changes. The authors of that paper achieved 84.7% fake detection accuracy using the FaceNet model.

A recent study proposed a fake detection system based on convolutional traces, with features extracted using Expectation Maximization. k-Nearest Neighbours (k-NN), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA) classifiers were used for the final detection, with an accuracy of 99.81%.

Various other papers detect fake images using the methods below:

  1. Detecting the special fingerprint inserted by GAN architectures.
  2. Detecting fake images using pixel co-occurrence matrices and a Convolutional Neural Network (CNN).
  3. Multi-task incremental learning detection methods, which are being developed to handle new types of GAN and the images they generate.
  4. Attention-based mechanisms, which have been tried out to improve the training of the detection system.
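
To make the pixel co-occurrence idea in item 2 more concrete, here is a minimal sketch in plain Python. It is not the implementation from the cited study. It assumes a single colour channel stored as a list of rows of integer intensities, and the function name and the tiny 3x3 example are hypothetical. The matrix counts how often intensity `a` appears next to intensity `b` at a fixed offset.

```python
def cooccurrence_matrix(channel, levels=256, dx=1, dy=0):
    """Count pairs (a, b) where a is at (row, col) and b is at (row + dy, col + dx)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(channel), len(channel[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip neighbours that fall outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[channel[r][c]][channel[r2][c2]] += 1
    return matrix


# Tiny illustrative channel with only three intensity levels (0, 1, 2).
channel = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
for row in cooccurrence_matrix(channel, levels=3):
    print(row)
# [1, 2, 0]
# [0, 0, 2]
# [0, 0, 1]
```

In a detector of this kind, one such matrix per colour channel would presumably be stacked and fed to the CNN in place of the raw pixels. That wiring is an assumption here, not something the post spells out.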

Below is a comparison of these detection methods and the accuracy scores they attained.

Identity Swap (DeepFakes)
Identity swap replaces the face of one individual with the face of someone else. This has uses in the movie industry as well as in the education sector. In the wrong hands, it can be misused to create hoaxes, misleading content, and even fake pornographic content.

An example of Identity Swap

Manipulation Techniques and Datasets

The datasets considered in the paper are as follows:

We can see from the above datasets that both videos and images are subjected to such manipulation. These datasets contain both real and fake videos and were used to conduct the study.

Let's discuss the methodologies used to generate such swapped videos.

  1. The first mechanism mentioned in the paper is a GAN-based face-swapping algorithm. A CycleGAN-based GAN is used together with the weights of FaceNet. A Multi-Task CNN is used for proper face alignment and for locating the facial features. This approach also uses a Kalman filter to smooth the bounding-box position, which eliminates jitter when the face is swapped in a video.
  2. Another approach is FaceSwap, which consists of facial alignment using Gauss-Newton optimization followed by image blending. The DeepFake approach, also mentioned, uses two autoencoders with a shared encoder, trained to reconstruct training images of the source and target faces.
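
The Kalman-filter smoothing of the bounding box in item 1 can be sketched as follows. This is a simplified illustration, not the face-swapping tool's actual code. It assumes a scalar filter with a constant-position model applied independently to each box coordinate, and the noise variances, function names, and sample boxes are all hypothetical.

```python
def kalman_smooth(measurements, process_var=1e-2, measurement_var=4.0):
    """Smooth a 1-D sequence with a scalar Kalman filter (constant-position model)."""
    if not measurements:
        return []
    estimate = measurements[0]   # start from the first observation
    error_var = 1.0              # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the position is assumed unchanged, but uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1 - gain
        smoothed.append(estimate)
    return smoothed


def smooth_boxes(boxes, **kwargs):
    """Smooth each coordinate of (x, y, w, h) boxes independently across frames."""
    columns = [kalman_smooth(list(col), **kwargs) for col in zip(*boxes)]
    return [tuple(row) for row in zip(*columns)]


# Hypothetical jittery detections of a face that is actually standing still.
boxes = [(100, 50, 40, 40), (104, 48, 41, 39), (97, 52, 40, 41), (103, 49, 39, 40)]
for box in smooth_boxes(boxes):
    print(box)
```

A larger `measurement_var` trusts the detector less and gives a steadier box, at the cost of lagging behind real head movement. A model that also tracks velocity would follow a moving face more closely.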

Looking into these approaches, we also encounter two different generations of identity swaps.

The first generation had:

  1. Low-quality synthesized faces.
  2. Color contrast between the synthesized fake mask and the skin of the original face.
  3. Visible boundaries of the fake mask.
  4. Strange artifacts appearing between frames.

The second generation is a massive improvement on this. The images below contrast the 1st and 2nd generations:

Image generated from 1st generation Identity Swap
Image generated from 2nd generation Identity Swap

Manipulation Detection Methodologies

Several methodologies are listed below in tabular format, but we will look at a few of the most influential ones.

The first study focuses on audio-visual artifacts. Its approach is based on inconsistencies between lip movements and the audio speech. The study also evaluated variations of the image-based systems often used in biometric solutions. For the first case, Mel Frequency Cepstral Coefficients (MFCCs) were used as audio features and the distances between mouth landmarks as visual features. Dimensionality reduction was done using PCA. Finally, an LSTM-based RNN was used to classify videos as fake or real.

Fake detection systems based on head movement and facial expression were also proposed. 3-D head poses estimated from face images reveal a lot about the errors introduced by DeepFakes.

Other methods used to detect fake images were as follows:

  1. Features based on differences in head pose, with an SVM for the final classification.
  2. Another study describes a detection system based on both facial expressions and head movements. The OpenFace2 toolkit was used to obtain the intensity and occurrence of 18 different facial action units related to the movement of the facial muscles. Here too, the authors used an SVM for the final scores. This approach produced a 96.3% AUC.
  3. Eye blinking is another cue for studying fake videos. The authors of one study proposed an algorithm called DeepVision to analyze changes in blinking patterns. Their approach uses Fast-HyperFace to detect the face and the Eye-Aspect-Ratio (EAR) to track the eyes. The blink count and period were then extracted to decide whether the video was fake or real.
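
As a rough illustration of the blink analysis in item 3, the sketch below computes the commonly used eye aspect ratio from six eye landmarks and counts blinks in a sequence of per-frame values. It does not reproduce DeepVision itself. The landmark ordering, the 0.2 threshold, the two-frame minimum, and the function names are assumptions made for the example.

```python
import math


def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks: p1 and p4 are the eye corners,
    p2 and p3 lie on the upper lid, p6 and p5 on the lower lid."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_series, threshold=0.2, min_frames=2):
    """Count runs of at least `min_frames` consecutive frames with EAR below threshold."""
    blinks = 0
    run = 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:   # a blink still in progress at the end of the clip
        blinks += 1
    return blinks


# Hypothetical landmarks for an open eye, and a made-up per-frame EAR trace.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 3))  # 0.667
ear_trace = [0.30, 0.31, 0.10, 0.09, 0.29, 0.30, 0.15, 0.12, 0.10, 0.31]
print(count_blinks(ear_trace))  # 2
```

Dividing the blink count by the clip duration gives a blink rate, which can then be compared with typical human blinking behaviour. How DeepVision makes that comparison is not covered here.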

Below are the results from the various detection methods:

Attribute Manipulation
Attribute manipulation has played a major role in the fashion and marketing industries. It modifies physical facial features to make them look better or worse. FaceApp, one of the best-known apps on the Play Store, is a consumer product built on this technique. It can be used to change hair color or add a bit of makeup. Cosmetics companies also use it to give customers a live trial.

An example of Attribute Manipulation

Manipulation Techniques and Datasets

The first method described is the Invertible Conditional GAN (IcGAN). It provides accurate results for attribute manipulation, but its drawback is that it seriously changes the facial identity of the person. Even though the proposed encoder-decoder architecture is trained to reconstruct images by disentangling the salient information of the image from the attribute values, the generated images lack detail and show noticeable distortions.

StarGAN was proposed as an enhanced approach to address these challenges. It takes a clever approach to image-to-image translation and produces good results compared with its predecessors. Even so, it still introduces some unwanted changes to the color and tone of the skin.

AttGAN is another approach. It removes the strict attribute-independent constraint from the image's latent representation and applies only an attribute-classification constraint to the generated image. AttGAN provides realistic manipulation of various attributes.

More recently, STGAN has outperformed the previous state of the art in attribute manipulation, surpassing various other existing models.

Manipulation Detection Methodologies

The various methodologies proposed to detect such images are listed below:

  1. Analysis of the internal GAN pipeline to detect different artifacts between real and manipulated images.
  2. Detection systems developed using a CNN and a combination of pixels.
  3. A system based on a Restricted Boltzmann Machine (RBM) that learns discriminative features to classify original and digitally retouched face images.
  4. Many deep learning methods combined with an SVM, also proposed in the paper to classify such data.

To conclude this section, deep learning techniques achieve almost 100% accuracy in detecting such image manipulation. These results can be seen in the table below.

Expression Swap
This technique can be used to manipulate a person's facial expression or to transfer the expression of one person to another. Expressions can be exchanged and used for various purposes, sometimes for fun and other times with dangerous consequences.

An example of Expression Swap

Manipulation Techniques and Datasets

The initial approach to producing such effects relied on manual keyframe selection. The first few frames of a video are used to obtain the temporal face identity, and expressions are then tracked over the remaining frames. Fake videos are generated from this data by tweaking the various available parameters.

Another method uses NeuralTextures, a rendering approach that uses the original video data to learn a neural texture for the target person. In this approach, only the facial expression corresponding to the mouth region is modified.

Some of the existing GAN-based approaches to add emotion and expression to images are StarGAN, InterFaceGAN, STGAN, and AttGAN.

Manipulation Detection Methodologies

The various methodologies for detecting expression manipulation are as follows:

  1. Most of the studies work on the visual data available from videos, mostly fake videos. The initial studies focused on cues such as missing reflections and eye color. Other approaches are based on mesoscopic and steganalysis features.
  2. These approaches achieved strong results, especially on raw videos.
  3. Deep learning approaches based on 3DCNN were also studied, which allows the analysis to consider both spatial and motion information.
  4. I3D and 3DResNet were also able to detect such videos and images with high precision.
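
To see why a 3D convolution captures motion as well as spatial structure (item 3), here is a small sketch in plain Python. It is only an illustration of the operation, not a detector. It assumes a grayscale clip stored as nested lists indexed `[frame][row][col]`, and it uses a hand-picked temporal-difference kernel where a real 3DCNN would learn its weights. As in most deep learning libraries, the "convolution" here is technically a cross-correlation.

```python
def conv3d_valid(clip, kernel):
    """Slide a (kt, kh, kw) kernel over a (T, H, W) clip, stride 1, no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames, so it mixes time and space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * clip[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical kernel: next frame minus current frame at the same pixel.
temporal_diff = [[[-1.0]], [[1.0]]]

static_clip = [[[1, 2], [3, 4]] for _ in range(3)]          # nothing moves
changing_clip = [[[0, 0], [0, 0]], [[0, 5], [0, 0]]]        # one pixel brightens

print(conv3d_valid(static_clip, temporal_diff))    # all zeros: no motion
print(conv3d_valid(changing_clip, temporal_diff))  # non-zero where the pixel changed
```

A 2D convolution applied frame by frame would give the same response for both clips' individual frames regardless of order, which is why the temporal axis matters for video-level detection.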

The table below shows the results achieved using the various methodologies:

References:

[1] DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection

[2] All of the above images and data are taken from the paper mentioned above.
