Abstract
Video forgery has recently emerged as a global problem due to the development
of sophisticated and user-friendly video editing software. This study
introduces an end-to-end deep learning architecture for detecting forged objects in a
video. The architecture is inspired by recent advances in deep learning for semantic
segmentation of images and videos, and it treats forgery detection as a segmentation
problem in which forged objects are separated from the background. The
proposed architecture, which combines the U-net and VGG19 convolutional neural
network (ConvNet) architectures, can differentiate a forged object from its background
even though the model was trained on a small dataset. The number of channels in every
network layer was also reduced, which lowered the computational complexity of the
approach without compromising performance.
The proposed architecture was evaluated on 10 videos forged using chroma-key
composition and splicing. Instead of traditional classification metrics, mean
intersection over union (mIoU) was used to measure performance. In the experiments,
the proposed method achieved an mIoU of 0.9343 on both the training and validation
sets, the highest score obtained.
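For readers unfamiliar with the metric, the following is a minimal sketch of how mIoU can be computed for a two-class segmentation mask. It is not the authors' evaluation code: the label convention (1 for forged object, 0 for background) and the function names `iou` and `mean_iou` are assumptions made for illustration.

```python
def iou(pred, target, cls):
    """Intersection over union for one class.

    pred and target are equally sized 2D lists of integer class labels.
    Returns None when the class appears in neither mask.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred = p == cls
            in_target = t == cls
            inter += int(in_pred and in_target)
            union += int(in_pred or in_target)
    return inter / union if union else None


def mean_iou(pred, target, classes=(0, 1)):
    """Average the per-class IoU over the classes present in either mask.

    Assumed labels: 0 = background, 1 = forged object.
    """
    scores = [iou(pred, target, c) for c in classes]
    scores = [s for s in scores if s is not None]
    if not scores:
        raise ValueError("none of the given classes occur in the masks")
    return sum(scores) / len(scores)


# Toy 2x3 masks: the prediction misses one forged pixel.
predicted = [[0, 0, 1],
             [0, 1, 1]]
ground_truth = [[0, 1, 1],
                [0, 1, 1]]
print(mean_iou(predicted, ground_truth))
```

In the toy example, the forged class has an IoU of 3/4 and the background class 2/3, so the mean lies between the two. Unlike pixel accuracy, this average is not dominated by the background when the forged object covers only a small part of the frame.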
Keywords
video forgery; semantic segmentation; convolutional neural network; VGG19;
U-net