WUT team develops an algorithm to convert videos into comics

Comixify has attracted great interest around the world; photo: Comixify

Comixify automatically selects the frames of an uploaded video that carry its most interesting and essential content. It then arranges these stills, fits them into comic panels and converts them into comic-style graphics. This is how the comics are made.

The project is a joint effort of Maciej Pęśko, Eng., Adam Svystun and Paweł Andruszkiewicz, Eng., students of the Faculty of Electronics and Information Technology. They were supervised by Prof. Przemysław Rokita, DSc, Eng., and Tomasz Trzciński, PhD, Eng., of the Department of Computer Graphics at the Institute of Computer Science.

Comixify grew out of the students' diploma projects, which they have been working on since the beginning of the year. The concept combines the students' own interests (comics, and style transfer with machine learning models) with the Department's earlier publications, including one on predicting the popularity of online content [1].

How does Comixify work? Converting a video into a comic is a two-stage process of frame extraction and style transfer.

First, the representative scenes of a recording are isolated. To do this, the researchers rely on a reinforcement learning algorithm for intelligent video summarization [2]. Thanks to an added image quality assessment module [3] and image popularity data [1], Comixify can go beyond finding the most representative frames: among them, it identifies those with the greatest aesthetic potential and the best chance of becoming popular.
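To give a feel for how a summarization method like [2] judges a candidate set of frames, here is a minimal sketch of its diversity-representativeness reward in plain Python. It assumes each frame has already been turned into a non-zero feature vector (in practice these come from a CNN; here they are toy 2-D lists). The function names are hypothetical, the temporal-distance window used in [2] is omitted for brevity, and neither the policy network trained with this reward nor Comixify's quality and popularity modules are shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine_dissimilarity(a: Vector, b: Vector) -> float:
    """d(a, b) = 1 - cos(a, b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def _euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def diversity_reward(features: List[Vector], selected: List[int]) -> float:
    """Mean pairwise dissimilarity among the selected frames (higher = less redundant)."""
    k = len(selected)
    if k < 2:
        return 0.0
    total = sum(
        _cosine_dissimilarity(features[i], features[j])
        for i in selected
        for j in selected
        if i != j
    )
    return total / (k * (k - 1))


def representativeness_reward(features: List[Vector], selected: List[int]) -> float:
    """exp(-mean distance from every frame to its nearest selected frame)."""
    if not selected:
        return 0.0
    mean_dist = sum(
        min(_euclidean(x, features[j]) for j in selected) for x in features
    ) / len(features)
    return math.exp(-mean_dist)


def summary_reward(features: List[Vector], selected: List[int]) -> float:
    """Hypothetical helper: R = R_div + R_rep (simplified form of the reward in [2])."""
    return diversity_reward(features, selected) + representativeness_reward(
        features, selected
    )


# Two near-duplicate shots followed by a clearly different one.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
covers_both = summary_reward(frames, [0, 2])  # keeps one frame from each shot
redundant = summary_reward(frames, [0, 1])    # picks the two near-duplicates
print(covers_both, redundant)
```

The summary that covers both shots earns the higher reward: its frames are mutually dissimilar (diversity) and every frame of the video lies close to one of them (representativeness). A reinforcement learning agent trained to maximize such a reward learns to pick frames with both properties.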

Once the frames are retrieved, style transfer is applied: the scenes are rendered as comic-style images. To achieve this, the researchers use a generative adversarial network (GAN) model [4]. A GAN is a machine learning technique based on two competing neural networks: a generator and a discriminator. During training, the generator creates new data instances (such as images), while the discriminator, which is also shown examples from the training data set, has to decide whether each instance comes from the training set (real) or was produced by the generator (fake). Training continues until the generator creates instances so similar to the training data that the discriminator can no longer tell the difference.
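The adversarial game described above boils down to two loss functions from [4]. The sketch below computes them in plain Python from the discriminator's outputs, assuming the discriminator returns a probability in (0, 1) that its input is real. The function names are hypothetical; the networks themselves, the training loop that alternates updates of the two, and the style-transfer GAN actually used in Comixify are not reproduced here. For the generator, it uses the "non-saturating" form -log D(G(z)), which [4] suggests in practice instead of minimizing log(1 - D(G(z))).

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0) when an output is exactly 0 or 1


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Binary cross-entropy: D should output 1 ("real") on training samples
    and 0 ("fake") on samples produced by the generator."""
    real_term = -sum(math.log(max(p, EPS)) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(max(1.0 - p, EPS)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss -log D(G(z)): the generator's loss falls
    as the discriminator mistakes its samples for real ones."""
    return -sum(math.log(max(p, EPS)) for p in d_fake) / len(d_fake)


# Early in training: D easily spots fakes, so its loss is low and G's is high.
confident = (discriminator_loss([0.9, 0.95], [0.1, 0.05]), generator_loss([0.1, 0.05]))

# At equilibrium: D outputs 0.5 for everything and cannot tell real from fake.
fooled = (discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))

print(confident, fooled)
```

In a real training loop, the discriminator's weights are updated to lower the first loss and the generator's weights to lower the second, in alternation; the second pair of values shows the point at which the discriminator "can no longer tell the difference".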

Comics created from scenes of movies such as "Pulp Fiction" and "Star Wars: Episode I – The Phantom Menace" can be viewed on the Comixify website. Anyone can test the tool developed by the Warsaw University of Technology team by uploading their own video files (up to 50 MB) or by providing a YouTube link. Comixify places no limit on video length.

The paper describing the algorithm was published online on December 12, 2018, and immediately sparked a wave of interest in countries ranging from Japan, Australia and India to France and the United States. Meanwhile, the Comixify website received over 140,000 visits and several thousand comics were created, and the authors have received many inquiries from film studios and comic publishers in Europe and the US.

Our researchers are keen to continue working on the project and to extend it with new features: the generation of new layouts, and speech recognition that would allow text to be added to the images. They are also seeking funding for further work in response to the international interest.

Comics in their various forms (from books to movies) are now immensely popular around the world. For artists, this means facing both a growing demand for this art form and ever higher audience expectations. The solution created at WUT may therefore simplify, or even revolutionize, the way video is turned into comic images.


[1] T. Trzcinski and P. Rokita. Predicting popularity of online videos using Support Vector Regression. IEEE Transactions on Multimedia, 19(11):2561–2570, 2017.

[2] K. Zhou, Y. Qiao, and T. Xiang. Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. AAAI, pp. 7582–7589, 2018.

[3] H. Talebi and P. Milanfar. NIMA: Neural image assessment. IEEE Transactions on Image Processing, 27(8):3998–4011, 2018.

[4] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. NIPS, pp. 2672–2680, 2014.