arXiv, 1811.09393

Mengyu Chu*, Technical University of Munich
You Xie*, Technical University of Munich
Laura Leal-Taixe, Technical University of Munich
Nils Thuerey, Technical University of Munich

(* Similar contributions)


Adversarial training has been highly successful in the context of image super-resolution, yielding realistic and highly detailed results. Despite this success, current state-of-the-art methods for video super-resolution still favor simpler norms such as L2 over adversarial loss functions. The reason is that direct vector norms, when used as loss functions, have an averaging effect that leads to temporal smoothness: with little spatial detail present, temporal coherence is easily established. In our work, we instead propose an adversarial training approach for video super-resolution that leads to temporally coherent solutions without sacrificing spatial detail.

Our work focuses on novel loss formulations for video super-resolution, whose effectiveness we demonstrate using an established generator framework. In this way, we show that temporal adversarial learning is the key to achieving photo-realistic and temporally coherent detail. In addition to our novel spatio-temporal discriminator, we propose a Ping-Pong loss that can effectively remove temporal artifacts in recurrent networks without reducing perceptual quality. Quantifying temporal coherence for video super-resolution tasks has also not been addressed previously, so we propose a first set of metrics to evaluate the accuracy as well as the perceptual quality of the temporal evolution. A series of user studies confirms the ranking obtained with these metrics: our method outperforms previous work by yielding more detailed images with natural temporal changes.