Focusing on temporally coherent detail synthesis, we present a range of new and existing research projects that all employ spatio-temporal self-supervision with GANs. This concept benefits a variety of challenging tasks, from video translation to super-resolution for games, videos, and fluid flow effects (i.e., Navier-Stokes solutions).

  • Video translation for unpaired data from different domains (Ours is shown as TecoGAN):

  • Game super-resolution using strongly aliased input (Ours is shown as DRR):

  • For video super-resolution, the TecoGAN model (Ours) yields coherent and realistic details:

  • We also proposed an algorithm for 3D fluid super-resolution with a factor of eight in each dimension:

In all tasks, we found that temporal adversarial learning is key to achieving temporally coherent solutions without sacrificing spatial detail. Compared to adding traditional temporal losses (e.g., L1 or L2 distances between warped frames) to purely spatial GANs, spatio-temporal discriminators can handle more challenging learning objectives, such as preserving sharp detail over time.
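For reference, the traditional temporal loss mentioned above can be sketched as follows. This is a minimal NumPy illustration, not code from any of the projects: the function names `warp_nearest` and `warped_temporal_loss`, the single-channel frames, the flow layout (x offset, then y offset), and the nearest-neighbour lookup are all assumptions made for brevity.

```python
import numpy as np

def warp_nearest(frame, flow):
    """Backward-warp a (H, W) frame with a per-pixel flow of shape (H, W, 2).

    Assumed convention: flow[..., 0] is the x offset and flow[..., 1] the
    y offset of the source pixel. Nearest-neighbour lookup, borders clamped.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def warped_temporal_loss(prev_out, curr_out, flow, norm="l2"):
    """Mean L1 or L2 distance between the current output and the warped previous output."""
    diff = curr_out - warp_nearest(prev_out, flow)
    if norm == "l1":
        return float(np.mean(np.abs(diff)))
    return float(np.mean(diff ** 2))
```

A loss of this kind penalizes any change that the motion estimate does not explain, which is one reason it tends to favor smooth results over sharp, temporally evolving detail.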

Besides spatio-temporal GANs, we have developed additional techniques to tackle the individual challenges of these tasks. In unpaired video translation, we use curriculum learning for the discriminators to achieve better adversarial equilibria. For strongly aliased renderings in games, we propose depth-recurrent residual connections to learn stable temporal states. In video super-resolution, we propose a bi-directional Ping-Pong loss to improve long-term temporal coherence. When processing large volumetric fluid data, we use a multi-pass GAN to break down the data relationships from 3D+t into lower dimensions. Preprints of the papers and code will be available soon.
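As an illustration of the bi-directional idea, the sketch below shows one plausible reading of a Ping-Pong loss: a frame-recurrent generator processes the sequence forward and then in reverse, and outputs for the same input frame are compared. The `generator(x, prev)` signature, the zero initial state, the L2 distance, and the function names are assumptions for this example, not details confirmed here.

```python
import numpy as np

def run_recurrent(generator, inputs):
    """Apply a frame-recurrent generator: each output depends on the
    current input and the previous output (zero initial state assumed)."""
    outputs = []
    prev = np.zeros_like(inputs[0])
    for x in inputs:
        prev = generator(x, prev)
        outputs.append(prev)
    return outputs

def ping_pong_loss(generator, inputs):
    """Mean L2 distance between forward-pass and reverse-pass outputs."""
    n = len(inputs)
    if n < 2:
        raise ValueError("need at least two frames")
    # Extended sequence: x_1 .. x_n followed by x_{n-1} .. x_1.
    ping_pong = list(inputs) + list(inputs[-2::-1])
    outs = run_recurrent(generator, ping_pong)
    forward = outs[: n - 1]        # outputs for x_1 .. x_{n-1}
    backward = outs[n:][::-1]      # reverse-pass outputs, reordered to x_1 .. x_{n-1}
    return float(np.mean([np.mean((f - b) ** 2) for f, b in zip(forward, backward)]))
```

Under these assumptions, a generator that accumulates artifacts through its recurrent state produces different outputs on the return pass and is penalized, while a generator whose output does not drift incurs no loss.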

Further reading:
Multi-Pass GAN