Dual Contrastive Loss and Attention for GANs

ICCV 2021


Ning Yu1,2      Guilin Liu3      Aysegul Dundar3,4      Andrew Tao3      Bryan Catanzaro3      Larry Davis1      Mario Fritz5
1. University of Maryland      2. Max Planck Institute for Informatics      3. NVIDIA      4. Bilkent University
5. CISPA Helmholtz Center for Information Security

Abstract


Generative Adversarial Networks (GANs) produce impressive results on unconditional image generation when powered with large-scale image datasets. Yet, generated images are still easy to spot, especially on datasets with high variance (e.g., bedroom, church). In this paper, we propose various improvements to further push the boundaries in image generation. Specifically, we propose a novel dual contrastive loss and show that, with this loss, the discriminator learns more generalized and distinguishable representations to incentivize generation. In addition, we revisit attention and extensively experiment with different attention blocks in the generator. We find that attention remains an important module for successful image generation, even though it was not used in recent state-of-the-art models. Lastly, we study different attention architectures in the discriminator and propose a reference attention mechanism. By combining the strengths of these remedies, we improve the compelling state-of-the-art Fréchet Inception Distance (FID) by at least 17.5% on several benchmark datasets. We obtain even more significant improvements on compositional synthetic scenes (up to 47.5% in FID).
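To make the "dual" direction of the contrastive loss concrete, below is a minimal PyTorch-style sketch, not the paper's exact implementation: each real logit is contrasted against the batch of fake logits (real as anchor), and each negated fake logit against the negated real logits (fake as anchor), with the two softmax cross-entropy terms averaged. The function name dual_contrastive_loss, the argument shapes, and the equal weighting of the two directions are illustrative assumptions.

import torch
import torch.nn.functional as F

def dual_contrastive_loss(real_logits, fake_logits):
    # real_logits: (N,) discriminator outputs on real images
    # fake_logits: (N,) discriminator outputs on generated images
    # Sketch only; details (weighting, batching) are assumptions, not the official code.

    def one_direction(anchor, contrast):
        # For each anchor_i, build the row [anchor_i, contrast_1, ..., contrast_N]
        # and ask softmax cross-entropy to identify index 0 (the anchor).
        n = anchor.shape[0]
        logits = torch.cat(
            [anchor.unsqueeze(1), contrast.unsqueeze(0).expand(n, -1)], dim=1
        )
        targets = torch.zeros(n, dtype=torch.long, device=anchor.device)
        return F.cross_entropy(logits, targets)

    loss_real_anchor = one_direction(real_logits, fake_logits)    # real vs. batch of fakes
    loss_fake_anchor = one_direction(-fake_logits, -real_logits)  # fake vs. batch of reals
    return 0.5 * (loss_real_anchor + loss_fake_anchor)

# Example usage with random logits from a hypothetical discriminator:
# loss = dual_contrastive_loss(D(real_images).squeeze(), D(G(z)).squeeze())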

Results

Qualitative results (see more in the paper)

Attention maps (see more in the paper)

FID results

Video


Materials




Paper



Poster

Code

Press coverage


thejiangmen Academia News

Citation

@inproceedings{yu2021dual,
  author={Yu, Ning and Liu, Guilin and Dundar, Aysegul and Tao, Andrew and Catanzaro, Bryan and Davis, Larry and Fritz, Mario},
  title={Dual Contrastive Loss and Attention for GANs},
  booktitle={IEEE International Conference on Computer Vision (ICCV)},
  year={2021}
}

Acknowledgement


We thank Tero Karras, Xun Huang, and Tobias Ritschel for constructive advice. Ning Yu was partially supported by the Twitch Research Fellowship. This work was also partially supported by the DARPA SAIL-ON (W911NF2020009) program. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA.

Related Work


T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, T. Aila. Analyzing and improving the image quality of StyleGAN. CVPR 2020.
Comment: A state-of-the-art GAN baseline method that is used as our backbone.
A. Radford, J.W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger. Learning transferable visual models from natural language supervision. ICML 2021.
Comment: The classic contrastive learning technique that is used in our loss term.
H. Zhao, J. Jia, V. Koltun. Exploring self-attention for image recognition. CVPR 2020.
Comment: The state-of-the-art attention architecture that is used for our self-attention and reference attention.
A. Brock, J. Donahue, K. Simonyan. Large scale GAN training for high fidelity natural image synthesis. ICLR 2019.
Comment: A state-of-the-art GAN baseline method that benefits from large model size and big data.
E. Schonfeld, B. Schiele, A. Khoreva. A U-Net based discriminator for generative adversarial networks. CVPR 2020.
Comment: A state-of-the-art GAN baseline method that is built upon BigGAN and uses U-Net to enhance its discriminator.