GAN, Flow-based models, Diffusion models
Proposition (Goodfellow et al., 2014). For \(G\) fixed, the optimal discriminator \(D\) is
\[
D_G^*(\boldsymbol{x})=\frac{p_{\text{data}}(\boldsymbol{x})}{p_{\text{data}}(\boldsymbol{x})+p_g(\boldsymbol{x})}
\]
Proof: The training criterion for the discriminator \(D\), given any generator \(G\), is to maximize the quantity \(V(G, D)\):
\[
\begin{aligned}
V(G, D) &= \int_{\boldsymbol{x}} p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x}))\, d\boldsymbol{x}+\int_{\boldsymbol{z}} p_{\boldsymbol{z}}(\boldsymbol{z}) \log (1-D(g(\boldsymbol{z})))\, d\boldsymbol{z} \\
&= \int_{\boldsymbol{x}} \big[\, p_{\text{data}}(\boldsymbol{x}) \log (D(\boldsymbol{x}))+p_g(\boldsymbol{x}) \log (1-D(\boldsymbol{x})) \,\big]\, d\boldsymbol{x}
\end{aligned}
\]
where the second line substitutes \(\boldsymbol{x}=g(\boldsymbol{z})\), so both terms become integrals over \(\boldsymbol{x}\).
For any \((a, b) \in \mathbb{R}_{\geq 0}^2 \setminus\{(0,0)\}\), the function \(y \mapsto a \log (y)+b \log (1-y)\) achieves its maximum on \([0,1]\) at \(y=\frac{a}{a+b}\). Applying this pointwise inside the integral with \(a=p_{\text{data}}(\boldsymbol{x})\) and \(b=p_g(\boldsymbol{x})\) yields \(D_G^*\), which completes the proof. \(\square\)
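As a quick numerical sanity check of this pointwise maximizer (a minimal NumPy sketch; the values of \(a\) and \(b\) are arbitrary illustrative choices, not from the source):

```python
import numpy as np

# Illustrative check: for fixed a, b > 0, f(y) = a*log(y) + b*log(1-y)
# is maximized on (0, 1) at y* = a / (a + b).
a, b = 0.7, 0.3  # assumed toy values
ys = np.linspace(1e-6, 1 - 1e-6, 100_000)
f = a * np.log(ys) + b * np.log(1 - ys)
y_numeric = ys[np.argmax(f)]
y_closed = a / (a + b)
print(y_numeric, y_closed)  # both approximately 0.7
assert abs(y_numeric - y_closed) < 1e-4
```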
Using a GAN/WGAN to learn a mixture of 8 Gaussians arranged in a circle.
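A minimal sketch of what such a toy experiment could look like, assuming PyTorch; the architectures, optimizer settings, and batch size below are illustrative assumptions, not the exact setup behind this experiment:

```python
import torch
import torch.nn as nn

# Toy GAN on a ring of 8 Gaussians; all hyperparameters are assumptions.
def sample_8_gaussians(n, radius=2.0, std=0.02):
    """Real data: a mixture of 8 Gaussians whose means lie on a circle."""
    angles = torch.randint(0, 8, (n,)).float() * (2 * torch.pi / 8)
    means = torch.stack([radius * torch.cos(angles),
                         radius * torch.sin(angles)], dim=1)
    return means + std * torch.randn(n, 2)

def mlp(in_dim, out_dim, hidden=128):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

G = mlp(2, 2)                               # generator g: z -> x
D = nn.Sequential(mlp(2, 1), nn.Sigmoid())  # discriminator D: x -> [0, 1]
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCELoss()
ones, zeros = torch.ones(256, 1), torch.zeros(256, 1)

for step in range(10_000):
    x_real = sample_8_gaussians(256)
    x_fake = G(torch.randn(256, 2))

    # Discriminator step: ascend V(G, D) = E[log D(x)] + E[log(1 - D(g(z)))].
    loss_d = bce(D(x_real), ones) + bce(D(x_fake.detach()), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step (non-saturating variant): maximize E[log D(g(z))].
    loss_g = bce(D(x_fake), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

On this toy distribution the vanilla GAN objective is prone to mode collapse onto a subset of the 8 modes, which is a standard motivation for the comparison against WGAN.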
Real NVP learns an invertible, stable mapping between a data distribution \(\hat{p}_X\) and a latent distribution \(p_Z\) (typically a Gaussian).
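For concreteness, here is a minimal sketch of a single Real NVP affine coupling layer, assuming PyTorch; the half-split mask, network width, and tanh-bounded scale are illustrative choices rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One Real NVP-style affine coupling layer (illustrative sketch)."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        # Binary mask: the first half of the coordinates passes through unchanged.
        self.register_buffer("mask", (torch.arange(dim) < dim // 2).float())
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim),  # outputs scale s and translation t
        )

    def forward(self, x):
        """x -> z, returning log|det dz/dx| for the change of variables."""
        x_masked = x * self.mask
        s, t = self.net(x_masked).chunk(2, dim=-1)
        s = torch.tanh(s) * (1 - self.mask)  # bounded scale, only on the free half
        t = t * (1 - self.mask)
        z = x_masked + (1 - self.mask) * (x * torch.exp(s) + t)
        log_det = s.sum(dim=-1)  # Jacobian is triangular: log-det is sum of s
        return z, log_det

    def inverse(self, z):
        """Exact inverse z -> x: the masked half is unchanged, so s, t recompute."""
        z_masked = z * self.mask
        s, t = self.net(z_masked).chunk(2, dim=-1)
        s = torch.tanh(s) * (1 - self.mask)
        t = t * (1 - self.mask)
        return z_masked + (1 - self.mask) * ((z - t) * torch.exp(-s))
```

Stacking several such couplings with alternating masks gives an invertible map \(f\) with a tractable Jacobian, so the flow can be trained by exact maximum likelihood on \(\log p_Z(f(\boldsymbol{x})) + \log \left|\det \frac{\partial f}{\partial \boldsymbol{x}}\right|\).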
The left panel shows samples from a model trained on the CelebA-HQ dataset, and the right panel shows samples from a model trained on the CIFAR-10 dataset.