projects / tex.git / commitdiff
parent: 05c0721
Update.
author François Fleuret <francois@fleuret.org> Wed, 28 Feb 2024 07:19:50 +0000 (08:19 +0100)
committer François Fleuret <francois@fleuret.org> Wed, 28 Feb 2024 07:19:50 +0000 (08:19 +0100)
elbo.tex
diff --git a/elbo.tex b/elbo.tex
index fe91565..4c6cb24 100644
--- a/elbo.tex
+++ b/elbo.tex
@@ -76,24 +76,25 @@
\setlength{\abovedisplayshortskip}{2ex}
\setlength{\belowdisplayshortskip}{2ex}
-\vspace*{-4ex}
+\vspace*{-3ex}
\begin{center}
{\Large The Evidence Lower Bound}
-\vspace*{1ex}
+\vspace*{2ex}
Fran\c cois Fleuret
+%% \vspace*{2ex}
+
\today
-\vspace*{-1ex}
+%% \vspace*{-1ex}
\end{center}
-Given i.i.d training samples $x_1, \dots, x_N$ that follows an unknown
-distribution $\mu_X$, we want to fit a model $p_\theta(x,z)$ to it,
-maximizing
+Given i.i.d training samples $x_1, \dots, x_N$ we want to fit a model
+$p_\theta(x,z)$ to it, maximizing
%
\[
\sum_n \log \, p_\theta(x_n).
@@ -134,6 +135,8 @@ since this maximization pushes that KL term down, it also aligns
$p_\theta(z \mid x_n)$ and $q(z)$, and we may get a worse
$p_\theta(x_n)$ to bring $p_\theta(z \mid x_n)$ closer to $q(z)$.
+\medskip
+
However, all this analysis is still valid if $q$ is a parameterized
function $q_\alpha(z \mid x_n)$ of $x_n$. In that case, if we optimize
$\theta$ and $\alpha$ to maximize
@@ -145,5 +148,4 @@ $\theta$ and $\alpha$ to maximize
it maximizes $\log \, p_\theta(x_n)$ and brings $q_\alpha(z \mid
x_n)$ close to $p_\theta(z \mid x_n)$.
-
\end{document}
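
Reading note: the hunks above mention a KL term and a quantity maximized over $\theta$ and $\alpha$, but the equations themselves fall outside the diff context. The block below is a sketch of the standard evidence lower bound decomposition, written in the notation visible in the hunks. It is an assumption that the elided part of elbo.tex uses this form; the `align*` environment and `\mathbb` additionally assume that amsmath and amssymb are loaded.

```latex
% Sketch only: standard ELBO identity, not copied from elbo.tex.
% Assumes p_\theta(z \mid x_n) > 0 wherever q(z) > 0, so the KL term is finite.
\begin{align*}
\log \, p_\theta(x_n)
& = \mathbb{E}_{Z \sim q} \left[ \log \frac{p_\theta(x_n, Z)}{q(Z)} \right]
  + \mathbb{D}_{\mathsf{KL}} \left( q(z) \, \| \, p_\theta(z \mid x_n) \right) \\
& \geq \mathbb{E}_{Z \sim q} \left[ \log \frac{p_\theta(x_n, Z)}{q(Z)} \right].
\end{align*}
```

Because the KL term is non-negative, the expectation is a lower bound on $\log \, p_\theta(x_n)$. Under the same assumption, replacing $q(z)$ with $q_\alpha(z \mid x_n)$ leaves the identity unchanged. Maximizing the bound over $\alpha$ then shrinks the KL term, which is what brings $q_\alpha(z \mid x_n)$ close to $p_\theta(z \mid x_n)$.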