Update.

[tex.git] / inftheory.tex
diff --git a/inftheory.tex b/inftheory.tex

index 7c34fce..954fd06 100644 (file)
--- a/inftheory.tex
+++ b/inftheory.tex
@@ -4,8 +4,8 @@
  %% https://creativecommons.org/publicdomain/zero/1.0/
  %% Written by Francois Fleuret <francois@fleuret.org>
  
-\documentclass[10pt,a4paper,twoside]{article}
-\usepackage[paperheight=18cm,paperwidth=10cm,top=5mm,bottom=20mm,right=5mm,left=5mm]{geometry}
+\documentclass[11pt,a4paper,oneside]{article}
+\usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=2mm,left=2mm]{geometry}
  %\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
  \usepackage[utf8]{inputenc}
  \usepackage{amsmath,amssymb,dsfont}
@@ -20,6 +20,8 @@
  \usetikzlibrary{tikzmark}
  \usetikzlibrary{decorations.pathmorphing}
  \usepackage[round]{natbib}
+\usepackage[osf]{libertine}
+\usepackage{microtype}
  
  \usepackage{mleftright}
  
@@ -29,7 +31,7 @@
  \setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu
  
  \setlength{\parindent}{0cm}
-\setlength{\parskip}{12pt}
+\setlength{\parskip}{1ex}
  %\renewcommand{\baselinestretch}{1.3}
  %\setlength{\tabcolsep}{0pt}
  %\renewcommand{\arraystretch}{1.0}
@@ -65,12 +67,17 @@
  
  \begin{document}
  
-\vspace*{1ex}
+\vspace*{-3ex}
  
  \begin{center}
  {\Large Some bits of Information Theory}
  
-\today
+Fran\c cois Fleuret
+
+January 19, 2024
+
+\vspace*{1ex}
+
  \end{center}
  
  Information Theory is awesome so here is a TL;DR about Shannon's entropy.
@@ -79,9 +86,9 @@ The field is originally about quantifying the amount of
  ``information'' contained in a signal and how much can be transmitted
  under certain conditions.
  
-What makes it awesome IMO is that it is very intuitive, and like
-thermodynamics in Physics, it gives exact bounds about what is possible
-or not.
+What makes it awesome is that it is very intuitive, and like
+thermodynamics in Physics, it gives exact bounds about what is
+possible or not.
  
  \section{Shannon's Entropy}
  
@@ -96,8 +103,8 @@ line, you can design a coding that takes into account that the symbols
  are not all as probable, and decode on the other side.
  
  For instance if $\proba('\!\!A')=1/2$, $\proba('\!\!B')=1/4$, and
-$\proba('\!\!C')=1/4$ you would transmit ``0'' for a ``A'' and ``10'' for a
-``B'' and ``11'' for a ``C'', 1.5 bits on average.
+$\proba('\!\!C')=1/4$ you would transmit ``$0$'' for a ``A'' and ``$10$'' for a
+``B'' and ``$11$'' for a ``C'', 1.5 bits on average.
  
  If the symbol is always the same, you transmit nothing, if they are
  equiprobable you need $\log_2$(nb symbols) etc.
@@ -153,11 +160,16 @@ that quantifies the amount of information shared by the two variables.
  
  Conditional entropy is the average of the entropy of the conditional distribution:
  %
-\begin{align*}
- & \entropy(X \mid Y)                        \\
- & = \sum_y \proba(Y=y) \entropy(X \mid Y=y) \\
- & = \sum_y \proba(Y=y) \sum_x \proba(X=x \mid Y=y) \log \proba(X=x \mid Y=y)
-\end{align*}
+\begin{equation*}
+\entropy(X \mid Y) = \sum_y \proba(Y=y) \entropy(X \mid Y=y)
+\end{equation*}
+%
+with
+%
+\begin{eqnarray*}
+\entropy(X \mid Y=y) \hspace*{13.5em} \\
+ = \sum_x \proba(X=x \mid Y=y) \log \proba(X=x \mid Y=y)
+\end{eqnarray*}
  
  Intuitively it is the [minimum average] number of bits required to describe $X$ given that $Y$ is known.
  
@@ -175,13 +187,13 @@ and if $X$ is a deterministic function of $Y$ then
    \entropy(X \mid Y)=0.
  \]
  
-And if you send the bits for $Y$ and then the bits to describe $X$ given
-that $Y$, you have sent $(X, Y)$. Hence we have the chain rule:
+And if you send the bits for $Y$ and then the bits to describe $X$
+given that $Y$, you have sent $(X, Y)$, hence the chain rule:
  %
  \[
  \entropy(X, Y) = \entropy(Y) + \entropy(X \mid Y).
  \]
-
+%
  And then we get
  %
  \begin{align*}