Update.

author François Fleuret <francois@fleuret.org>

Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)

committer François Fleuret <francois@fleuret.org>

Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)
author François Fleuret <francois@fleuret.org>
Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)
committer François Fleuret <francois@fleuret.org>
Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)
diff --git a/inftheory.tex b/inftheory.tex

index 29b0557..e64fe5b 100644 (file)
--- a/inftheory.tex
+++ b/inftheory.tex
@@ -1,5 +1,9 @@
  %% -*- mode: latex; mode: reftex; mode: flyspell; coding: utf-8; tex-command: "pdflatex.sh" -*-
  
+%% Any copyright is dedicated to the Public Domain.
+%% https://creativecommons.org/publicdomain/zero/1.0/
+%% Written by Francois Fleuret <francois@fleuret.org>
+
  \documentclass[10pt,a4paper,twoside]{article}
  \usepackage[paperheight=18cm,paperwidth=10cm,top=5mm,bottom=20mm,right=5mm,left=5mm]{geometry}
  %\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
@@ -16,8 +20,6 @@
  \usetikzlibrary{tikzmark}
  \usetikzlibrary{decorations.pathmorphing}
  \usepackage[round]{natbib}
-%\usepackage{cmbright}
-%\usepackage{showframe}
  
  \usepackage{mleftright}
  
@@ -36,34 +38,26 @@
  \def\argmin{\operatornamewithlimits{argmin}}
  \def\expect{\mathds{E}}
  
-%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-%% The \todo command
-\newcounter{nbdrafts}
-\setcounter{nbdrafts}{0}
-\makeatletter
-\newcommand{\checknbdrafts}{
-\ifnum \thenbdrafts > 0
-\@latex@warning@no@line{*WARNING* The document contains \thenbdrafts \space draft note(s)}
-\fi}
-\newcommand{\todo}[1]{\addtocounter{nbdrafts}{1}{\color{red} #1}}
-\makeatother
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  
  \begin{document}
  
  Information Theory is awesome so here is a TL;DR about Shannon's entropy.
  
-This field is about quantifying the amount of ``information'' contained
-in a signal and how much can be transmitted under certain conditions.
+The field is originally about quantifying the amount of
+``information'' contained in a signal and how much can be transmitted
+under certain conditions.
  
-What makes it awesome IMO is that it is very intuitive, and like thermodynamics in Physics it give exact bounds about what is possible or not.
+What makes it awesome IMO is that it is very intuitive, and like
+thermodynamics in Physics it give exact bounds about what is possible
+or not.
  
  \section{Shannon's Entropy}
  
-This is the key concept from which everything is defined.
+Shannon's entropy is the key concept from which everything is defined.
  
-Imagine that you have a distribution of probabilities p on a finite
-set of symbols and that you generate a stream of symbols by sampling
+Imagine that you have a distribution of probabilities $p$ on a finite
+set of symbols, and that you generate a stream of symbols by sampling
  them one after another independently with that distribution.
  
  To transmit that stream, for instance with bits over a communication
@@ -78,9 +72,9 @@ If the symbol is always the same, you transmit nothing, if they are
  equiprobable you need $\log_2$(nb symbols) etc.
  
  Shannon's Entropy (in base 2) is the minimum number of bits you have
-to emit on average to transmit that stream.
+to emit on average per symbol to transmit that stream.
  
-It has a simple formula:
+It has a simple analytical form:
  %
  \[
   H(p) = - \sum_k p(k) \log_2 p(k)
@@ -91,20 +85,23 @@ where by convention $o \log_2 0 = 0$.
  It is often seen as a measure of randomness since the more
  deterministic the distribution is, the less you have to emit.
  
-The codings above are "Huffman coding", which reaches the Entropy
-bound only for some distributions. The "Arithmetic coding" does it
-always.
+The examples above correspond to "Huffman coding", which reaches the
+Entropy bound only for some distributions. A more sophisticated scheme
+called "Arithmetic coding" does it always.
  
  From this perspective, many quantities have an intuitive
  value. Consider for instance sending pairs of symbols (X, Y).
  
-If these two symbols are independent, you cannot do better than sending one and the other separately, hence
+If these two symbols are independent, you cannot do better than
+sending one and the other separately, hence
  %
  \[
  H(X, H) = H(X) + H(Y).
  \]
  
-However, imagine that the second symbol is a function of the first Y=f(X). You just have to send X since Y can be computed from it on the other side.
+However, imagine that the second symbol is a function of the first
+Y=f(X). You just have to send X since Y can be computed from it on the
+other side.
  
  Hence in that case
  %
@@ -198,5 +195,8 @@ minus the entropy of p
  H(p) = -\sum_x p(x) \log p(x).
  \]
  
-Notation horror: if $X$ and $Y$ are random variables $H(X, Y)$ is the entropy of their joint law, and if $p$ and $q$ are distributions, $H(p,q)$ is the cross-entropy between them.
+Notation horror: if $X$ and $Y$ are random variables $H(X, Y)$ is the
+entropy of their joint law, and if $p$ and $q$ are distributions,
+$H(p,q)$ is the cross-entropy between them.
+
  \end{document}
author	François Fleuret <francois@fleuret.org>
	Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)
committer	François Fleuret <francois@fleuret.org>
	Fri, 19 Jan 2024 07:58:43 +0000 (08:58 +0100)