X-Git-Url: https://www.fleuret.org/cgi-bin/gitweb/gitweb.cgi?p=tex.git;a=blobdiff_plain;f=inftheory.tex;fp=inftheory.tex;h=954fd0630c48e3636e6923a34afeaf44344f87b4;hp=7c34fce40c8b14b95d07e29c86c16aa3ffd21d73;hb=74cdd5e14b65ac1ff03725173eb941dc7a455edf;hpb=cf0fd332cb70bf1a2a793ce658770da5ca702db9

diff --git a/inftheory.tex b/inftheory.tex
index 7c34fce..954fd06 100644
--- a/inftheory.tex
+++ b/inftheory.tex
@@ -4,8 +4,8 @@
 %% https://creativecommons.org/publicdomain/zero/1.0/
 %% Written by Francois Fleuret
 
-\documentclass[10pt,a4paper,twoside]{article}
-\usepackage[paperheight=18cm,paperwidth=10cm,top=5mm,bottom=20mm,right=5mm,left=5mm]{geometry}
+\documentclass[11pt,a4paper,oneside]{article}
+\usepackage[paperheight=15cm,paperwidth=8cm,top=2mm,bottom=15mm,right=2mm,left=2mm]{geometry}
 %\usepackage[a4paper,top=2.5cm,bottom=2cm,left=2.5cm,right=2.5cm]{geometry}
 \usepackage[utf8]{inputenc}
 \usepackage{amsmath,amssymb,dsfont}
@@ -20,6 +20,8 @@
 \usetikzlibrary{tikzmark}
 \usetikzlibrary{decorations.pathmorphing}
 \usepackage[round]{natbib}
+\usepackage[osf]{libertine}
+\usepackage{microtype}
 
 \usepackage{mleftright}
 
@@ -29,7 +31,7 @@
 \setmuskip{\thickmuskip}{3.5mu} % by default it is equal to 5 mu
 
 \setlength{\parindent}{0cm}
-\setlength{\parskip}{12pt}
+\setlength{\parskip}{1ex}
 %\renewcommand{\baselinestretch}{1.3}
 %\setlength{\tabcolsep}{0pt}
 %\renewcommand{\arraystretch}{1.0}
@@ -65,12 +67,17 @@
 
 \begin{document}
 
-\vspace*{1ex}
+\vspace*{-3ex}
 
 \begin{center}
 {\Large Some bits of Information Theory}
 
-\today
+Fran\c cois Fleuret
+
+January 19, 2024
+
+\vspace*{1ex}
+
 \end{center}
 
 Information Theory is awesome so here is a TL;DR about Shannon's entropy.
@@ -79,9 +86,9 @@ The field is originally about quantifying the amount of ``information''
 contained in a signal and how much can be transmitted under certain
 conditions.
 
-What makes it awesome IMO is that it is very intuitive, and like
-thermodynamics in Physics, it gives exact bounds about what is possible
-or not.
+What makes it awesome is that it is very intuitive, and like
+thermodynamics in Physics, it gives exact bounds about what is
+possible or not.
 
 \section{Shannon's Entropy}
 
@@ -96,8 +103,8 @@ line, you can design a coding that takes into account that the symbols
 are not all as probable, and decode on the other side.
 
 For instance if $\proba('\!\!A')=1/2$, $\proba('\!\!B')=1/4$, and
-$\proba('\!\!C')=1/4$ you would transmit ``0'' for a ``A'' and ``10'' for a
-``B'' and ``11'' for a ``C'', 1.5 bits on average.
+$\proba('\!\!C')=1/4$ you would transmit ``$0$'' for an ``A'' and ``$10$'' for a
+``B'' and ``$11$'' for a ``C'', 1.5 bits on average.
 
 If the symbol is always the same, you transmit nothing, if they are
 equiprobable you need $\log_2$(nb symbols) etc.
@@ -153,11 +160,16 @@ that quantifies the amount of information shared by the two variables.
 Conditional entropy is the average of the entropy of the conditional
 distribution:
 %
-\begin{align*}
- & \entropy(X \mid Y) \\
- & = \sum_y \proba(Y=y) \entropy(X \mid Y=y) \\
- & = \sum_y \proba(Y=y) \sum_x \proba(X=x \mid Y=y) \log \proba(X=x \mid Y=y)
-\end{align*}
+\begin{equation*}
+\entropy(X \mid Y) = \sum_y \proba(Y=y) \entropy(X \mid Y=y)
+\end{equation*}
+%
+with
+%
+\begin{eqnarray*}
+\entropy(X \mid Y=y) \hspace*{13.5em} \\
+ = -\sum_x \proba(X=x \mid Y=y) \log \proba(X=x \mid Y=y)
+\end{eqnarray*}
 
 Intuitively it is the [minimum average] number of bits required to
 describe $X$ given that $Y$ is known.
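As a quick sanity check on the ``1.5 bits on average'' figure in the hunk above, here is a short worked computation. It is a sketch written in the document's own notation (its \proba and \entropy macros, logarithms in base 2) and is not part of the patch itself:

% entropy of the source with P(A)=1/2, P(B)=1/4, P(C)=1/4
\begin{align*}
\entropy(X) & = -\tfrac{1}{2} \log_2 \tfrac{1}{2}
               -\tfrac{1}{4} \log_2 \tfrac{1}{4}
               -\tfrac{1}{4} \log_2 \tfrac{1}{4} \\
            & = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 2
              = 1.5 \text{ bits},
\end{align*}

which matches the average length of the code ``$0$'', ``$10$'', ``$11$'': $\tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 2 = 1.5$.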
@@ -175,13 +187,13 @@ and if $X$ is a deterministic function of $Y$ then
 \entropy(X \mid Y)=0.
 \]
 
-And if you send the bits for $Y$ and then the bits to describe $X$ given
-that $Y$, you have sent $(X, Y)$. Hence we have the chain rule:
+And if you send the bits for $Y$ and then the bits to describe $X$
+given that $Y$, you have sent $(X, Y)$, hence the chain rule:
 %
 \[
 \entropy(X, Y) = \entropy(Y) + \entropy(X \mid Y).
 \]
-
+%
 And then we get
 %
 \begin{align*}
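To make the chain rule in the last hunk concrete, here is a minimal numerical check, again a sketch in the document's notation rather than part of the patch: take $\proba(X=0, Y=0)=\tfrac{1}{2}$, $\proba(X=0, Y=1)=\tfrac{1}{4}$, $\proba(X=1, Y=1)=\tfrac{1}{4}$. Then

% P(Y=0)=P(Y=1)=1/2; X is determined when Y=0, uniform when Y=1
\begin{align*}
\entropy(Y) & = 1, \\
\entropy(X \mid Y) & = \tfrac{1}{2} \cdot \underbrace{0}_{Y=0}
                     + \tfrac{1}{2} \cdot \underbrace{1}_{Y=1}
                     = \tfrac{1}{2}, \\
\entropy(X, Y) & = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 2
                 = \tfrac{3}{2}
                 = \entropy(Y) + \entropy(X \mid Y).
\end{align*}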