H(X \mid Y)=0
\]
-And since if you send the bits for Y and then the bits to describe X given that X is known you have sent (X, Y), we have the chain rule:
+And since sending the bits for $Y$ followed by the bits that describe $X$ given $Y$ conveys the pair $(X, Y)$, we have the chain rule:
%
\[
H(X, Y) = H(Y) + H(X \mid Y).