Mutual information
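The mutual information between two random variables X and Y is given by

$$I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$

where P(X,Y) is the joint probability distribution of X and Y, and P(X) and P(Y) are the marginal probability distributions of X and Y.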
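As an illustration, the sum over x and y of P(x,y) log [P(x,y) / (P(x) P(y))] can be evaluated directly for a small discrete joint distribution. The sketch below is a minimal example, not a reference implementation: the function name `mutual_information`, the nested-list table layout, and the example probabilities are all assumptions made for illustration, and the natural logarithm is used, so the result is in nats.

```python
import math

def mutual_information(joint):
    """Mutual information in nats for joint[i][j] = P(X=i, Y=j) (hypothetical helper)."""
    px = [sum(row) for row in joint]          # marginal P(X): row sums
    py = [sum(col) for col in zip(*joint)]    # marginal P(Y): column sums
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                       # terms with P(x,y) = 0 contribute nothing
                total += pxy * math.log(pxy / (px[i] * py[j]))
    return total

# Made-up example: X and Y agree with probability 0.8.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi = mutual_information(joint)
print(mi)
```

Replacing `math.log` with `math.log2` would give the result in bits instead of nats.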
Properties of mutual information
If X and Y are independent,
then I(X,Y) = 0,
since P(X,Y) = P(X) P(Y) in that case.
Mutual information is symmetric: I(X,Y) = I(Y,X).
Mutual information is nonnegative: I(X,Y) ≥ 0.
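The three properties above can be checked numerically on small examples. This is only a sketch under stated assumptions: `mutual_information` is a hypothetical helper written for this illustration, the distributions are made up, and a numerical check on two tables does not replace the general argument.

```python
import math

def mutual_information(joint):
    """Mutual information in nats for joint[i][j] = P(X=i, Y=j) (hypothetical helper)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Independence: build P(X,Y) = P(X) P(Y) from two made-up marginals.
px = [0.3, 0.7]
py = [0.2, 0.5, 0.3]
independent = [[a * b for b in py] for a in px]
mi_independent = mutual_information(independent)

# A made-up dependent joint distribution (entries sum to 1).
dependent = [[0.10, 0.25, 0.05],
             [0.10, 0.25, 0.25]]
mi_xy = mutual_information(dependent)

# Symmetry: swapping the roles of X and Y transposes the table.
transposed = [list(col) for col in zip(*dependent)]
mi_yx = mutual_information(transposed)

print(mi_independent, mi_xy, mi_yx)
```

Up to floating-point rounding, the independent table gives zero, the dependent table gives a positive value, and transposing the table leaves the value unchanged.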
Relation to other quantities
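The mutual information can be equivalently expressed as

$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$

where H(X) and H(X|Y) are the unconditional and conditional entropy of X, and likewise H(Y) and H(Y|X) are the unconditional and conditional entropy of Y, with

$$H(X) = -\sum_{x} P(x) \log P(x)$$

and

$$H(X|Y) = -\sum_{x,y} P(x,y) \log P(x|y)$$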
Since H(X) ≥ H(X|Y), with equality when X and Y are independent,
this proves the nonnegativity property stated above.
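The entropy form I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) can be compared numerically across both directions. The following is a minimal sketch: the helper names `entropy` and `conditional_entropy` and the example table are assumptions introduced here, and natural logarithms are used throughout.

```python
import math

def entropy(p):
    """H = -sum p log p (nats) for a list of probabilities."""
    return -sum(v * math.log(v) for v in p if v > 0)

def conditional_entropy(joint):
    """H(X|Y) = -sum_{x,y} P(x,y) log P(x|y), with joint[i][j] = P(X=i, Y=j)."""
    py = [sum(col) for col in zip(*joint)]
    return -sum(
        pxy * math.log(pxy / py[j])           # P(x|y) = P(x,y) / P(y)
        for row in joint
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# Made-up joint distribution.
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]
transposed = [list(col) for col in zip(*joint)]  # swaps the roles of X and Y

h_x = entropy(px)
h_x_given_y = conditional_entropy(joint)
h_y = entropy(py)
h_y_given_x = conditional_entropy(transposed)

mi_from_x = h_x - h_x_given_y
mi_from_y = h_y - h_y_given_x
print(h_x, h_x_given_y, mi_from_x, mi_from_y)
```

In this example H(X) is log 2, and conditioning on Y lowers the entropy, so the difference is positive.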
Mutual information can also be expressed in terms of the Kullback-Leibler divergence.
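Note that

$$I(X,Y) = \sum_{y} P(y) \sum_{x} P(x|y) \log \frac{P(x|y)}{P(x)}$$

$$I(X,Y) = \sum_{y} P(y) \, D_{KL}\big(P(X|y) \,\|\, P(X)\big)$$

where $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$ is the Kullback-Leibler divergence from Q to P.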
Thus mutual information can be understood as a weighted Kullback-Leibler divergence:
the more different the distributions P(X) and P(X|Y),
the greater the information gain.
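The weighted-divergence reading can be illustrated by averaging, over the values y of Y, the Kullback-Leibler divergence of P(X|y) from P(X), with weights P(y). The sketch below compares that average with the divergence between the joint distribution and the product of the marginals. The helper name `kl_divergence` and the example table are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum p log(p / q) in nats, for two aligned probability lists."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Made-up joint distribution, joint[i][j] = P(X=i, Y=j).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]

# Weighted form: sum over y of P(y) * D_KL(P(X|y) || P(X)).
mi_weighted = 0.0
for j, p_y in enumerate(py):
    px_given_y = [row[j] / p_y for row in joint]   # conditional distribution P(X|y)
    mi_weighted += p_y * kl_divergence(px_given_y, px)

# Joint form: D_KL(P(X,Y) || P(X) P(Y)) over the flattened tables.
flat_joint = [pxy for row in joint for pxy in row]
flat_product = [a * b for a in px for b in py]
mi_joint = kl_divergence(flat_joint, flat_product)

print(mi_weighted, mi_joint)
```

Both quantities agree up to rounding. Making the columns of `joint` proportional to each other, so that P(X|y) equals P(X) for every y, drives both to zero.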
References
Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill, 1984. (See Chapter 15.)