community
directory
books
authors
images
encyclopedia

Email:
Password:
Register

Knowledgerush Search

 

Google
  Web knowledgerush


Search for images of VC dimension


Message boards   Post comment

VC dimension

The VC dimension (for Vapnik Chervonenkis dimension) is a measure of the capacity of a classification algorithm. It is one of the core concepts in statistical learning theory. It was originally defined by Vladimir Vapnik and Alexey Chervonenkis.

Intuitively, the capacity of a classification model is related to how complicated it can be. Think of thresholding a high-degree polynomial, where if the polynomial evaluates above zero, we classify that point into a positive class, negative otherwise. If we use a very high-degree polynomial, it can be very wiggly, and can fit a training set exactly. But, we should expect that outside of the training points, the classifier would not generalize well, because it is too wiggly. Such a polynomial has a high capacity. Alternatively, we can think about thresholding a linear polynomial. This polynomial may not fit the entire training set, because it has a low capacity. This notion of capacity can be made more rigorous.

First, we need the concept of shattering. Consider a classification model with some parameter vector . The model can shatter a set of data points () if, for all assignments of labels to those data points, there exists a such that the model makes no errors when evaluated that set of data points.

Now, we are ready to define a mathematical notion of capacity, called the VC dimension. The VC dimension of a model is the maximum such that all data point sets of cardinality can be shattered by .

The VC dimension has utility in statistical learning theory, because it can predict a probabilistic upper bound on the test error of a classification model.

The bound on the test error of a classification model (on data that is drawn i.i.d. from the same distribution as the training set) is given by

Training error+
with probability , where is the VC dimension of the classification model, and is the size of the training set.

References and Sources

  • Andrew Moore's VC dimension tutorial
  • V. Vapnik and A. Chervonenkis. "On the uniform convergence of relative frequencies of events to their probabilities." Theory of Probability and its Applications, 16(2):264--280, 1971.
  • A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. "Learnability and the Vapnik-Chervonenkis dimension." Journal of the ACM, 36(4):929--865, 1989.

Referenced By

Learning theory (statistics) | List of important publications in computer science | Statistical learning theory

 

Compose Your Message

Your Email Address or Pen Name (optional):
Subject:
Your Message:
 

 

 

 

 

 

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "VC dimension".

 

Contact UsPrivacy Statement & Terms of Use

 
Copyright © 1999-2003 Knowledgerush.com. All rights reserved.