Curse of Dimensionality
The term "Curse of Dimensionality" was coined by American applied mathematician Richard Bellman. Simply stated, it refers to the problems which arise when we generalize our three dimensional intuitions to higher dimensions or vice-versa.
We as humans, are so accustomed living in three dimensional world that it is hard for us to imagine higher dimensional spaces. Our intuitions are hard-wired to the 3-D world.
Here is one interesting example (citing Christopher Bishop). Let us try to imagine a sphere in three dimensional world. It will look like the following image.
Let us assume that the sphere has radius, r = 1. Now let us ask a simple question from ourselves that in K-dimensional world, what fraction of sphere lies between r = 1 and r = 1-e, where e is a very small real number.
It basically means that in 3-d world the volume will be
Now coming back to the arbitrary K-dimensional world. The required difference ratio in volume will be
If we dig deep, in 1-d world. The difference above will be 0(zero) because in 1-d world a sphere will be a point. In 2-d world, the difference in the volume ratio would be around 0.2. Eventually when we apply binomial expansion we will realize that as we increase the value of K, i.e we increase the dimensions the difference in the volume ratio will go towards 1. It means in higher dimensions most of the volume of the sphere is concentrated near the surface! This completely beats the intuition we have about spheres in 3-d world.
The example above is to explain the curse of dimensionality. Now let us relate this problem to machine learning world.
Let us take an example of images. Images are higher dimensional in nature by default. An RGB image will be of (m x n) x 3 dimensions, where m,n denote the width and height of the image while 3 denote number of channels. There are numerous simple ways to reduce dimension of an image. Let us say we go from 3 channels to 1 channel, i.e from RGB to grayscale. Here we lose the information about color. If we reduce m and n then we will lose the quality of the image.
There are a few ways to reduce dimensionality of the data like PCA, t-SNE, MDS and LLE which we will explain in upcoming posts. There is no way to measure which is the best way to go for dimensionality reduction because if often depends on which algorithm is being chosen after dimensionality reduction. We will discuss these topics later.
If you liked the blog then please subscribe and share the blog :)