Why do we need to drop one dummy variable?
For example, suppose our data set contains a categorical variable such as location:
Location
----------
California
New York
Florida
We have to convert it into dummy variables, one column per category:
CA NY FL
1  0  0
0  1  0
0  0  1
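As a minimal sketch of this conversion in plain Python: the names `locations`, `categories`, and `one_hot` are made up for illustration, and the column order is fixed by hand to match the example.

```python
# Hypothetical sample column; the values mirror the example above.
locations = ["California", "New York", "Florida"]

# Column order is set explicitly (CA, NY, FL) so the result matches the example.
categories = ["California", "New York", "Florida"]


def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 elsewhere."""
    return [1 if value == c else 0 for c in categories]


# One row of dummy variables per observation.
dummies = [one_hot(loc, categories) for loc in locations]

for row in dummies:
    print(*row)
```

Each row contains exactly one 1, which is the property the rest of the explanation relies on.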
But the usual advice is to drop one dummy variable, no matter how many dummy variables there are.
Reason:
Because the third dummy is fully determined by the first two, so it adds no new information (with an intercept in the model, it is an exact linear combination of the other columns):
FL = 1 - (CA + NY)
If you have n dummy variables, then:
nth = 1 - (sum of all others)
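A rough way to see this numerically is sketched below with NumPy. The six observations are invented for illustration, and the sketch assumes the model includes an intercept (a column of ones), which the example above does not state explicitly.

```python
import numpy as np

# Hypothetical data: six observations, columns are the dummies CA, NY, FL.
dummies = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])

ca, ny, fl = dummies.T

# Check FL = 1 - (CA + NY) on every row.
identity_holds = np.array_equal(fl, 1 - (ca + ny))

# Assumed design matrix: intercept column of ones plus all three dummies.
intercept = np.ones((dummies.shape[0], 1))
X_all = np.hstack([intercept, dummies])

# Same design matrix with the last dummy (FL) dropped.
X_dropped = np.hstack([intercept, dummies[:, :2]])

# Rank below the column count means the columns are linearly dependent.
rank_all = np.linalg.matrix_rank(X_all)          # matrix has 4 columns
rank_dropped = np.linalg.matrix_rank(X_dropped)  # matrix has 3 columns

print(identity_holds, rank_all, rank_dropped)
```

With all three dummies the matrix has four columns but rank three, so one column is redundant. After dropping one dummy, the rank equals the number of columns.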