Mobius Strip
New member
- Jun 18, 2026
Say you are doing a multiple regression with a dummy variable. The classic textbook example is testing for wage differences between men and women while controlling for age:
Y = b0 + b1*D + b2*X + e, where Y is wage, D = 0 for a man and 1 for a woman, and X is age.
You have a sample of size N, and when you run the regression you get a fit with strong explanatory power (high adjusted R^2) that is statistically significant (low p-values for the F and t statistics).
The coefficient on the dummy variable, b1, is supposed to measure the wage differential, and it is statistically significant. But say you realize that your sample consists almost entirely of men (an extreme example would be N-1 men and 1 woman). You can still get a b1 with a low p-value, and the overall regression will still have high explanatory power. Does b1 mean anything in this case?
Basically, I have a sample split into two categories by a dummy variable, and one of the two subgroups ends up being very small compared to the other. Are the regression results to be trusted, specifically the dummy variable coefficient?
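To make the setup concrete, here is a rough simulation sketch of the regression above. Everything in it is an assumption for illustration: the true coefficients (10, -3, 0.5), the noise level, the age range, N = 200, and the helper names `fit_dummy_regression` and `simulate` are all made up. It fits Y = b0 + b1*D + b2*X + e by ordinary least squares and prints b1 with its classical standard error as the number of women shrinks from half the sample to one.

```python
import numpy as np


def fit_dummy_regression(y, d, x):
    """OLS fit of y = b0 + b1*d + b2*x + e.

    Returns (coefficients, classical standard errors), both ordered b0, b1, b2.
    """
    X = np.column_stack([np.ones_like(x), d, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]          # N minus number of coefficients
    sigma2 = resid @ resid / dof       # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def simulate(n_women, n_total=200, seed=0):
    """Simulate wages with n_women women out of n_total people, then fit.

    The data-generating values here are hypothetical, not from any real data.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(20, 60, n_total)   # age
    d = np.zeros(n_total)
    d[:n_women] = 1.0                  # D = 1 for women, 0 for men
    y = 10 - 3.0 * d + 0.5 * x + rng.normal(0, 2, n_total)
    return fit_dummy_regression(y, d, x)


for n_women in (100, 10, 1):
    beta, se = simulate(n_women)
    print(f"women={n_women:3d}  b1={beta[1]:6.2f}  se(b1)={se[1]:5.2f}")
```

In the one-woman case the dummy fits her observation exactly, so b1 is just the gap between her wage and the line fitted to the men at her age. The standard error printed here relies on the usual OLS assumption that men and women share the same error variance, which a single observation gives no way to check.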