Dear experts,

I have been trying to learn about the Box-Cox transformation and ran into the following problem:
When I had to add a constant to make all values of the original variable positive, I found that the lambda estimates (from the box.cox.powers function) differed dramatically depending on the specific constant chosen. In addition, the correlation between the transformed variable and the original was not 1 (as I think it should be if the transformed variable is to be used meaningfully) but much lower. With a larger added constant (and a right-skewed variable), the lambda estimate was even negative, and the correlation between the transformed variable and the original variable was -.91!

I guess there is something fundamental missing in my current thinking about Box-Cox...

Best,
Holger

P.S. Here is what I did:

library(car)  # provides box.cox.powers

# Create a skewed variable X (mixture of two normals)
x1 = rnorm(120, 0, .5)
x2 = rnorm(40, 2.5, 2)
X = c(x1, x2)

# Add a small constant so that all values are positive
Xnew1 = X + abs(min(X)) + .1
box.cox.powers(Xnew1)
Xtrans1 = Xnew1^.2682  # (the value of the lambda estimate)

# Add a larger constant
Xnew2 = X + abs(min(X)) + 1
box.cox.powers(Xnew2)
Xtrans2 = Xnew2^(-.2543)  # (the value of the lambda estimate)

# Plot it all
par(mfrow = c(3, 2))
hist(X)
qqnorm(X)
qqline(X, lty = 2)
hist(Xtrans1)
qqnorm(Xtrans1)
qqline(Xtrans1, lty = 2)
hist(Xtrans2)
qqnorm(Xtrans2)
qqline(Xtrans2, lty = 2)

# Correlation among original and transformed variables
round(cor(cbind(X, Xtrans1, Xtrans2)), 2)
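
One note on the sign of that correlation, offered as a minimal sketch rather than a definitive answer: the code above applies the raw power X^lambda, which is a decreasing function of X whenever lambda is negative. The usual scaled Box-Cox form, (x^lambda - 1)/lambda, is increasing for any lambda. The helper bc() below is a hypothetical name written for illustration (it is not the car implementation), the seed is arbitrary, and lambda is simply the estimate quoted above. Rank correlation is used because it isolates the direction of the relationship from its curvature.

```r
# Hypothetical helper, not taken from the car package:
# the scaled Box-Cox family, with log(x) as the limiting case at lambda = 0.
bc <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same construction as in the post: skewed mixture of two normals
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))
Xnew2 <- X + abs(min(X)) + 1  # shift so that all values are positive

lambda <- -0.2543  # the negative estimate quoted in the post

raw_power <- Xnew2^lambda       # what the post computed
scaled    <- bc(Xnew2, lambda)  # scaled Box-Cox form

# Spearman (rank) correlation with the original variable:
# the raw power reverses the ordering, the scaled form preserves it.
rho_raw    <- cor(X, raw_power, method = "spearman")
rho_scaled <- cor(X, scaled, method = "spearman")
print(c(raw = rho_raw, scaled = rho_scaled))
```

The Pearson correlation is not expected to equal 1 in either case, since the transformation is nonlinear by design; only the ordering of the values is preserved by the scaled form.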