[R] Box-Cox Transformation: Drastic differences when varying added constants

Holger Steinmetz Sun, 16 May 2010 07:52:16 -0700

Dear experts,

I have been trying to learn about the Box-Cox transformation, but I ran into the following problem:


When I had to add a constant to make all values of the original variable
positive, I found that
the lambda estimates (from the box.cox.powers function) differed dramatically
depending on the constant chosen.
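A minimal sketch of how lambda is usually estimated, assuming the standard Box-Cox profile log-likelihood for an intercept-only model. This is not the box.cox.powers source; the helper names boxcox_loglik and estimate_lambda and the seed are hypothetical. Because the Jacobian term (lambda - 1) * sum(log(y)) is computed from the shifted values themselves, changing the added constant changes the likelihood surface and therefore the estimate:

```r
# Profile log-likelihood of lambda for a positive vector y
# (intercept-only normal model, including the Jacobian term).
boxcox_loglik <- function(lambda, y) {
  n <- length(y)
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  rss <- sum((z - mean(z))^2)
  -n / 2 * log(rss / n) + (lambda - 1) * sum(log(y))
}

# Hypothetical helper: maximise the profile log-likelihood over an interval.
estimate_lambda <- function(y, interval = c(-3, 3)) {
  optimize(boxcox_loglik, interval = interval, y = y, maximum = TRUE)$maximum
}

set.seed(1)  # assumption: fixed seed so the run is reproducible
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))

# Same two constants as in the post
shifts <- c(0.1, 1)
lambda_hat <- sapply(shifts, function(k) estimate_lambda(X + abs(min(X)) + k))
names(lambda_hat) <- paste0("shift_", shifts)
print(round(lambda_hat, 3))
```

Only the constant differs between the two calls, yet the maximiser of the likelihood moves with it.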

In addition, the correlation between the transformed variable and the
original was not 1 (as I think it should be if the transformed variable is
to be used meaningfully) but much lower.

With a larger added constant (and a right-skewed variable), the lambda estimate
was even negative, and the correlation between the transformed variable and
the original variable was -.91!?
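A sketch of how the sign can flip, assuming a fixed lambda of -0.25 and simulated positive data: the raw power y^lambda is decreasing in y when lambda < 0, whereas the scaled form (y^lambda - 1)/lambda from the Box-Cox definition is increasing for every lambda (its derivative is y^(lambda - 1) > 0). The two differ only by a negative linear rescaling, so their correlations with y have the same size and opposite signs:

```r
set.seed(3)              # assumption: fixed seed for reproducibility
y <- rexp(200) + 1       # positive, right-skewed vector (simulated)
lambda <- -0.25          # assumption: a negative lambda like the one in the post

plain  <- y^lambda                # raw power: decreasing in y for lambda < 0
scaled <- (y^lambda - 1) / lambda # Box-Cox form: increasing for every lambda

print(round(c(plain = cor(y, plain), scaled = cor(y, scaled)), 3))
```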

I guess there is something fundamental missing in my current thinking about
Box-Cox...

Best,
Holger


P.S. Here is what I did:

library(car)  # box.cox.powers() comes from (older versions of) the car package

# Create a skewed variable X (mixture of two normals)
x1 = rnorm(120,0,.5)
x2 = rnorm(40,2.5,2)
X = c(x1,x2)

# Adding a small constant
Xnew1 = X + abs(min(X)) + .1
box.cox.powers(Xnew1)
Xtrans1 = Xnew1^.2682  # lambda estimate printed in my run (X is random, so it varies)

# Adding a larger constant
Xnew2 = X + abs(min(X)) + 1
box.cox.powers(Xnew2)
Xtrans2 = Xnew2^-.2543  # lambda estimate printed in my run (X is random, so it varies)

#Plotting it all
par(mfrow=c(3,2))
hist(X)
qqnorm(X)
qqline(X,lty=2) 
hist(Xtrans1)
qqnorm(Xtrans1)        
qqline(Xtrans1,lty=2) 
hist(Xtrans2)
qqnorm(Xtrans2)        
qqline(Xtrans2,lty=2) 

#correlation among original and transformed variables
round(cor(cbind(X,Xtrans1,Xtrans2)),2)
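A reproducible variant of the experiment, offered as a sketch under several assumptions: it uses MASS::boxcox() as a stand-in for box.cox.powers() (a different function, evaluated here on a lambda grid from -2 to 2), adds a fixed seed, takes lambda from the computed likelihood instead of hard-coding it, and applies the scaled (y^lambda - 1)/lambda form. The helper bc_transform is hypothetical:

```r
library(MASS)  # recommended package; provides boxcox()

set.seed(4)    # assumption: fixed seed; the original run had none
X <- c(rnorm(120, 0, 0.5), rnorm(40, 2.5, 2))
Xnew1 <- X + abs(min(X)) + .1
Xnew2 <- X + abs(min(X)) + 1

# Profile log-likelihood of lambda for an intercept-only model on a grid
lambda_grid <- seq(-2, 2, by = 0.01)
bc1 <- boxcox(lm(Xnew1 ~ 1, y = TRUE, qr = TRUE), lambda = lambda_grid, plotit = FALSE)
bc2 <- boxcox(lm(Xnew2 ~ 1, y = TRUE, qr = TRUE), lambda = lambda_grid, plotit = FALSE)

# Grid value with the highest log-likelihood
lambda1 <- bc1$x[which.max(bc1$y)]
lambda2 <- bc2$x[which.max(bc2$y)]

# Hypothetical helper: Box-Cox transform (log when lambda is 0)
bc_transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
Xtrans1 <- bc_transform(Xnew1, lambda1)
Xtrans2 <- bc_transform(Xnew2, lambda2)

print(c(lambda1 = lambda1, lambda2 = lambda2))
print(round(cor(cbind(X, Xtrans1, Xtrans2)), 2))
```

With the scaled form, both transformed variables are increasing functions of X, so their correlations with X are positive; the estimated lambdas still depend on the added constant.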
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Box-Cox-Transformation-Drastic-differences-when-varying-added-constants-tp2218490p2218490.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
