Hi Alberto,

In your second case the linear model y = a*x + b + error does not hold: the regressor x = lp[1:n] is itself random and correlated with earlier error terms, so the fixed-x assumption behind lm's t-statistics no longer applies, and the slope estimate is biased in finite samples.
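You can see this directly by looking at the mean of m.bar instead of the KS test. A quick sketch (I fixed the parameters myself instead of sampling them, and renamed c to c0 so it does not mask base::c):

```r
## Sketch: in the AR(1)/reversion case the OLS slope is biased,
## so (m.bar - m) / m.std.error cannot be Student's t.
set.seed(1)                        # reproducibility
m <- -0.5; c0 <- 0; sigma <- 1     # illustrative values, not yours
n <- 10; NN <- 2000
m.bar <- numeric(NN)
for (i in 1:NN) {
  lp <- numeric(n + 1)             # lp[1] = 0
  for (j in 1:n) lp[j + 1] <- (1 + m) * lp[j] + c0 + sigma * rnorm(1)
  x <- lp[1:n]
  y <- lp[-1] - x
  m.bar[i] <- coef(lm(y ~ x))['x'] # OLS slope estimate of m
}
bias <- mean(m.bar) - m            # clearly nonzero (negative) here
bias
```

With n = 10 the bias is of order 1/n and clearly visible, which is why the standardized slope is not t-distributed; it shrinks as n grows, which is why your n = 1000 runs will look much better than the small-n ones.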
--- On Tue, 19/8/08, Alberto Monteiro <[EMAIL PROTECTED]> wrote:

> From: Alberto Monteiro <[EMAIL PROTECTED]>
> Subject: [R] A doubt about "lm" and the meaning of its summary
> To: r-help@r-project.org
> Received: Tuesday, 19 August, 2008, 4:31 AM
>
> I have a conceptual problem (sort of). Maybe there's a simple
> solution, maybe not.
>
> First, let me explain the test case that goes ok.
>
> Let x be a (fixed) n-dimensional vector. I simulate a lot of linear
> models, y = m x + c + error, then I do a lot of regressions. As
> expected, the estimated values of m (let's call them m.bar) are
> distributed according to a Student's t distribution.
>
> More precisely (test case, it takes a few minutes to run):
>
> # start with fixed numbers
> m <- sample(c(-1, -0.1, 0.1, 1), 1)
> c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
> sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
> n <- sample(c(4, 5, 10, 1000), 1)
> x <- 1:n  # it's possible to use other x
> NN <- sample(c(3000, 4000, 5000), 1)
> m.bar <- m.std.error <- 0  # these vectors will hold the simulation output
>
> # Now let's repeat NN times:
> # simulate y, regress y ~ x, store m.bar and its error
> for (i in 1:NN) {
>   y <- m * x + c + sigma * rnorm(n)
>   reg <- lm(y ~ x)
>   m.bar[i] <- summary(reg)$coefficients['x', 'Estimate']
>   m.std.error[i] <- summary(reg)$coefficients['x', 'Std. Error']
> }
>
> # Finally, let's analyse it.
> # The distribution of (m.bar - m) / m.std.error is
> # a Student's t with n - 2 degrees of freedom. Is it? Let's test!
> print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
>
> Now, this goes ok, with ks.test returning a number uniformly
> distributed in the 0-1 interval.
>
> Next, I did a slightly different model. This model is a reversion
> model, where the simulated random variable lp goes along the equation:
>
> lp[t + 1] <- (1 + m) * lp[t] + c + error
>
> I tried to use the same method as above, defining
> x = lp[1:n], y = lp[2:(n+1)] - lp[1:n], with equation
> y <- m * x + c + error
>
> And now it breaks. Test case (it takes some minutes to run):
>
> # start with fixed numbers
> m <- runif(1, -1, 0)  # m must be something in the (-1, 0) interval
> c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
> sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
> n <- sample(c(4, 5, 10, 1000), 1)
> NN <- sample(c(3000, 4000, 5000), 1)
> m.bar <- m.std.error <- 0  # these vectors will hold the simulation output
>
> # Now let's repeat NN times:
> # simulate y, regress y ~ x, store m.bar and its error
> for (i in 1:NN) {
>   lp <- 0
>   lp[1] <- 0
>   for (j in 1:n) {
>     lp[j + 1] <- (1 + m) * lp[j] + c + sigma * rnorm(1)
>   }
>   x <- lp[1:n]
>   y <- lp[-1] - x
>   reg <- lm(y ~ x)
>   m.bar[i] <- summary(reg)$coefficients['x', 'Estimate']
>   m.std.error[i] <- summary(reg)$coefficients['x', 'Std. Error']
> }
>
> # Finally, let's analyse it.
> # The distribution of (m.bar - m) / m.std.error should be
> # a Student's t with n - 2 degrees of freedom. Is it? Let's test!
> print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
>
> ... and now it's not.
>
> What's wrong with this? I suspect that the model y ~ x only gives
> meaningful values when x is deterministic; in this case x is
> stochastic. Is there any correct way to estimate this model giving
> meaningful outputs?
>
> Alberto Monteiro
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
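On the closing question above: since lp is an AR(1) series with coefficient phi = 1 + m, one alternative (my suggestion, not from the original post) is to estimate phi directly with arima() from the stats package, rather than regressing the differences on the levels. A sketch, with illustrative parameter values of my own; note arima() reports asymptotic standard errors, not exact t-based ones:

```r
## Hedged sketch: estimate m via the AR(1) coefficient phi = 1 + m.
## arima() fits (lp[t] - mu) = phi * (lp[t-1] - mu) + error,
## which matches lp[t+1] = (1 + m) * lp[t] + c + error with phi = 1 + m.
set.seed(2)
m <- -0.5; c0 <- 0.1; sigma <- 0.5; n <- 1000   # illustrative values
lp <- numeric(n + 1)                            # lp[1] = 0
for (j in 1:n) lp[j + 1] <- (1 + m) * lp[j] + c0 + sigma * rnorm(1)

fit <- arima(lp, order = c(1, 0, 0))  # AR(1) with an estimated mean
m.hat <- unname(coef(fit)['ar1'] - 1) # phi.hat - 1 estimates m
m.hat
```

With n = 1000 this recovers m to within a few percent; for very short series the small-sample bias discussed above affects this estimator too.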