I have a conceptual problem (sort of). Maybe there's a simple solution, maybe not.
First, let me explain the test case that works. Let x be a (fixed) n-dimensional vector. I simulate a lot of linear models, y = m x + c + error, and then run a lot of regressions. As expected, the estimated values of m (let's call them m.bar), standardized by their standard errors, follow a Student's t distribution. More precisely (test case; it takes a few minutes to run):

#
# start with fixed numbers
#
m <- sample(c(-1, -0.1, 0.1, 1), 1)
c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
n <- sample(c(4, 5, 10, 1000), 1)
x <- 1:n # it's possible to use other x
NN <- sample(c(3000, 4000, 5000), 1)
m.bar <- m.std.error <- numeric(NN) # preallocated vectors for the simulation output
#
# Now let's repeat NN times:
#   simulate y
#   regress y ~ x
#   store m.bar and its standard error
#
for (i in 1:NN) {
  y <- m * x + c + sigma * rnorm(n)
  reg <- lm(y ~ x)
  coefs <- summary(reg)$coefficients # compute the summary once per fit
  m.bar[i] <- coefs['x', 'Estimate']
  m.std.error[i] <- coefs['x', 'Std. Error']
}
#
# Finally, let's analyse it.
# The distribution of (m.bar - m) / m.std.error is
# a Student's t with n - 2 degrees of freedom.
# Is it? Let's test!
#
print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))

This works: the p-value returned by ks.test is (approximately) uniformly distributed on the (0, 1) interval, as it should be under the null hypothesis. (A one-sample variant of this test is sketched in the P.S. below.)

Next, I tried a slightly different model. This one is a reversion model, where the simulated random variable lp follows the recursion

lp[t + 1] <- (1 + m) * lp[t] + c + error

I tried to use the same method as above, defining x = lp[1:n] and y = lp[2:(n + 1)] - lp[1:n], which gives the same regression equation

y = m * x + c + error

And now it breaks. Test case (it takes some minutes to run; a vectorized version of the inner loop is sketched in the P.P.S. below):

#
# start with fixed numbers
#
m <- runif(1, -1, 0) # m must be somewhere in the (-1, 0) interval
c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
n <- sample(c(4, 5, 10, 1000), 1)
NN <- sample(c(3000, 4000, 5000), 1)
m.bar <- m.std.error <- numeric(NN) # preallocated vectors for the simulation output
#
# Now let's repeat NN times:
#   simulate lp, and from it x and y
#   regress y ~ x
#   store m.bar and its standard error
#
for (i in 1:NN) {
  lp <- numeric(n + 1) # lp[1] starts at 0
  for (j in 1:n) {
    lp[j + 1] <- (1 + m) * lp[j] + c + sigma * rnorm(1)
  }
  x <- lp[1:n]
  y <- lp[-1] - x
  reg <- lm(y ~ x)
  coefs <- summary(reg)$coefficients
  m.bar[i] <- coefs['x', 'Estimate']
  m.std.error[i] <- coefs['x', 'Std. Error']
}
#
# Finally, let's analyse it.
# The distribution of (m.bar - m) / m.std.error should be
# a Student's t with n - 2 degrees of freedom.
# Is it? Let's test!
#
print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))

... and now it is not.

What's wrong with this? I suspect that the regression y ~ x only yields the textbook t distribution when x is deterministic; in this case x is stochastic, since each lp[t] depends on the preceding errors. Is there a correct way to estimate this model that gives meaningful outputs?

Alberto Monteiro
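P.S. A side note on the test statistic: ks.test(z, rt(NN, n - 2)) is a two-sample test, so the Monte Carlo noise of the reference draw enters as well. A minimal one-sample sketch, comparing directly against the theoretical t CDF (the extra argument is passed on to pt):

# one-sample Kolmogorov-Smirnov test: compare the standardized
# estimates against the t CDF with n - 2 degrees of freedom
print(ks.test((m.bar - m) / m.std.error, "pt", df = n - 2))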
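P.P.S. Since the runs take minutes, here is a possible vectorization of the inner simulation loop, using stats::filter; recursive filtering computes exactly the recursion lp[j + 1] = (1 + m) * lp[j] + eps[j] starting from 0, so only the outer loop over i remains:

# draw all n innovations at once
eps <- c + sigma * rnorm(n)
# recursive filter: out[j] = eps[j] + (1 + m) * out[j - 1], with init 0;
# prepending the starting value 0 recovers the full lp path
lp <- c(0, as.numeric(filter(eps, filter = 1 + m, method = "recursive")))
x <- lp[1:n]
y <- lp[-1] - x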