I have the following code:

```
rm(list=ls())
N = 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
plot(xvar,yvar)
lmMod <- lm(yvar~xvar)
print(summary(lmMod))
domain <- seq(min(xvar), max(xvar))    # define a vector of x values to feed into model
lines(domain, predict(lmMod, newdata = data.frame(xvar=domain)))    # add regression line, using `predict` to generate y-values
```

I expected the coefficients to be close to [1, 2]. Instead, R keeps giving me seemingly random estimates that are not statistically significant and do not fit the model, even though I have 30,000 observations. For example:

```
Call:
lm(formula = yvar ~ xvar)

Residuals:
    Min      1Q  Median      3Q     Max
-21.384  -8.908   1.016  10.972  23.663

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0007145  0.0670316   0.011    0.991
xvar        0.0168271  0.0116420   1.445    0.148

Residual standard error: 11.61 on 29998 degrees of freedom
Multiple R-squared:  7.038e-05, Adjusted R-squared:  3.705e-05
F-statistic: 2.112 on 1 and 29998 DF,  p-value: 0.1462
```

The strange thing is that the code works perfectly for N=200 or N=2000. The problem only appears for larger N (for example, N=20000). I have also asked on Cross Validated <https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model>, but the code works for the people there.

Any help?

I am running R 3.6.0 on Kubuntu 19.04.

Best regards,
Raffaele
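One possible way to narrow this down is sketched here as a sanity check, not a diagnosis. It compares the `lm` coefficients with the closed-form simple-regression estimates computed from `cov` and `var`, and prints `sessionInfo()`, which reports the R version, platform, and BLAS/LAPACK libraries in use. The seed value and the names `slope_cf`, `intercept_cf`, and `fit` are illustrative assumptions and not part of the original code.

```r
set.seed(1)  # arbitrary seed (assumption) so that the run is reproducible

N <- 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean = 0, sd = 1)
yvar <- 1 + 2 * xvar + e

# Closed-form OLS estimates for a simple regression, computed without lm()
slope_cf <- cov(xvar, yvar) / var(xvar)
intercept_cf <- mean(yvar) - slope_cf * mean(xvar)

fit <- lm(yvar ~ xvar)

print(c(intercept = intercept_cf, slope = slope_cf))
print(coef(fit))

# TRUE when lm() agrees with the closed-form values
print(isTRUE(all.equal(unname(coef(fit)),
                       c(intercept_cf, slope_cf),
                       tolerance = 1e-6)))

# R version, platform, and the BLAS/LAPACK libraries in use
print(sessionInfo())
```

If the closed-form values come out near 1 and 2 while `coef(fit)` does not, that would suggest the problem lies in the local installation and not in the model. Including the `sessionInfo()` output in the post would let others compare setups.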