Hello R-help list members,

I've been doing some linear modeling with a dataset structured as follows. Tubes, each containing 500 Trichinella larvae, were assigned to one of four temperature treatments. Each day (or every 10 days, depending on the treatment group), 3 tubes were selected from each treatment and all the dead larvae in them were counted. The counted tubes were then discarded. The larvae counts were averaged.

The variables are:

  y  = dead larvae (count)
  X1 = Day: the dead larvae contained in three tubes were counted either daily (tubes at -30 °C) or every 10 days (tubes at -20 °C, 4 °C, and lab temperature). This was done for each treatment until all the larvae in all three tubes were dead.
  X2 = Temperature treatment (a factor with 4 levels): -30 °C, -20 °C, 4 °C, and lab temperature.

Because we counted larvae for each treatment until all 500 larvae in each tube of the day's batch of three were dead, the experiment ended at a different time for each treatment (e.g. day 95 for -30 °C, day 200 for -20 °C, and so on). The final dataset therefore contains data collected over different time ranges:

  Days  Dead_larvae  Group
     1          100  30 below
     2          145  30 below
     3          277  30 below
     4          284  30 below
     5          288  30 below
     6          294  30 below
     7          359  30 below
   ...          ...  ...
    95          500  30 below
    10           25  20 below
    20           35  20 below
    30          105  20 below
    40          230  20 below
   ...          ...  ...
   200          500  20 below
   ...          ...  ...

Model specification (I translated the variable names in the call below; the output keeps the original Spanish names: Larvas_muertas = dead larvae, Dias = days, Grupo = group):

> my_model <- lm(Larvae_count ~ Days + I(Days^2) + Group, data = Data)

Call:
lm(formula = Larvas_muertas ~ Dias + I(Dias^2) + Grupo, data = Data)

Residuals:
     Min       1Q   Median       3Q      Max
-356.983  -31.229    3.606   37.768  170.846

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.556e+02  6.634e+00   53.60   <2e-16 ***
Dias         2.403e+00  9.963e-02   24.11   <2e-16 ***
I(Dias^2)   -2.732e-03  1.885e-04  -14.49   <2e-16 ***
Grupo-20    -1.422e+02  1.283e+01  -11.08   <2e-16 ***
Grupo4      -3.117e+02  1.188e+01  -26.23   <2e-16 ***
GrupoAmb    -3.830e+02  1.212e+01  -31.59   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 67.04 on 348 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared:  0.7879, Adjusted R-squared:  0.7848
F-statistic: 258.5 on 5 and 348 DF,  p-value: < 2.2e-16

Q1. Is the modeling approach / specification correct?

Q2. Larvae were counted over different periods of time, so the range of X1 differs markedly between treatments. Is this a serious problem?
Might this lead to seriously biased estimates?

Q3. Am I violating the assumption of independent residuals, given that residuals from different time points may be correlated? If so, how can one deal with this in R?

I know my question is both a statistical and an R-related one, so apologies in advance.

Best regards,
Luciano
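
P.S. In case a reproducible starting point is useful, here is a minimal sketch that rebuilds only the rows shown in the excerpt above and refits the same formula to them. The elided rows are not reconstructed, the 4 °C and lab-temperature groups are absent, and the level labels "-30" and "-20" are my assumption based on the coefficient names in the output. The fit on this fragment is therefore not expected to match the full output and is only meant to show the data layout.

```r
# Rows copied from the excerpt in this post; elided rows ("...") are NOT filled in.
# Level labels "-30" / "-20" are assumed from the coefficient names (Grupo-20 etc.).
Data <- data.frame(
  Dias           = c(1, 2, 3, 4, 5, 6, 7, 95, 10, 20, 30, 40, 200),
  Larvas_muertas = c(100, 145, 277, 284, 288, 294, 359, 500,
                     25, 35, 105, 230, 500),
  Grupo          = factor(c(rep("-30", 8), rep("-20", 5)),
                          levels = c("-30", "-20"))
)

# Same formula as the full model, fitted to this fragment only
fit <- lm(Larvas_muertas ~ Dias + I(Dias^2) + Grupo, data = Data)

# Range of sampling days per group (the concern in Q2):
# a 2 x 2 matrix with one column per group, rows = min and max day
day_range <- sapply(split(Data$Dias, Data$Grupo), range)
print(day_range)

# Residuals listed by group and day, to eyeball any time pattern (the concern in Q3)
Data$res <- residuals(fit)
print(Data[order(Data$Grupo, Data$Dias), ])
```

Replacing the `data.frame()` call with the full dataset should reproduce the model summary shown above.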