[R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Hi, I would like to test the impact of a treatment on some variable using 
regression (e.g. lm(var ~ trt + cov)). However, I only have four observations 
per factor level. Is it still possible to apply a regression with such a small 
sample size? I think it would be difficult to estimate the variance correctly. 
Do you think I should instead compute a non-parametric test such as 
Kruskal-Wallis? However, I need to include covariates in my models and I am 
not sure whether basic non-parametric tests are suitable for this. Thanks for 
any suggestion.

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Roman Luštrik
I think you can, but the confidence intervals will be rather wide due to the
small number of samples.

Notice how the standard errors change as the sample size (per group) goes from 4 to 30.

> pg <- 4 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg * 4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.63861 -0.46080  0.03332  0.66380  1.27974 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.2345     1.0218   1.208    0.252    
trttrt2      -0.7759     0.8667  -0.895    0.390    
trttrt3       7.8503     0.8308   9.449  1.3e-06 ***
trttrt4      28.2685     0.9050  31.236  4.3e-12 ***
cov           1.4027     1.1639   1.205    0.253    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.154 on 11 degrees of freedom
Multiple R-squared:  0.9932, Adjusted R-squared:  0.9908
F-statistic: 404.4 on 4 and 11 DF,  p-value: 7.467e-12

>
> pg <- 30 # pg = per group
> my.df <- data.frame(var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
+                             rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
+                     trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
+                     cov = runif(pg * 4)) # 4 groups
> summary(lm(var ~ trt + cov, data = my.df))

Call:
lm(formula = var ~ trt + cov, data = my.df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.5778 -0.6584 -0.0185  0.6423  3.2077 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.76961    0.25232  10.977  < 2e-16 ***
trttrt2     -1.75490    0.28546  -6.148 1.17e-08 ***
trttrt3      8.40521    0.28251  29.752  < 2e-16 ***
trttrt4     27.04095    0.28286  95.599  < 2e-16 ***
cov          0.05129    0.32523   0.158    0.875    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.094 on 115 degrees of freedom
Multiple R-squared:  0.9913, Adjusted R-squared:  0.991
F-statistic:  3269 on 4 and 115 DF,  p-value: < 2.2e-16
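The same shrinkage is visible directly in confidence-interval width. A small sketch (my addition, using the same assumed group means and uniform covariate as the simulation above; exact widths vary with the random seed):

```r
# Sketch: width of the 95% CI for the trt2 contrast at pg = 4 vs pg = 30.
set.seed(1)
fit_for <- function(pg) {
  my.df <- data.frame(
    var = c(rnorm(pg, mean = 3), rnorm(pg, mean = 1),
            rnorm(pg, mean = 11), rnorm(pg, mean = 30)),
    trt = rep(c("trt1", "trt2", "trt3", "trt4"), each = pg),
    cov = runif(pg * 4))
  lm(var ~ trt + cov, data = my.df)
}
diff(confint(fit_for(4))["trttrt2", ])   # wide interval
diff(confint(fit_for(30))["trttrt2", ])  # roughly sqrt(30/4) ~ 2.7x narrower
```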

On Mon, Oct 20, 2014 at 10:53 AM, V. Coudrain  wrote:




-- 
In God we trust, all others bring data.



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you very much. If I understand correctly, the CIs get wider, my test has 
less power, and the probability of detecting a significant relation decreases. 
What about the significant coefficients: are they reliable?






Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Martin Weiser
Hi,

Coefficients and their p-values are reliable if your data are OK and you
know enough about the process that generated them to choose an appropriate
model. With 4 points per level, it may be really difficult to identify a
bad fit or outliers.

For example: simple linear regression needs constant variance of the
normal distribution from which residuals are drawn - along the regression
line - to work properly. With 4 points you can hardly estimate this, but if
you know enough about the process that generated the data, you are safe. If
you do not, it is not easy to say anything about the nature of that process.
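A quick simulation sketch of that point (my addition, not from the post; assumes normal errors with true SD 1): with n = 4 the sample standard deviation is a very noisy estimate.

```r
# Sketch: spread of the sample SD when the true SD is 1,
# for n = 4 versus n = 30 observations.
set.seed(42)
sd4  <- replicate(1000, sd(rnorm(4)))
sd30 <- replicate(1000, sd(rnorm(30)))
round(quantile(sd4,  c(0.025, 0.975)), 2)  # very wide interval around 1
round(quantile(sd30, c(0.025, 0.975)), 2)  # much tighter
```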

If you know (or can assume) that there is a simple linear relationship,
you can say "the slope of this relationship is such and such"; but if you
want to estimate both the nature of the relationship ("A *linearly*
depends on B") and its magnitude ("the slope of this relationship
is ..."), p-values will not help you much.

Of course, I may be wrong - I am not a statistician, just a user.

Best,
Martin W. 



[R-sig-eco] Logistic regression with 2 categorical predictors

2014-10-20 Thread Andrew Halford
Hi Listers,

I am trying to run a logistic regression to look at the effects of
experiment type and age on the behavior of fish in a choice chamber
experiment.

I am using the glm approach and would like some advice on how, or whether, to
perform contrasts to work out which levels of Factor 1 (Age) and Factor 2
(Test) are significantly different from each other. I have not been able to
work out from my reading what the appropriate approach is when dealing with a
significant interaction term. I am also not sure how one interprets a model in
which all the coefficients are non-significant but the chi-square ANOVA shows
a highly significant interaction term.

I have graphed up the data as dot plots and there is definitely evidence of
changes in proportions in later ages.

I want to provide evidence for when and for which tests there was a
'significant' change in behavior.

> snapper2
   age test prefer avoid
1    1   LR     15    14
2    1   SD     15    13
3    1   SG     17    14
4    1   SW     14    14
5    2   LR     17    14
6    2   SD     16    19
7    2   SG     20    10
8    2   SW     15    21
9    3   LR     10    16
10   3   SD     14    10
11   3   SG     14     9
12   3   SW     13    15
13   4   LR     12    11
14   4   SD     14    11
15   4   SG     13    12
16   4   SW     11    14
17   5   LR      4    12
18   5   SD      8     8
19   5   SG      0    18
20   5   SW     10     6
21   6   LR      0     6
22   6   SD      3     4
23   6   SG      0     5
24   6   SW      5     3

> dotplot(age ~ prefer/avoid, data = snapper2, group = snapper2$test,
+   cex = 1.5, pch = 19, ylab = "age",
+   auto.key = list(space = "right", title = "Tests"))


> out2 <- glm(cbind(prefer, avoid) ~ age * test, data = snapper2,
+   family = binomial)

> summary(out2)

Call:
glm(formula = cbind(prefer, avoid) ~ age * test, family = binomial,
data = snapper2)

Deviance Residuals:
 [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Coefficients:
  Estimate Std. Error z value Pr(>|z|)
(Intercept)  6.899e-02  3.716e-01   0.186   0.8527
age2 1.252e-01  5.180e-01   0.242   0.8091
age3-5.390e-01  5.483e-01  -0.983   0.3256
age4 1.802e-02  5.589e-01   0.032   0.9743
age5-1.168e+00  6.866e-01  -1.701   0.0890 .
age6-2.575e+01  9.348e+04   0.000   0.9998
testSD   7.411e-02  5.307e-01   0.140   0.8890
testSG   1.252e-01  5.180e-01   0.242   0.8091
testSW  -6.899e-02  5.301e-01  -0.130   0.8964
age2:testSD -4.401e-01  7.260e-01  -0.606   0.5444
age3:testSD  7.324e-01  7.846e-01   0.933   0.3506
age4:testSD  8.004e-02  7.863e-01   0.102   0.9189
age5:testSD  1.024e+00  9.301e-01   1.102   0.2707
age6:testSD  2.532e+01  9.348e+04   0.000   0.9998
age2:testSG  3.738e-01  7.407e-01   0.505   0.6138
age3:testSG  7.867e-01  7.832e-01   1.004   0.3152
age4:testSG -1.321e-01  7.764e-01  -0.170   0.8649
age5:testSG -2.568e+01  8.768e+04   0.000   0.9998
age6:testSG  2.121e-02  1.334e+05   0.000   1.
age2:testSW -4.616e-01  7.249e-01  -0.637   0.5242
age3:testSW  3.959e-01  7.662e-01   0.517   0.6054
age4:testSW -2.592e-01  7.858e-01  -0.330   0.7415
age5:testSW  1.678e+00  9.386e-01   1.788   0.0737 .
age6:testSW  2.626e+01  9.348e+04   0.000   0.9998
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 5.4908e+01  on 23  degrees of freedom
Residual deviance: 2.6113e-10  on  0  degrees of freedom
AIC: 122.73

Number of Fisher Scoring iterations: 23


> anova(out2, test="Chisq")

Analysis of Deviance Table

Model: binomial, link: logit

Response: cbind(prefer, avoid)

Terms added sequentially (first to last)


         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                        23     54.908              
age       5   11.235        18     43.673 0.0469115 *  
test      3    1.593        15     42.079 0.6608887    
age:test 15   42.079         0      0.000 0.0002185 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

cheers

Andy



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread stephen sefick
You are more or less performing an ANOVA/ANCOVA on your data? As pointed
out earlier, all of the normal-theory regression assumptions apply.
Assuming all of those are satisfied, then if you have large confidence
intervals and there are significant differences between groups, I don't
see why you couldn't correctly infer something about the treatments.
Maybe I am missing something.

Stephen


Re: [R-sig-eco] Logistic regression with 2 categorical predictors

2014-10-20 Thread ONKELINX, Thierry
Dear Andrew,

anova() and summary() test different hypotheses. anova() tests whether at least 
one level differs from the others; summary() tests whether each coefficient 
differs from zero.
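That distinction can be seen in a small simulation (my sketch, not from the original reply; the effect sizes are arbitrary assumptions). With the middle level as baseline, each summary() contrast compares one level to the baseline and may be only marginal, while anova() tests the factor as a whole:

```r
# Sketch: a 3-level factor where the whole-factor test and the
# per-coefficient tests can disagree.
set.seed(1)
d <- data.frame(
  g = factor(rep(c("low", "mid", "high"), each = 20),
             levels = c("mid", "low", "high")),  # "mid" as baseline
  y = rep(c(-0.5, 0, 0.5), each = 20) + rnorm(60))
fit <- lm(y ~ g, data = d)
anova(fit)          # tests: does g explain anything at all?
summary(fit)$coef   # tests: does each level differ from the baseline "mid"?
```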

Multiple comparison of different interaction levels is probably the most 
relevant in this case. The easiest way is to make a new variable.

snapper2$inter <- with(snapper2, interaction(age, test))
model <- glm(cbind(prefer, avoid) ~ 0 + inter, data = snapper2, family = binomial)
library(multcomp)
mc <- glht(model, mcp(inter = "Tukey"))
summary(mc)
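One caveat worth adding (my note, not Thierry's): with 24 interaction levels the Tukey set contains choose(24, 2) = 276 pairwise comparisons, so power after adjustment is low. Two possible follow-ups on the `mc` object above, sketched from multcomp's documented interface:

```r
# Follow-ups on `mc` from above (assumes multcomp is loaded).
# A different multiplicity adjustment over the same comparisons:
summary(mc, test = adjusted("holm"))
# A compact letter display grouping levels that do not differ significantly:
cld(mc)
```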

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and 
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
thierry.onkel...@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than 
asking him to perform a post-mortem examination: he may be able to say what the 
experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure 
that a reasonable answer can be extracted from a given body of data.
~ John Tukey



Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Thank you for this helpful thought. So if I understand correctly, it is hopeless 
to try to test an interaction, but we may nevertheless assess whether a covariate 
has an impact, provided its effect is the same in all treatments.




> Message du 20/10/14 à 16h46
> De : "Elgin Perry" 
> A : v_coudr...@voila.fr
> Copie à : 
> Objet : Regression with few observations per factor level
> 
> If it is reasonable to assume that the slope of the covariate is the same for 
> all treatments, and you have numerous treatments, then you can do this by 
> specifying one slope parameter for all treatments, as in your example 
> (e.g. lm(var ~ trt + cov)). By combining slope information over treatments, 
> you can obtain a reasonably precise estimate. With so few observations per 
> treatment, you will not be able to estimate separate slopes for each 
> treatment with any degree of precision (e.g. lm(var ~ trt + trt:cov))
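The pooled-slope versus per-treatment-slope contrast can be sketched with simulated data (the group means, slope of 2, and noise level are my assumptions, not from the thread):

```r
# Sketch: pooled slope vs. per-treatment slopes with 4 observations per group.
set.seed(7)
pg  <- 4
trt <- factor(rep(paste0("trt", 1:4), each = pg))
cv  <- runif(pg * 4)
y   <- rep(c(3, 1, 11, 30), each = pg) + 2 * cv + rnorm(pg * 4)
# One common slope, estimated from all 16 points:
coef(summary(lm(y ~ trt + cv)))["cv", "Std. Error"]
# Four separate slopes, each estimated from only 4 points (larger SEs):
coef(summary(lm(y ~ trt + trt:cv)))[5:8, "Std. Error"]
```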


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473




Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread V. Coudrain
Yes, but as I feared, the residuals behave badly as soon as the model gets a 
little more complex (e.g., with two covariates or an interaction). The scope 
for performing an ANCOVA is thus very limited. That is why I was thinking 
about a potential non-parametric model. But I do not want to artificially make 
my data tell something if they cannot.





Re: [R-sig-eco] Regression with few observations per factor level

2014-10-20 Thread Baldwin, Jim -FS
Yes, the analysis with a small sample size would be valid (under the assumption 
that the model, both fixed and random effects, is correctly specified), but at 
some point there must be a practical assessment of the desired precision and 
the costs of the consequences of either inadequate estimates or wrongly 
accepting or rejecting hypotheses.  If it were just about the numbers from a 
sample and the resulting P-values, we would only need statisticians and no 
subject-matter experts (which is clearly not the case).

And while I'm soapboxing: situations with low variability require fewer samples 
than situations with high variability, so one can't assess the adequacy of an 
analysis based solely on the sample size.

Jim

Jim Baldwin
Station Statistician
Pacific Southwest Research Station
USDA Forest Service

-Original Message-
From: r-sig-ecology-boun...@r-project.org 
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of V. Coudrain
Sent: Monday, October 20, 2014 8:54 AM
To: ElginPerry
Cc: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] Regression with few observations per factor level

Thank you for this helpful thought. So if I get it correctly, it is hopeless to 
try testing an interaction, but we nevertheless may assess whether a covariate 
has an impact, provided its effect is the same in all treatments.




> Message of 20/10/14 at 16:46
> From: "Elgin Perry"
> To: v_coudr...@voila.fr
> Cc:
> Subject: Regression with few observations per factor level
>
> If it is reasonable to assume that the slope of the covariate is the
> same for all treatments, and you have numerous treatments, then you can
> do this by specifying one slope parameter for all treatments as you
> gave in your example (e.g. lm(var ~ trt + cov)).  By pooling slope
> information over treatments, you can obtain a reasonably precise
> estimate.  With so few observations per treatment, you will not be
> able to estimate separate slopes for each treatment with any degree of
> precision (e.g. lm(var ~ trt + trt:cov)).


Elgin S. Perry, Ph.D.
Statistics Consultant
377 Resolutions Rd.
Colonial Beach, Va.  22443
ph. 410.610.1473


Date: Mon, 20 Oct 2014 10:53:41 +0200 (CEST)
From: "V. Coudrain" < v_coudr...@voila.fr >
To: r-sig-ecology@r-project.org
Subject: [R-sig-eco] Regression with few observations per factor level
Message-ID: < 2127199056.738451413795221981.JavaMail.www@wwinf7128 >
Content-Type: text/plain; charset="UTF-8"


Hi, I would like to test the impact of a treatment on some variable using 
regression (e.g. lm(var ~ trt + cov)).
However I only have four observations per factor level. Is it still possible to 
apply a regression with such a small sample size? I think that it should be 
difficult to correctly estimate the variance. Do you think that I should rather 
compute a non-parametric test such as Kruskal-Wallis? However I need to include 
covariables in my models and I am not sure if basic non-parametric tests are 
suitable for this. Thanks for any suggestion.
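[Editor's note] On the Kruskal-Wallis part of the question: `kruskal.test` cannot take covariates, but a rank-transform regression (the Conover-Iman approach) keeps the covariate in the model while dropping the normality assumption on the response. Below is a minimal sketch with invented, skewed data; the `Rfit` package offers more rigorous rank-based regression if this route looks promising.

```r
## Editor's sketch: a covariate-aware alternative to Kruskal-Wallis.
set.seed(3)
pg <- 4
d <- data.frame(trt = rep(paste0("trt", 1:4), each = pg),
                cov = runif(4 * pg))
d$var <- rep(c(3, 1, 11, 30), each = pg) + rexp(4 * pg)  # skewed errors

kruskal.test(var ~ trt, data = d)           # treatment only, no covariate
anova(lm(rank(var) ~ trt + cov, data = d))  # rank-transform, covariate kept
```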




___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology



