[R] Reporting binomial logistic regression from R results

2018-11-11 Thread Frodo Jedi
Dear list members,
I need some help in understanding whether I am performing a binomial
logistic regression correctly and whether I am interpreting the results in
the correct way. I would also need advice on reporting the results from the
R functions.

I want to report the results of a binomial logistic regression in which I
assess the effect of the 3 levels of a factor (called System) on the
dependent variable (called Response), which takes two values, 0 and 1. My
goal is to understand whether the 3 systems (A, B, C) in System affect
Response differently in a significant way. I am basing my analysis
on this URL: https://stats.idre.ucla.edu/r/dae/logit-regression/

This is the result of my analysis:

> fit <- glm(Response ~ System, data = scrd, family = "binomial")
> summary(fit)

Call:
glm(formula = Response ~ System, family = "binomial", data = scrd)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.8840   0.1775   0.2712   0.2712   0.5008

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.2844     0.2825  11.626  < 2e-16 ***
SystemB      -1.2715     0.3379  -3.763 0.000168 ***
SystemC       0.8588     0.4990   1.721 0.085266 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 411.26  on 1023  degrees of freedom
Residual deviance: 376.76  on 1021  degrees of freedom
AIC: 382.76

Number of Fisher Scoring iterations: 6
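
A common way to make such coefficients reportable (the UCLA page cited
above takes the same approach) is to exponentiate them into odds ratios; a
minimal sketch, assuming the fit object from the summary above (confint()
on a glm profiles the likelihood, so it may take a moment):

exp(coef(fit))                            # coefficients as odds ratios
exp(cbind(OR = coef(fit), confint(fit)))  # odds ratios with 95% CIs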

Following this analysis, I perform the Wald test in order to understand
whether there is an overall effect of System:

library(aod)

> wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 1:3)
Wald test:
--

Chi-squared test:
X2 = 354.6, df = 3, P(> X2) = 0.0

The chi-squared test statistic of 354.6, with 3 degrees of freedom, is
associated with a p-value < 0.001, indicating that the overall effect of
System is statistically significant.

Now I check whether there are differences between the coefficients, again
using the Wald test:

# Here difference between system B and C:

> l <- cbind(0, 1, -1)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
--

Chi-squared test:
X2 = 22.3, df = 1, P(> X2) = 2.3e-06



# Here difference between system A and C:

> l <- cbind(1, 0, -1)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
--

Chi-squared test:
X2 = 12.0, df = 1, P(> X2) = 0.00052



# Here difference between system A and B:

> l <- cbind(1, -1, 0)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
--

Chi-squared test:
X2 = 58.7, df = 1, P(> X2) = 1.8e-14

My understanding is that from this analysis I can state that the three
systems lead to significantly different Responses. Am I right? If so, how
should I report the results of this analysis? What is the correct way?


Thanks in advance

Best wishes

FJ




Re: [R] Reporting binomial logistic regression from R results

2018-11-12 Thread Frodo Jedi
Dear Petr,
thank you very much for your feedback.

Can anyone in the list advise me if the way I report the results is correct?

Kind regards

FJ


On Mon, Nov 12, 2018 at 1:02 PM PIKAL Petr  wrote:

> Hi Frodo
>
> I do not consider myself an arbiter of statistical results and their
> presentation. Again, your text seems to be as good as any other.
>
> You should keep responses on the mailing list, as others could have
> another opinion.
>
> Cheers
>
> Petr
>
> *From:* Frodo Jedi 
> *Sent:* Monday, November 12, 2018 1:48 PM
> *To:* PIKAL Petr 
> *Subject:* Re: [R] Reporting binomial logistic regression from R results
>
> Dear Petr,
>
> many thanks for your reply. I was wondering whether in your opinion it is
> correct to report in a journal the following text:
>
> “A logistic regression was performed to ascertain the effects of the
> system type on the likelihood that participants report correct
> identifications. The logistic regression model was statistically
> significant, χ2(3) = 354.6, p < 0.001, indicating an overall effect of the
> system type on participants' identification performances. The Wald test
> was used to compare the model coefficients related to the three systems.
> Results showed that participants’ accuracy was significantly lower for
> system B compared to both system C (χ2(1) = 22.3, p < 0.001) and system A
> (χ2(1) = 58.7, p < 0.001), and that system C led to significantly higher
> identification accuracies than system A (χ2(1) = 12, p < 0.001).”
>
> Best wishes
>
> FJ
>
> On Mon, Nov 12, 2018 at 10:05 AM PIKAL Petr 
> wrote:
>
> Dear Frodo (or Jedi)
>
> The results seem to confirm your assumption that the 3 systems are
> different. How you should present the results probably depends on how it
> is usual to report such results in your environment.
>
> BTW, it seems to me like homework, and this list has a no-homework policy
> (sorry if I am mistaken).
>
> Cheers
> Petr

Re: [R] Reporting binomial logistic regression from R results

2018-11-12 Thread Frodo Jedi
Dear Peter and Eik,
I am very grateful to you for your replies.
My current understanding is that from the GLM analysis I can indeed
conclude that the response predicted by System A is significantly different
from that of System B, while the pairwise comparison A vs C is
non-significant. The Wald test seems to be correct only for Systems B vs
C, indicating that the pairwise comparison B vs C is significant. Am I
correct?

However, my current understanding is also that I should use contrasts
instead of the Wald test. The default contrast takes System A as the base;
now I should re-run the GLM with another base. I tried to use the
"contrasts" option of glm:

> fit1 <- glm(Response ~ System, data = scrd, family = "binomial",
+             contrasts = contr.treatment(3, base = 1, contrasts = TRUE))
> summary(fit1)

> fit2 <- glm(Response ~ System, data = scrd, family = "binomial",
+             contrasts = contr.treatment(3, base = 2, contrasts = TRUE))
> summary(fit2)

> fit3 <- glm(Response ~ System, data = scrd, family = "binomial",
+             contrasts = contr.treatment(3, base = 3, contrasts = TRUE))
> summary(fit3)

However, the outputs of these three summary functions are identical. Why?
That option should have changed the base, but apparently this is not the
case.
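
A likely explanation: the contrasts argument of glm() expects a *named
list* (one entry per factor), and model.matrix warns that a non-list
contrasts argument is ignored, so the bare matrices above were silently
dropped. A hedged sketch of two ways to actually change the base level:

# 1. relevel the factor itself (assuming System is a factor in scrd)
scrd$System <- relevel(scrd$System, ref = "B")
fitB <- glm(Response ~ System, data = scrd, family = "binomial")

# 2. or pass the contrast matrix as a named list
fitB2 <- glm(Response ~ System, data = scrd, family = "binomial",
             contrasts = list(System = contr.treatment(3, base = 2)))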


Another analysis I found online (at this link
https://stats.stackexchange.com/questions/60352/comparing-levels-of-factors-after-a-glm-in-r
) to understand the differences between the 3 levels is to use glht with
Tukey contrasts. I performed the following:

> library(multcomp)
> summary(glht(fit, mcp(System="Tukey")))

Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: glm(formula = Response ~ System, family = "binomial", data = scrd)

Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
B - A == 0  -1.2715     0.3379  -3.763 0.000445 ***
C - A == 0   0.8588     0.4990   1.721 0.192472
C - B == 0   2.1303     0.4512   4.722  < 1e-04 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Adjusted p values reported -- single-step method)


Is this Tukey analysis correct?
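
For what it's worth, the p-values above differ from those in summary(fit)
because glht reports single-step *adjusted* p-values, as the last line of
the output notes; unadjusted ones can be requested with a different test
argument. A sketch, assuming multcomp is loaded:

summary(glht(fit, mcp(System = "Tukey")), test = adjusted("none"))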


I am a bit confused about which analysis I should do. I am doing my very
best to study all the resources I can find, but I would really need some
help from experts, especially in using R.


Best wishes

FJ






On Mon, Nov 12, 2018 at 1:46 PM peter dalgaard  wrote:

> Yes, only one of the pairwise comparisons (B vs. C) is right. Also, the
> overall test has 3 degrees of freedom whereas a comparison of 3 groups
> should have 2. You (meaning Frodo) are testing that _all 3_ regression
> coefficients are zero, intercept included. That would imply that all three
> systems have response probabilities of 0.5, which is not likely what you
> want.
>
> This all suggests that you are struggling with the interpretation of the
> regression coefficients and their role in the linear predictor. This should
> be covered by any good book on logistic regression.
>
> -pd
>
> > On 12 Nov 2018, at 14:15 , Eik Vettorazzi  wrote:
> >
> > Dear Jedi,
> > please use the source carefully. A and C are not statistically different
> at the 5% level, which can be inferred from the glm output. Your last two
> wald.tests don't test what you want them to, since your model contains an
> intercept term. You specified contrasts which test A vs B-A, i.e.
> A-(B-A)==0 <-> 2*A-B==0, which is not what you intend, I think. Have a
> look at ?contr.treatment and re-read your source doc to get an idea of
> what dummy coding and indicator variables are about.
> >
> > Cheers
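
Peter's 2-df point can be checked directly: an overall test of System
should test only the two System coefficients, not the intercept. A hedged
sketch of two standard routes, reusing the fit object from the first
message:

library(aod)
wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 2:3)  # 2-df Wald test

# or a likelihood-ratio test against the intercept-only model:
fit0 <- glm(Response ~ 1, data = scrd, family = "binomial")
anova(fit0, fit, test = "Chisq")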

Re: [R] Reporting binomial logistic regression from R results

2018-11-12 Thread Frodo Jedi
Dear Bert,
I understand, and thanks for your recommendation. Unfortunately I do not
have any possibility to consult a statistical expert at the moment, so this
forum's experts' recommendations are crucial for me to understand how R
works in relation to my question.
I hope that someone can reply to my last questions.

Best regards

FJ

On Mon, Nov 12, 2018 at 7:48 PM Bert Gunter  wrote:

> Generally speaking, this list is about questions on R programming, not
> statistical issues. However, I grant you that your queries are in something
> of a gray area intersecting both.
>
> Nevertheless, based on your admitted confusion, I would recommend that you
> find a local statistical expert with whom you can consult 1-1 if at all
> possible. As others have already noted, your statistical understanding is
> muddy, and it can be quite difficult to resolve such confusion in online
> forums like this, which cannot provide the close back and forth that may be
> required (as well as further appropriate study).
>
> Best,
> Bert
>

[R] t-test or ANOVA...who wins? Help please!

2011-01-04 Thread Frodo Jedi
Dear all,

I need help because I don't know how to perform the analysis in the right
way, as I get different behaviours from a t-test and a two-way ANOVA.
Below I post the table, my goal, and the strange results I got.

I kindly ask for your help because I really don't know how to solve this problem.

So the table is this:

number  stimulus                   condition  response
1       flat_550_W_realism         A          3
2       flat_550_W_realism         A          3
3       flat_550_W_realism         A          5
4       flat_550_W_realism         A          3
5       flat_550_W_realism         A          3
6       flat_550_W_realism         A          3
7       flat_550_W_realism         A          3
8       flat_550_W_realism         A          5
9       flat_550_W_realism         A          3
10      flat_550_W_realism         A          3
11      flat_550_W_realism         A          5
12      flat_550_W_realism         A          7
13      flat_550_W_realism         A          5
14      flat_550_W_realism         A          2
15      flat_550_W_realism         A          3
16      flat_550_W_realism         AH         7
17      flat_550_W_realism         AH         4
18      flat_550_W_realism         AH         5
19      flat_550_W_realism         AH         3
20      flat_550_W_realism         AH         6
21      flat_550_W_realism         AH         5
22      flat_550_W_realism         AH         3
23      flat_550_W_realism         AH         5
24      flat_550_W_realism         AH         5
25      flat_550_W_realism         AH         7
26      flat_550_W_realism         AH         2
27      flat_550_W_realism         AH         7
28      flat_550_W_realism         AH         5
29      flat_550_W_realism         AH         5
30      bump_2_step_W_realism      A          1
31      bump_2_step_W_realism      A          3
32      bump_2_step_W_realism      A          5
33      bump_2_step_W_realism      A          1
34      bump_2_step_W_realism      A          3
35      bump_2_step_W_realism      A          2
36      bump_2_step_W_realism      A          5
37      bump_2_step_W_realism      A          4
38      bump_2_step_W_realism      A          4
39      bump_2_step_W_realism      A          4
40      bump_2_step_W_realism      A          4
41      bump_2_step_W_realism      AH         3
42      bump_2_step_W_realism      AH         5
43      bump_2_step_W_realism      AH         1
44      bump_2_step_W_realism      AH         5
45      bump_2_step_W_realism      AH         4
46      bump_2_step_W_realism      AH         4
47      bump_2_step_W_realism      AH         5
48      bump_2_step_W_realism      AH         4
49      bump_2_step_W_realism      AH         3
50      bump_2_step_W_realism      AH         4
51      bump_2_step_W_realism      AH         5
52      bump_2_step_W_realism      AH         4
53      hole_2_step_W_realism      A          3
54      hole_2_step_W_realism      A          3
55      hole_2_step_W_realism      A          4
56      hole_2_step_W_realism      A          1
57      hole_2_step_W_realism      A          4
58      hole_2_step_W_realism      A          3
59      hole_2_step_W_realism      A          5
60      hole_2_step_W_realism      A          4
61      hole_2_step_W_realism      A          3
62      hole_2_step_W_realism      A          4
63      hole_2_step_W_realism      A          7
64      hole_2_step_W_realism      A          5
65      hole_2_step_W_realism      A          1
66      hole_2_step_W_realism      A          4
67      hole_2_step_W_realism      AH         7
68      hole_2_step_W_realism      AH         5
69      hole_2_step_W_realism      AH         5
70      hole_2_step_W_realism      AH         1
71      hole_2_step_W_realism      AH         5
72      hole_2_step_W_realism      AH         5
73      hole_2_step_W_realism      AH         5
74      hole_2_step_W_realism      AH         2
75      hole_2_step_W_realism      AH         6
76      hole_2_step_W_realism      AH         5
77      hole_2_step_W_realism      AH         5
78      hole_2_step_W_realism      AH         6
79      bump_2_heel_toe_W_realism  A          3
80      bump_2_heel_toe_W_realism  A          3
81      bump_2_heel_toe_W_realism  A          3
82      bump_2_heel_toe_W_realism  A          2
83      bump_2_heel_toe_W_realism  A          3
84      bump_2_heel_toe_W_realism  A          3
85      bump_2_heel_toe_W_realism  A          4
86      bump_2_heel_toe_W_realism  A          3
87      bump_2_heel_toe_W_realism  A          4
88      bump_2_heel_toe_W_realism  A          4
89      bump_2_heel_toe_W_realism

Re: [R] t-test or ANOVA...who wins? Help please!

2011-01-05 Thread Frodo Jedi
Dear Tal Galili,
thanks a lot for your answer! I agree with you: the t-test is comparing 2
conditions at one level of stimulus, while the ANOVA table is testing the
significance of the interaction between condition and stimulus; the two
tests are testing two different things.

But still I don't understand which is the right way to perform the analysis
in order to solve my problem.

Let's consider now only the table I posted before.
The same stimuli in the table have been presented to subjects in two
conditions, A and AH, where AH is condition A plus something else (let's
call it "H").

I want to know if AT A GLOBAL LEVEL adding "H" leads to better results in
the participants' evaluations of the stimuli than presenting the stimulus
only in condition "A".

Data in the column "response" are evaluations of the realism of the
stimulus on a 7 point scale.

If I calculate the mean for each stimulus in each condition, the results
show that for each stimulus the AH condition is always greater than the A
condition. Anyway, doing a t-test to compare the stimuli pair by pair
(e.g. flat_550_W_realism in condition A vs flat_550_W_realism in condition
AH), I get that only sometimes are the differences statistically
significant. I ask you if there is a way to say that condition AH is
better than condition A at a global level.
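
A minimal sketch of one way to ask that global question, assuming the
posted table has been read into a data frame scrd with columns stimulus,
condition and response:

fit <- aov(response ~ stimulus + condition, data = scrd)
summary(fit)  # the condition row tests A vs AH pooled over all stimuli

# a non-parametric check of the same global contrast:
wilcox.test(response ~ condition, data = scrd)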



In attachment you find the table in .txt and also in .csv format. Is it
possible for you to make an example in R, including also the R results, in
order to tell me what to look at in the console to see whether my problem
is solved or not?

For example, I was checking the stimulus:condition line in the anova
results... but I don't know whether my anova analysis was correct or not.


I am not an expert in R, nor in statistics ;-(
Anyway, I am doing my best to study and understand.

Please enlighten me.

Thanks in advance

Best regards




________
From: Tal Galili 
To: Frodo Jedi 
Cc: r-help@r-project.org
Sent: Wed, January 5, 2011 10:15:41 AM
Subject: Re: [R] t-test or ANOVA...who wins? Help please!


Hello Frodo,

It is not clear to me from your questions some of the basics of your analysis.

If you only have two levels of a factor, and one response - why do you use
more factors (and their interactions) in the anova? In that sense, it is
obvious that your results would differ from the t-test.

In either case, I am not sure if any of these methods are valid, since your
data don't seem to be normal. Here is example code for how to get the same
results from aov and t.test, and also a nonparametric option (that might be
more fitting):



flat_550_W_realism    <- c(3, 3, 5, 3, 3, 3, 3, 5, 3, 3, 5, 7, 5, 2, 3)
flat_550_W_realism_AH <- c(7, 4, 5, 3, 6, 5, 3, 5, 5, 7, 2, 7, 5, 5)

x <- c(rep(1, length(flat_550_W_realism)),
       rep(2, length(flat_550_W_realism_AH)))

y <- c(flat_550_W_realism, flat_550_W_realism_AH)

# equal results between t test and anova
t.test(y ~ x, var.equal = TRUE)
summary(aov(y ~ x))

# plotting the data:
boxplot(y ~ x)     # group 1 is not at all symmetrical...
wilcox.test(y ~ x) # a more fitting test



Contact Details:--------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
------------------------------------------------------






On Wed, Jan 5, 2011 at 12:37 AM, Frodo Jedi  wrote:


>I kindly ask for your help because I really don't know how to solve this problem.
>




[R] Assumptions for ANOVA: the right way to check the normality

2011-01-05 Thread Frodo Jedi
Dear all,
I would like to know the right way to check the normality assumption for
performing an ANOVA. How do you check normality for the following example?

I did an experiment where people had to evaluate, on a 7 point scale, the
degree of realism of some stimuli presented in 2 conditions.

The problem is that if I check normality with the Shapiro test I get that
the data are not normally distributed.

Someone suggested that I don't have to check the normality of the data, but
the normality of the residuals I get after fitting the linear model. I
really ask you to help me understand this point, as I don't find enough
material online to resolve it.

If the data are not normally distributed I have to use the Kruskal-Wallis
test and not the ANOVA... so please help me to understand.
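
For reference, a minimal sketch of the Kruskal-Wallis test mentioned
above, assuming the table below is in a data frame scrd:

kruskal.test(response ~ condition, data = scrd)  # response across conditions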

I give a numerical example; could you please tell me whether the data in
this table are normally distributed or not?

Help!


number  stimulus                   condition  response
1       flat_550_W_realism         A          3
2       flat_550_W_realism         A          3
3       flat_550_W_realism         A          5
4       flat_550_W_realism         A          3
5       flat_550_W_realism         A          3
6       flat_550_W_realism         A          3
7       flat_550_W_realism         A          3
8       flat_550_W_realism         A          5
9       flat_550_W_realism         A          3
10      flat_550_W_realism         A          3
11      flat_550_W_realism         A          5
12      flat_550_W_realism         A          7
13      flat_550_W_realism         A          5
14      flat_550_W_realism         A          2
15      flat_550_W_realism         A          3
16      flat_550_W_realism         AH         7
17      flat_550_W_realism         AH         4
18      flat_550_W_realism         AH         5
19      flat_550_W_realism         AH         3
20      flat_550_W_realism         AH         6
21      flat_550_W_realism         AH         5
22      flat_550_W_realism         AH         3
23      flat_550_W_realism         AH         5
24      flat_550_W_realism         AH         5
25      flat_550_W_realism         AH         7
26      flat_550_W_realism         AH         2
27      flat_550_W_realism         AH         7
28      flat_550_W_realism         AH         5
29      flat_550_W_realism         AH         5
30      bump_2_step_W_realism      A          1
31      bump_2_step_W_realism      A          3
32      bump_2_step_W_realism      A          5
33      bump_2_step_W_realism      A          1
34      bump_2_step_W_realism      A          3
35      bump_2_step_W_realism      A          2
36      bump_2_step_W_realism      A          5
37      bump_2_step_W_realism      A          4
38      bump_2_step_W_realism      A          4
39      bump_2_step_W_realism      A          4
40      bump_2_step_W_realism      A          4
41      bump_2_step_W_realism      AH         3
42      bump_2_step_W_realism      AH         5
43      bump_2_step_W_realism      AH         1
44      bump_2_step_W_realism      AH         5
45      bump_2_step_W_realism      AH         4
46      bump_2_step_W_realism      AH         4
47      bump_2_step_W_realism      AH         5
48      bump_2_step_W_realism      AH         4
49      bump_2_step_W_realism      AH         3
50      bump_2_step_W_realism      AH         4
51      bump_2_step_W_realism      AH         5
52      bump_2_step_W_realism      AH         4
53      hole_2_step_W_realism      A          3
54      hole_2_step_W_realism      A          3
55      hole_2_step_W_realism      A          4
56      hole_2_step_W_realism      A          1
57      hole_2_step_W_realism      A          4
58      hole_2_step_W_realism      A          3
59      hole_2_step_W_realism      A          5
60      hole_2_step_W_realism      A          4
61      hole_2_step_W_realism      A          3
62      hole_2_step_W_realism      A          4
63      hole_2_step_W_realism      A          7
64      hole_2_step_W_realism      A          5
65      hole_2_step_W_realism      A          1
66      hole_2_step_W_realism      A          4
67      hole_2_step_W_realism      AH         7
68      hole_2_step_W_realism      AH         5
69      hole_2_step_W_realism      AH         5
70      hole_2_step_W_realism      AH         1
71      hole_2_step_W_realism      AH         5
72      hole_2_step_W_realism      AH         5
73      hole_2_step_W_realism      AH         5
74      hole_2_step_W_realism      AH         2
75      hole_2_step_W_realism      AH         6
76      hole_2_step_W_realism      AH         5
77      hole_2_step_W_realism      AH

[R] Comparing fitting models

2011-01-05 Thread Frodo Jedi

Dear all,
I have 3 models (from simple to complex) and I want to compare them in
order to see if they fit equally well or not. From the R prompt I am not
able to see where I can get this information. Let's do an example:

fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
# equivalent to lm(response ~ stimulus*condition, data=scrd)

fit2 <- lm(response ~ stimulus + condition, data=scrd)

fit3 <- lm(response ~ condition, data=scrd)


> anova(fit2, fit1)  # compare models
Analysis of Variance Table

Model 1: response ~ stimulus + condition
Model 2: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    165 364.13
2    159 362.67  6    1.4650 0.1071 0.9955


> anova(fit3, fit2, fit1)  # compare models
Analysis of Variance Table

Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955



How can I tell whether the simple model fits as well as the complex model
(the one with the interaction)?

Thanks in advance

All the best


  


Re: [R] Comparing fitting models

2011-01-06 Thread Frodo Jedi
Dear Greg,
thanks for your answer, but that is not the point.

My question is another: from the output at the R prompt, how can I compare
the models? I don't understand the output of anova(fit1, fit2, fit3).

What do I have to look at to understand something about the comparison of
the models?

Look at the output in R:

> anova(fit3, fit2, fit1)  # compare models
Analysis of Variance Table

Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955

I don't understand these data... what do they actually say about the
comparison between the three models?

Please enlighten me.


Thanks a lot





From: Greg Snow 


Sent: Wed, January 5, 2011 10:34:15 PM
Subject: RE: [R] Comparing fitting models

Just do anova(fit3, fit1)

This compares those 2 models directly.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111





Re: [R] Assumptions for ANOVA: the right way to check the normality

2011-01-06 Thread Frodo Jedi
Dear Robert,
thanks so much!!! Now I understand!
So you also think that I have to check only the residuals and not the data
directly. Now, just out of curiosity, I did the Shapiro test on the
residuals. The problem is that for fit3 the test does not tell me that the
data are normally distributed. Why? Here are the results:

> shapiro.test(residuals(fit1))

Shapiro-Wilk normality test

data:  residuals(fit1) 
W = 0.9848, p-value = 0.05693

# Here the test is ok: the test says that the data are normally distributed
# (p-value greater than 0.05)



> shapiro.test(residuals(fit2))

Shapiro-Wilk normality test

data:  residuals(fit2) 
W = 0.9853, p-value = 0.06525

# Here the test is ok: the test says that the data are normally distributed
# (p-value greater than 0.05)



> shapiro.test(residuals(fit3))

Shapiro-Wilk normality test

data:  residuals(fit3) 
W = 0.9621, p-value = 0.0001206



Now the test reveals a p-value lower than 0.05, so the residuals for fit3
are not normally distributed. Why do I get this behaviour? Indeed, in the
histogram and Q-Q plot for the fit3 residuals I see a normal distribution.

From: Robert Baer 

Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality

> Someone suggested that I don't have to check the normality of the data,
> but the normality of the residuals I get after fitting the linear model.
> I really ask you to help me understand this point, as I don't find enough
> material online to resolve it.

Try the following:
# using your scrd data and your proposed models
fit1<- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
fit2<- lm(response ~ stimulus + condition, data=scrd)
fit3<- lm(response ~ condition, data=scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow=c(2,3))

# residuals function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

> 
> If the data are not normally distributed I have to use the Kruskal-Wallis
> test and not the ANOVA... so please help me to understand.

Indeed - Kruskal-Wallis is a good test to use for one-factor data that is
ordinal, so it is a good alternative to your fit3. Your "response" seems to
be a discrete variable rather than a continuous variable. You must decide
whether it is reasonable to approximate it with a normal distribution,
which is by definition continuous.

> 
> I make a numerical example, could you please tell me if the data in this table
> are normally distributed or not?
> 
> Help!
> 
> 

Re: [R] Comparing fitting models

2011-01-06 Thread Frodo Jedi
Dear Greg,
thanks so much. Now I understand!!

Thanks!!





From: Greg Snow 


Sent: Wed, January 5, 2011 11:27:53 PM
Subject: RE: [R] Comparing fitting models


The output with all three fits gives you 2 comparisons: fit1 vs. fit2 and
fit2 vs. fit3.

So, using an alpha of 0.05, the 0.99 p-value compares model 2 (fit2) and
model 3 (fit1), testing the null that they fit equally well, with the
differences being due to random chance. The p-value is large, so we cannot
reject that fit2 fits as well as fit1; the interaction is not significant.

Then the 0.23 p-value compares the models with and without stimulus (the
interaction not being considered); again the p-value is larger than alpha,
so we don't have the evidence to conclude that stimulus helps.

If you want to test stimulus and the interaction in a single step, then do
anova(fit3, fit1); giving anova all three fits does the sequential tests.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111




[R] Problem with 2-ways ANOVA interactions

2011-01-06 Thread Frodo Jedi
Dear All,
I have a problem in understanding how the interactions in a two-way ANOVA
work, because I get conflicting results from a t-test and an ANOVA. For
most of you my problem is very simple, I am sure.

I need help with an example, looking at one table I am analyzing. The table
is in attachment and can be imported into R by means of this command:

scrd <- read.table('/Users/luca/Documents/Analisi_passi/Codice_R/Statistics_results_bump_hole_Audio_Haptic/tables_for_R/table_realism_wood.txt',
                   header=TRUE, colClasses=c('numeric','factor','factor','numeric'))


This table is the result of a simple experiment. Subjects were exposed to
some stimuli and were asked to evaluate the degree of realism of the
stimuli on a 7 point scale (i.e., the data in column "response"). Each
stimulus was presented in two conditions, "A" and "AH", where AH is
condition A plus another thing (let's call it "H").
Now, what means exactly in my table the interaction stimulus:condition?

I think that if I do the analysis anova(response ~ stimulus*condition) I will 
get the comparison between 

the same stimulus in condition A and in condition AH. Am I wrong?

For instance the comparison of stimulus flat_550_W_realism presented in 
condition A with the same stimulus, flat_550_W_realism, 

presented in condition AH.

The problem is that if I do a t-test between the values of this stimulus in
the A and AH conditions I get a significant difference, while if I do the
test with the two-way ANOVA I don't get any difference. How is this
possible?

Here I put the results of the analysis:


# Here the result of the ANOVA:
> fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
> # equivalent to lm(response ~ stimulus*condition, data=scrd)
>
> anova(fit1)
Analysis of Variance Table

Response: response
                    Df Sum Sq Mean Sq F value   Pr(>F)
stimulus             6  15.05   2.509  1.1000   0.3647
condition            1  36.51  36.515 16.0089 9.64e-05 ***
stimulus:condition   6   1.47   0.244  0.1071   0.9955
Residuals          159 362.67   2.281
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


#As you can see the p-value for stimulus:condition is high.


# Now I do the t-test with the same values from the table, for the stimulus
# presented in the A and AH conditions:

flat_550_W_realism    <- c(3,3,5,3,3,3,3,5,3,3,5,7,5,2,3)
flat_550_W_realism_AH <- c(7,4,5,3,6,5,3,5,5,7,2,7,5,5)

> t.test(flat_550_W_realism, flat_550_W_realism_AH, var.equal=TRUE)

        Two Sample t-test

data:  flat_550_W_realism and flat_550_W_realism_AH
t = -2.2361, df = 27, p-value = 0.03381
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.29198603 -0.09849016
sample estimates:
mean of x mean of y
 3.733333  4.928571


# Now we have a significant difference between these two conditions
# (p-value = 0.03381)



Why do I get this behaviour?


Moreover, how could I track, by means of ANOVA, the significant differences
between the stimuli presented in the A and AH conditions without doing the
t-test?

Please help!

Thanks in advance



Re: [R] Problem with 2-ways ANOVA interactions

2011-01-06 Thread Frodo Jedi
Dear Greg,
thanks so much, I think that now I have understood. Please confirm this for
me by reading what follows ;-)

To summarize from the beginning: the table I analyzed is the result of a
simple experiment. Subjects were exposed to some stimuli and were asked to
evaluate the degree of realism of the stimuli on a 7 point scale (i.e., the
data in column "response"). Each stimulus was presented in two conditions,
"A" and "AH", where AH is condition A plus another thing (let's call it "H").

Before, I wrongly thought that if I did the analysis anova(response ~
stimulus*condition) I would get the comparison between the same stimulus in
condition A and in condition AH (e.g. stimulus_1_A vs stimulus_1_AH).

Instead, apparently, the interaction stimulus:condition means that I find
the differences between the stimuli keeping the condition fixed!! If this
is true, then is doing the anova with the interaction stimulus:condition
equivalent to doing a ONE-WAY ANOVA first on the subset where all the
conditions are A and then on the subset where all the conditions are AH?
Right?


So, if all the above is correct, my final question is: how by means of
ANOVA can I track the significant differences between the stimuli presented
in the A and AH conditions without resorting to the t-test? Indeed, my goal
was to find, on the one hand, whether globally condition AH brings better
results than condition A, and on the other hand, for which stimuli
condition AH brings better results than condition A.



Finally, I am burning with curiosity to know the answers to the following
two questions:
1- is there a difference between anova(response ~ stimulus*condition) and
anova(response ~ condition*stimulus) concerning the interaction part?

2- does anova(response ~ stimulus + condition) give the same results as the
two ONE-WAY ANOVAs anova(response ~ stimulus) and anova(response ~
condition), just presented together in one single output?



Looking forward to your response!


Best regards




From: Greg Snow 


Sent: Thu, January 6, 2011 6:41:49 PM
Subject: RE: [R] Problem with 2-ways ANOVA interactions

You really need to spend more time with a good aov textbook and probably a
consultant who can explain things to you face to face. But here is a basic
explanation to get you pointed in the right direction:

Consider a simple 2x2 example with factors A and B, each with 2 levels (1
and 2). Draw a 2x2 grid to represent this; there are 4 groups, and the
theory would be that they have means mu11, mu12, mu21, and mu22 (mu12 is
for the group with A at level 1 and B at level 2, etc.).

Now you fit the full model with 2 main effects and 1 interaction. If we
assume treatment contrasts (the default in R; the coefficients/tests will
be different for different contrasts, but the general idea is the same),
then the intercept/mean/constant piece will correspond to mu11; the
coefficient (only seen if treated as an lm instead of an aov object) for
testing A will be (mu21-mu11) and for testing B will be (mu12-mu11).

Now the interaction piece gets a bit more complex: it is (mu11 - mu12 -
mu21 + mu22). This makes a bit more sense if we rearrange it as one of
( (mu22-mu21) - (mu12-mu11) ) or ( (mu22-mu12) - (mu21-mu11) ); it
represents the difference in the differences, i.e. we find how much going
from A1 to A2 changes things when B is 1, then we find how much going from
A1 to A2 changes things when B is 2, then we find the difference in these
changes. That is the interaction (and if it is 0, then the effects of A and
B are additive and independent, i.e. the amount A changes things does not
depend on the value of B and vice versa).

So testing the interaction term is asking whether how much a change in A
affects things depends on the value of B.

This is very different from comparing mu11 to mu12 (or mu21 to mu22), which
is what I think you did in the t-test; that is asking a very different
question and using different base assumptions (ignoring any effect of B,
additional data, etc.). Note that your test on condition is very
significant; this would be more similar to your t-test, but still does not
match exactly because of the differences.

Now your case is more complicated since stimulus has 7 levels (6 df), so
the interaction is a combination of 6 different differences of differences,
which is why you need to spend some time with a good textbook/class to
really understand what model(s) you are fitting.


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
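
A tiny numeric illustration of the difference-of-differences idea above,
with made-up cell means (not taken from the thread's data):

mu11 <- 3; mu12 <- 5   # means for A1B1 and A1B2
mu21 <- 4; mu22 <- 6   # means for A2B1 and A2B2
mu21 - mu11                    # effect of A when B = 1: here 1
mu22 - mu12                    # effect of A when B = 2: here 1
(mu22 - mu12) - (mu21 - mu11)  # interaction: 0, so the effects are additive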


Re: [R] Assumptions for ANOVA: the right way to check the normality

2011-01-06 Thread Frodo Jedi


Ok,
I see ;-)

Let's put it this way then: when do I have to use the Kruskal-Wallis test?
I mean, when can I be very sure that I have to use it instead of ANOVA?

Thanks


Best regards

P.S. In addition, which is the non-parametric method corresponding to a
two-way ANOVA? Or do I have to repeat the Kruskal-Wallis test many times?
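
One commonly used non-parametric analogue for a two-way layout with one
observation per cell (e.g. one rating per subject per condition) is the
Friedman test; a hedged sketch, assuming a subject column exists (it is
not in the posted table):

friedman.test(response ~ condition | subject, data = scrd)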




From: Greg Snow 

"r-help@r-project.org" 
Sent: Thu, January 6, 2011 7:07:17 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality

Remember that a non-significant result (especially one that is still near
alpha, like yours) does not give evidence that the null is true. The reason
that the 1st 2 tests below don't show significance is more due to lack of
power than to some of the residuals being normal. The only test that I
would trust for this is SnowsPenultimateNormalityTest (TeachingDemos
package; the help page is more useful than the function itself).

But I think that you are mixing up 2 different concepts (a very common
misunderstanding). What is important if we want to do normal theory
inference is that the coefficients/effects/estimates are normally
distributed. Now since these coefficients can be shown to be linear
combinations of the error terms, if the errors are iid normal then the
coefficients are also normally distributed. So many people want to show
that the residuals come from a perfectly normal distribution. But it is the
theoretical errors, not the observed residuals, that are important (the
observed residuals are not iid). You need to think about the source of your
data to see if this is a reasonable assumption. Now I cannot fathom any
universe (theoretical or real) in which normally distributed errors, added
to means that they are independent of, will result in a finite set of
integers, so an assumption of exact normality is not reasonable (some may
want to argue this, but convincing me will be very difficult). But looking
for exact normality is a bit of a red herring, because we also have the
Central Limit Theorem, which says that if the errors are not normal (but
still iid) then the distribution of the coefficients will approach
normality as the sample size increases. This is what makes statistics
doable (because no real dataset entered into the computer is exactly
normal). The more important question is: are the residuals "normal enough"?
For that there is not a definitive test (experience and plots help).

But this all depends on another assumption that I don't think you have even
considered. Yes, we can use normal theory even when the random part of the
data is not normally distributed, but this still assumes that the data is
at least interval data, i.e. that we firmly believe that the difference
between a response of 1 and a response of 2 is exactly the same as the
difference between a 6 and a 7, and that the difference from 4 to 6 is
exactly twice that of 1 vs. 2. From your data and other descriptions, I
don't think that that is a reasonable assumption. If you are not willing to
make that assumption (like me), then means and normal theory tests are
meaningless and you should use other approaches. One possibility is to use
non-parametric methods (which I believe Frank has already suggested you
use); another is to use proportional odds logistic regression.



--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111
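
A minimal sketch of the proportional-odds option mentioned above, assuming
the scrd data from the earlier threads (polr() lives in MASS and needs an
ordered factor response; the variable name response.ord is made up here):

library(MASS)
scrd$response.ord <- factor(scrd$response, ordered = TRUE)  # assumed new column
pfit <- polr(response.ord ~ stimulus + condition, data = scrd, Hess = TRUE)
summary(pfit)  # Hess = TRUE keeps the Hessian so summary() can compute SEs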




[R] anova vs aov commands for anova with repeated measures

2011-01-07 Thread Frodo Jedi
Dear all,
I need to understand something about the behaviour of the two functions aov and
anova in the following case,
involving an ANOVA with repeated measures:

If I use the following command I don't get any problem:

>aov1 = aov(response ~ stimulus*condition + 
>Error(subject/(stimulus*condition)), 
>data=scrd)
> summary(aov1)

Instead if I try to fit the same model for the regression I get an error:
> fit1<- lm(response ~ stimulus*condition + 
> Error(subject/(stimulus*condition)), 
>data=scrd) 
>
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards.


I want to use fit1<- lm(response ~ stimulus*condition + 
Error(subject/(stimulus*condition)), data=scrd) 

because I want to analyze the residuals in order to check normality, and see if
the ANOVA assumption of normality
still holds.

Could you please help me in understanding how to do this?

Thanks in advance

Best regards


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] anova vs aov commands for anova with repeated measures

2011-01-08 Thread Frodo Jedi
Dear Bill,
many thanks for your answer. I got the point, but I would still need help
with a numeric example, as I haven't fully understood how it works in R
(I had a look at proj() but I am not sure how it can be used correctly).

Could you please make an example?

For example, attached you will find a .csv table resulting from an experiment;
you can read it by means of this command:

> scrd<-  
>read.csv(file='/Users/../tables_for_R/table_quality_wood.csv',sep=',',header=T)
>

The data are from an experiment where participants had to evaluate on a
seven-point Likert scale
the realism of some stimuli, which are presented both in condition "A" and in 
condition "AH".

I need to perform the ANOVA by means of this command:

> aov1 = aov(response ~ stimulus*condition + 
> Error(subject/(stimulus*condition)), 
>data=scrd)


but the problem is that I cannot plot, as I usually do, the qqnorm of the
residuals of the fit, because
lm does not support the Error term present in aov.
I normally check normality through a plot (or the shapiro.test function). Now
could you please

illustrate how you would work out from my data whether they are
normally distributed?


Please enlighten me

Best regards






From: "bill.venab...@csiro.au" 
To: frodo.j...@yahoo.com; r-help@r-project.org
Sent: Sat, January 8, 2011 4:01:41 AM
Subject: RE: [R] anova vs aov commands for anova with repeated measures

lm() and aov() are not fully equivalent.  They both fit linear models, but they 
use different algorithms, and this allows aov, for example, to handle some 
simple multistratum models.  The algorithm used by lm does not allow this, but 
it has other advantages for simpler models.

If you want to fit a multistratum model, such as a repeated measures model, you 
need to use aov.

When it comes to finding the residuals, you have to be explicit about which 
residuals you mean, too.  You get residuals for each stratum in a multistratum 
model.  Using plain old resid() will not work as that function can only be used 
when there is only one kind of residuals vector defined.  (It could be made to 
do something sensible, but that's another issue.  Right now, it doesn't.)

The function proj() can be used on a fitted model object to obtain the 
projections, at each stratum, on to the subspaces defined by the terms in the 
model, and includes the residuals (the projection onto the orthogonal 
complement 
of the model space in R^n) as the final column of the matrix of projections.  
These are easy to dig out and you can analyse away.

See ?proj for an example of its use, but you will need to dig out the 
appropriate column yourself.
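
For instance, a minimal sketch (an editorial illustration, not from the
original message, assuming the scrd data and the model from this thread;
the stratum names depend on the Error() term, so check names(pr) first):

# refit the multistratum model from the thread
aov1 <- aov(response ~ stimulus*condition +
            Error(subject/(stimulus*condition)), data = scrd)
# proj() returns one projection matrix per stratum, each with a
# "Residuals" column; the deepest stratum holds the within-cell residuals
pr <- proj(aov1)
names(pr)   # e.g. "subject:stimulus:condition" is the deepest stratum here
res <- pr[["subject:stimulus:condition"]][, "Residuals"]
qqnorm(res); qqline(res)   # visual normality check on these residuals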


From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Frodo Jedi [frodo.j...@yahoo.com]
Sent: 08 January 2011 01:51
To: r-help@r-project.org
Subject: [R] anova vs aov commands for anova with repeated measures

Dear all,
I need to understand a thing in the beheaviour of the two functions aov and
anova in the following case
involving an analysis of ANOVA with repeated measures:

If I use the folowing command I don´t get any problem:

>aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)),
>data=scrd)
> summary(aov1)

Instead if I try to fit the same model for the regression I get an error:
> fit1<- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)),
>data=scrd)
>
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards.


I want to use fit1<- lm(response ~ stimulus*condition +
Error(subject/(stimulus*condition)), data=scrd)

because I want to analyse the residuals in order to check normality, and see if
the anova assumption of normality
still holds.

Could you please help me in understanding how to do this?

Thanks in advance

Best regards



[[alternative HTML version deleted]]


  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assumptions for ANOVA: the right way to check the normality

2011-01-08 Thread Frodo Jedi
ned to P, resampling and confidence intervals, and looking at sums
> of
> squares terms add to explanation of the model. I've tried the plot()
> function to help graphically evaluate a model, and I want to make sure
> I
> understand what it's showing me. I think the first is showing me the
> model's fitted values vs the residuals, and ideally, I think the closer
> the points are to the red line the better. The next plot is a Q-Q plot,
> the closer the points to the line, the more normal the model
> coefficients (or perhaps the data). I'm not sure what the next two
> plots
> are, but it is titled Scale-Location. And it looks to have the square
> root of standardized residuals on y, and fitted model values on x.
> Might
> this be similar to the first plot? The final one is titled Residuals vs
> Leverage, which has standardized residuals on y and leverage on x, and
> something called Cook's Distance is plotted as well.
>
> Thanks,
> Ben. W
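
For reference, the four default panels described above come from plot.lm; a
minimal sketch (assuming a simple lm fit on the scrd data):

fit <- lm(response ~ condition, data = scrd)
op <- par(mfrow = c(2, 2))   # 2 x 2 panel layout
plot(fit)   # panels: Residuals vs Fitted, Normal Q-Q, Scale-Location,
            # and Residuals vs Leverage (with Cook's distance contours)
par(op)     # restore plot parameters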
> > Whether to use anova and other normality based tests is really a
> matter of what assumptions you are willing to live with and what level
> of "close enough" you are comfortable with.  Consulting with a local
> consultant with experience in these areas is useful if you don't have
> enough experience to decide what you are comfortable with.
> >
> > For your description, I would try the proportional odds logistic
> regression, but again, you should probably consult with someone who has
> experience rather than trying that on your own until you have more
> training and experience.
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> > 801.408.8111
> >
> > From: Frodo Jedi [mailto:frodo.j...@yahoo.com]
> > Sent: Thursday, January 06, 2011 12:57 PM
> > To: Greg Snow; r-help@r-project.org
> > Subject: Re: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> >
> > Ok,
> > I see ;-)
> >
> > Let's put it this way then. When do I have to use the Kruskal-Wallis
> > test? I mean, when can I be sure that I have
> > to use it instead of ANOVA?
> >
> > Thanks
> >
> >
> > Best regards
> >
> > P.S.  In addition, which is the non-parametric method corresponding
> > to a two-way ANOVA? Or do I have to
> > repeat the Kruskal-Wallis test many times?
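
(A sketch of one common answer, added here for reference and assuming the scrd
data: kruskal.test covers the one-factor case, and friedman.test is the usual
analogue when there is one repeated factor with subjects as blocks.)

# one-way non-parametric alternative to a one-factor ANOVA
kruskal.test(response ~ condition, data = scrd)
# the Friedman test needs one observation per subject x stimulus cell,
# so average over replicates (here, over the two conditions) first
m <- aggregate(response ~ stimulus + subject, data = scrd, FUN = mean)
friedman.test(response ~ stimulus | subject, data = m)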
> > 
> > From: Greg Snow
> > To: Frodo Jedi; Robert Baer;
> "r-help@r-project.org"
> > Sent: Thu, January 6, 2011 7:07:17 PM
> > Subject: RE: [R] Assumptions for ANOVA: the right way to check the
> normality
> >
> > Remember that an non-significant result (especially one that is still
> near alpha like yours) does not give evidence that the null is true.
> The reason that the 1st 2 tests below don't show significance is more
> due to lack of power than some of the residuals being normal.  The only
> test that I would trust for this is SnowsPenultimateNormalityTest
> (TeachingDemos package, the help page is more useful than the function
> itself).
> >
> > But I think that you are mixing up 2 different concepts (a very
> common misunderstanding).  What is important if we want to do normal
> theory inference is that the coefficients/effects/estimates are
> normally distributed.  Now since these coefficients can be shown to be
> linear combinations of the error terms, if the errors are iid normal
> then the coefficients are also normally distributed.  So many people
> want to show that the residuals come from a perfectly normal
> distribution.  But it is the theoretical errors, not the observed
> residuals that are important (the observed residuals are not iid).  You
> need to think about the source of your data to see if this is a
> reasonable assumption.  Now I cannot fathom any universe (theoretical
> or real) in which normally distributed errors added to means that they
> are independent of will result in a finite set of integers, so an
> assumption of exact normality is not reasonable (some may want to argue
> this, but convincing me will be very difficult).  But looking for exact
> normality is a bit of a red herring because, we also have the Central
> Limit Theorem that says that if the errors are not normal (but still
> iid) then the distribution of the coefficients will approach normality
> as the sample size increases.  This is what make statistics doable
> (because no real dataset entered into the computer is exactly normal).
> The more important question is are the residuals "normal enough"?  for
> which there is not a definitive test (experience and plots help).
> >

Re: [R] anova vs aov commands for anova with repeated measures

2011-01-08 Thread Frodo Jedi
Dear Dieter,
thanks a lot for your answer. 
Would you mind giving an example of the use of the command?

For example, attached you will find a .csv table resulting from an experiment;
you can read it by means of this command:

> scrd<-  
>read.csv(file='/Users/../tables_for_R/table_quality_wood.csv',sep=',',header=T)
>

The data are from an experiment where participants had to evaluate on a
seven-point Likert scale
the realism of some stimuli, which are presented both in condition "A" and in 
condition "AH".

I need to perform the ANOVA by means of this command:

> aov1 = aov(response ~ stimulus*condition + 
> Error(subject/(stimulus*condition)), 
>data=scrd)



All the best





From: Dieter Menne 
To: r-help@r-project.org
Sent: Sat, January 8, 2011 3:50:23 PM
Subject: Re: [R] anova vs aov commands for anova with repeated measures



Bill.Venables wrote:
> 
> ...
> If you want to fit a multistratum model, such as a repeated measures
> model, you need to use aov.
> 
> 

...Or lme in nlme / lmer in lme4.

Dieter


-- 
View this message in context: 
http://r.789695.n4.nabble.com/anova-vs-aov-commands-for-anova-with-repeated-measures-tp3179343p3205018.html

Sent from the R help mailing list archive at Nabble.com.




  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Anova with repeated measures for unbalanced design

2011-01-08 Thread Frodo Jedi
Dear all,
I need help because I am really not able to find on the internet a good example
in R to analyze an unbalanced table with ANOVA with repeated measures.
By unbalanced table I mean that the questions are not all answered by the same
number of subjects.

For a balanced case I would use the command 
aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), 
data=scrd)


Does the same code still work for unbalanced design?
Can anyone provide an R code example in order to get the same analysis?


Thanks in advance for any suggestion.

Best regards


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova with repeated measures for unbalanced design

2011-01-08 Thread Frodo Jedi
Hi Ben,
many many thanks!!




From: Ben Bolker 
To: r-h...@stat.math.ethz.ch
Sent: Sat, January 8, 2011 9:39:20 PM
Subject: Re: [R] Anova with repeated measures for unbalanced design

Frodo Jedi  yahoo.com> writes:

> 
> Dear all,
> I need help because I am really not able to find on the
> internet a good example
> in R to analyze an unbalanced table with ANOVA with
> repeated measures.
> By unbalanced table I mean that the questions are
> not all answered by the same
> number of subjects.
> 
> For a balanced case I would use the command 
>
> aov1 = aov(response ~ stimulus*condition + 
>  Error(subject/(stimulus*condition)), 
> data=scrd)

  I recommend that you find a copy of Pinheiro and Bates 2000 ...


> 
> Does the same code still work for unbalanced design?

  No, it doesn't.


> Can anyone provide an R code example in order to get the same analysis?

  Something like

library(nlme)
lme1 <- lme(response~stimulus*condition,random=~1|subject,data=scrd)

or possibly

lme1 <- lme(response~stimulus*condition,random=~stimulus*condition|subject,
   data=scrd)

if your experimental design would allow you to estimate stimulus and condition
effects for each subject.

  Further questions along these lines should probably go to
the r-sig-mixed-model mailing list.




  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Post hoc analysis for ANOVA with repeated measures

2011-01-09 Thread Frodo Jedi
Dear all,
how can I perform a post hoc analysis for an ANOVA with repeated measures (in
the presence of a balanced design)?
I am not able to find a good example in R on the internet... would someone
be so kind as to give
me a hint with an R example, please?

For example, the aov result of my analysis says that there is a statistically
significant difference between stimuli (there are 7 different stimuli).
...I would like to know which stimuli are involved.

> aov1 = aov(response ~ stimulus*condition + 
> Error(subject/(stimulus*condition)), 
>data=scrd)
> summary(aov1)

Error: subject
  Df Sum Sq Mean Sq F value Pr(>F)
Residuals 14 227.57  16.255   

Error: subject:stimulus
  Df Sum Sq Mean Sq F value  Pr(>F)  
stimulus   6 11.695 1.94921  2.3009 0.04179 *
Residuals 84 71.162 0.84717  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Error: subject:condition
  Df Sum Sq Mean Sq F value   Pr(>F)   
condition  1 42.076  42.076  12.403 0.003386 **
Residuals 14 47.495   3.393
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Error: subject:stimulus:condition
   Df Sum Sq Mean Sq F value Pr(>F)
stimulus:condition  6  9.124  1.5206  1.4465 0.2068
Residuals  84 88.305  1.0513   
> 



Attached you will find the table I am using; you can read it by means of:
scrd<-  
read.csv(file='/Users/../tables_for_R/table_quality_gravel.csv',sep=',',header=T)


The data are from an experiment where participants had to evaluate on a
seven-point Likert scale
the realism of some stimuli, which are presented both in condition "A" and in 
condition "AH".


Thanks in advance

Best regards
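
(A sketch of one possible answer, not from the thread, assuming the balanced
scrd data: Bonferroni-corrected paired comparisons between stimuli.)

# all pairwise comparisons between stimuli, treating observations as
# paired within subject; this assumes a complete, balanced design with
# rows ordered consistently by subject within each stimulus
with(scrd, pairwise.t.test(response, stimulus,
                           p.adjust.method = "bonferroni",
                           paired = TRUE))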


  __
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Anova with repeated measures for unbalanced design

2011-01-09 Thread Frodo Jedi
Dear list members, 
I searched extensively in the previous threads of this mailing list for an
example easy enough for me to understand,

and able to fit my problem, but I did not succeed in finding a solution of
which I feel certain.
At the moment I am stuck with this solution for my unbalanced design, and I
don't know whether it is correct or not:

library(nlme)
scrd.lme <- lme(response~stimulus*condition,random=~1|subject,data=scrd)

Now at this point is it correct to simply run the command anova(scrd.lme) ? Or 
should I do something different using aov?

As I said, for a balanced case I would use the command 

> aov1 = aov(response ~ stimulus*condition + 
> Error(subject/(stimulus*condition)), 
>data=scrd)


Now at the R prompt I get this output, which is very different from the
aov output for a balanced case:

> anova(scrd.lme) 
   numDF denDF   F-value p-value
(Intercept)1   182 178.56833  <.0001
stimulus   6   182   1.57851  0.1557
condition  1   182  39.68822  <.0001
stimulus:condition 6   182   0.67992  0.6660


Now (if the previous point is correct), I am burning with curiosity to solve
another problem.
I found that for condition there are significant differences.
For condition I have only 2 levels, so there is no need to do a post-hoc
analysis.
But if I had 4 levels, I would need one.
Now, for an ANOVA with repeated measures with an UNBALANCED design, which is
the right
approach for a post hoc test?

Is there anyone who can provide an R code example to solve this problem, so I
can better understand?
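
(A sketch of one commonly used route, not from the thread: Tukey-style
contrasts on the mixed model via the multcomp package; for simplicity the
sketch fits an additive model.)

library(nlme)
library(multcomp)
scrd.lme <- lme(response ~ stimulus + condition, random = ~1 | subject,
                data = scrd)
# all-pairwise (Tukey-style) comparisons for the stimulus factor,
# with a multiplicity adjustment over the family of tests
summary(glht(scrd.lme, linfct = mcp(stimulus = "Tukey")))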


I know, I should read some books to understand the subject better. I am doing 
my 
best.


Thanks for any suggestion









  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Assumptions for ANOVA: the right way to check the normality

2011-01-10 Thread Frodo Jedi
> The more important question is are the residuals "normal enough"?  for
> which there is not a definitive test (experience and plots help).
> >
> > But this all depends on another assumption that I don't think that
> you have even considered.  Yes we can use normal theory even when the
> random part of the data is not normally distributed, but this still
> assumes that the data is at least interval data, i.e. that we firmly
> believe that the difference between a response of 1 and a response of 2
> is exactly the same as a difference between a 6 and a 7 and that the
> difference from 4 to 6  is exactly twice that of 1 vs. 2.  From your
> data and other descriptions, I don't think that that is a reasonable
> assumption.  If you are not willing to make that assumption (like me)
> then means and normal theory tests are meaningless and you should use
> other approaches.  One possibility is to use non-parametric methods
> (which I believe Frank has already suggested you use), another is to
> use proportional odds logistic regression.
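> >
(A sketch of that last suggestion, added for reference and assuming the
scrd data and the MASS package.)

library(MASS)
# proportional odds logistic regression treats the 7-point response
# as ordinal rather than interval
scrd$response.ord <- ordered(scrd$response)
fit.polr <- polr(response.ord ~ stimulus * condition, data = scrd, Hess = TRUE)
summary(fit.polr)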
> >
> >
> >
> > --
> > Gregory (Greg) L. Snow Ph.D.
> > Statistical Data Center
> > Intermountain Healthcare
> > greg.s...@imail.org
> >  801.408.8111
> >
> >

Re: [R] Assumptions for ANOVA: the right way to check the normality

2011-01-11 Thread Frodo Jedi
Many many thanks for your feedback Greg.
You have been very enlightening for me.

Now it is time for me to study the material you kindly provided. Thanks.








From: Greg Snow 

Cc: "r-help@r-project.org" 
Sent: Tue, January 11, 2011 10:13:34 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality


> Sent: Monday, January 10, 2011 5:44 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
> 
> Dear Greg,
> first of all thanks for your reply. And many thanks also to all of you
>guys who are helping me; sorry for the number of questions I recently posted
>;-) 
>
> 
> I don't have a solid statistics background (I am not a statistician) and I am 
>basically learning everything by myself. 
>
> So my first goal is TO UNDERSTAND. I need general guidelines because
>for my PhD I am doing, and will do, several psychophysics experiments.
> I am totally alone in this challenge, so I am asking you guys for help, as
> I
>think that this is the best place to gain the thing that I miss
> and that will never be found in any book: experience.

Isn't there a single statistician anywhere in the University?  Does your 
committee have any experience with any of this?

> >What is the question you are really trying to find the answer for?  Knowing 
>that may help us give more meaningful answers.
> 
> Concerning your question, I thought I had been clear. I want to understand
>which analysis I have to use in order to understand whether
>
> the differences I am seeing are statistically significant or not. Now, as
>all the books I read state that to apply ANOVA
>
> I must respect the assumption of normality, I am trying to find a way to
>verify this.

A general run of anova procedures will produce multiple p-values for
multiple null hypotheses addressing many different questions (often many of
which are uninteresting).  Which terms are you really trying to test, and which
are included because you already know that they have an effect?

Are you including interactions because you find them actually interesting? Or 
just because that is what everyone else does?

[snip]

> >Also remember that the normality of the data/residuals/etc. is not as 
>important as the CLT for your sample size.  The main things that make the CLT 
>not work (for samples that are >not large enough) are outliers and strong
>skewness, since your outcome is limited to the numbers 1-7, I don’t see 
>outliers 
>or skewness being a real problem.  So you are probably >fine for fixed effects 
>style models (though checking with experts in your area or doing simulations 
>can 
>support/counter this).  
>
> 
> As far as I have seen everyone in my field does ANOVA.

[imagine best Mom voice] and if everyone in your field jumped off a cliff . . .

Do you want to do what everyone else is doing, or something new and different?

What does your committee chair say about this?

> >But when you add in random effects then there is a lot of uncertainty about 
> >if 
>the normal theory still holds, the latest lme code uses mcmc sampling rather 
>than depending on >normal theory and is still being developed.
> 
> 
> For "random effects" do you mean the repeated measures right? So why 
> staticians 
>developed the ANOVA with repeated measure if there is so much uncertainty?

Repeated measures are one type of random effect analysis, but random and mixed 
effects is more general than just repeated measures.

Statisticians developed those methods because they worked for simple cases, 
made 
some sense for more complicated cases, and they did not have anything that was 
both better and practical.  Now with modern computers we can see when those do 
work (unfortunately not as often as had been hoped) and what was once 
impractical is now much simpler (but inertia is to do it the old way, even
though the people who developed the old way would have preferred to do it our 
way).  The article: 


Why Permutation Tests Are Superior to t and F Tests in Biomedical Research
John Ludbrook and Hugh Dudley
The American Statistician
Vol. 52, No. 2 (May, 1998), pp. 127-132

May be enlightening here (and give possible alternatives).

Also see: 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001819.html 

for some simulation involving mixed models.  One shows that the normal theory 
works fine for that particular case, the next one shows a case where the normal 
theory does not work, then shows how to use simulation (parametric bootstrap) 
to 
get a more appropriate p-value.  You can adapt those examples for your own
situation.
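
(A bare-bones sketch of that parametric-bootstrap idea, not Greg's code,
assuming two nested lm fits on the scrd data.)

fit0 <- lm(response ~ condition, data = scrd)              # null model
fit1 <- lm(response ~ stimulus * condition, data = scrd)   # full model
obsF <- anova(fit0, fit1)$F[2]                             # observed F statistic
simF <- replicate(1000, {
  d <- scrd
  d$response <- simulate(fit0)[[1]]   # draw a response vector under the null
  anova(lm(response ~ condition, data = d),
        lm(response ~ stimulus * condition, data = d))$F[2]
})
mean(simF >= obsF)   # bootstrap p-value for the extra terms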

>  
> >This now comes back to my first question: what are you trying to find out?
> 
> My ultimate goal is to find the p-values in order to understand whether my results
>are significant or not. So I can write them in the paper ;-)

There is a function in the TeachingDemos package that will produ