[R] Reporting binomial logistic regression from R results
Dear list members,

I need some help in understanding whether I am performing a binomial logistic regression correctly and whether I am interpreting the results in the correct way. I would also like advice on how to report the results from the R functions.

I want to report the results of a binomial logistic regression assessing the difference between the 3 levels of a factor (called System) on a dependent variable (called Response) taking two values, 0 and 1. My goal is to understand whether the 3 systems (A, B, C) affect Response differently in a statistically significant way. I am basing my analysis on this URL:
https://stats.idre.ucla.edu/r/dae/logit-regression/

This is the result of my analysis:

> fit <- glm(Response ~ System, data = scrd, family = "binomial")
> summary(fit)

Call:
glm(formula = Response ~ System, family = "binomial", data = scrd)

Deviance Residuals:
    Min      1Q  Median      3Q     Max
-2.8840  0.1775  0.2712  0.2712  0.5008

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   3.2844     0.2825  11.626  < 2e-16 ***
SystemB      -1.2715     0.3379  -3.763 0.000168 ***
SystemC       0.8588     0.4990   1.721 0.085266 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 411.26  on 1023  degrees of freedom
Residual deviance: 376.76  on 1021  degrees of freedom
AIC: 382.76

Number of Fisher Scoring iterations: 6

Following this analysis I perform the Wald test in order to understand whether there is an overall effect of System:

> library(aod)
> wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 1:3)
Wald test:
----------
Chi-squared test:
X2 = 354.6, df = 3, P(> X2) = 0.0

The chi-squared test statistic of 354.6, with 3 degrees of freedom, is associated with a p-value < 0.001, indicating that the overall effect of System is statistically significant.

Now I check whether there are differences between the coefficients, again using the Wald test:

# Here the difference between systems B and C:
> l <- cbind(0, 1, -1)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 22.3, df = 1, P(> X2) = 2.3e-06

# Here the difference between systems A and C:
> l <- cbind(1, 0, -1)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 12.0, df = 1, P(> X2) = 0.00052

# Here the difference between systems A and B:
> l <- cbind(1, -1, 0)
> wald.test(b = coef(fit), Sigma = vcov(fit), L = l)
Wald test:
----------
Chi-squared test:
X2 = 58.7, df = 1, P(> X2) = 1.8e-14

My understanding is that from this analysis I can state that the three systems lead to a significantly different Response. Am I right? If so, how should I report the results of this analysis? What is the correct way?

Thanks in advance

Best wishes

FJ
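A note on the overall test above: Terms = 1:3 includes the intercept, so the 3-df statistic tests more than the System effect alone (a point that comes up later in the thread). A minimal sketch of a Wald test restricted to the two System coefficients, assuming the fit object from the post:

# Sketch: Wald test of the System effect only, excluding the intercept.
# Assumes fit <- glm(Response ~ System, data = scrd, family = "binomial") as above.
library(aod)
wald.test(b = coef(fit), Sigma = vcov(fit), Terms = 2:3)  # 2 df for a 3-group comparison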
Re: [R] Reporting binomial logistic regression from R results
Dear Petr,

thank you very much for your feedback. Can anyone on the list advise me whether the way I report the results is correct?

Kind regards

FJ

On Mon, Nov 12, 2018 at 1:02 PM PIKAL Petr wrote:
> Hi Frodo
>
> I do not consider myself an arbiter of statistical results and their presentation. Again, your text seems to be as good as any other.
>
> You should keep responses on the mailing list, as others could have another opinion.
>
> Cheers
> Petr
>
> *From:* Frodo Jedi
> *Sent:* Monday, November 12, 2018 1:48 PM
> *To:* PIKAL Petr
> *Subject:* Re: [R] Reporting binomial logistic regression from R results
>
> Dear Petr,
>
> many thanks for your reply. I was wondering whether in your opinion it is correct to report the following text in a journal:
>
> "A logistic regression was performed to ascertain the effects of the system type on the likelihood that participants report correct identifications. The logistic regression model was statistically significant, χ2(3) = 354.6, p < 0.001, indicating an overall effect of the system type on participants' identification performances. The Wald test was used to compare the model coefficients related to the three systems. Results showed that participants' accuracy was significantly lower for system B compared to both system C (χ2(1) = 22.3, p < 0.001) and system A (χ2(1) = 58.7, p < 0.001), and that system C led to significantly higher identification accuracy than system A (χ2(1) = 12, p < 0.001)."
>
> Best wishes
>
> FJ
>
> On Mon, Nov 12, 2018 at 10:05 AM PIKAL Petr wrote:
>> Dear Frodo (or Jedi)
>>
>> The results seem to confirm your assumption that the 3 systems are different. How you should present the results probably depends on how such results are usually reported in your field.
>>
>> BTW, it seems to me like homework, and this list has a no-homework policy (sorry if I am mistaken).
>>
>> Cheers
>> Petr
>>
>> [original post quoted in full; snipped]
Re: [R] Reporting binomial logistic regression from R results
Dear Peter and Eik,

I am very grateful to you for your replies. My current understanding is that from the GLM output I can indeed conclude that the response predicted by System A is significantly different from that of System B, while the pairwise comparison A vs. C is not significant. The Wald test I ran is correct only for B vs. C, indicating that the pairwise comparison System B vs. System C is significant. Am I correct?

However, my current understanding is also that I should use contrasts instead of the Wald test. The default contrast uses System A as the baseline, so I should re-run the GLM with another base. I tried to use the "contrasts" option of glm:

> fit1 <- glm(Response ~ System, data = scrd, family = "binomial", contrasts = contr.treatment(3, base = 1, contrasts = TRUE))
> summary(fit1)

> fit2 <- glm(Response ~ System, data = scrd, family = "binomial", contrasts = contr.treatment(3, base = 2, contrasts = TRUE))
> summary(fit2)

> fit3 <- glm(Response ~ System, data = scrd, family = "binomial", contrasts = contr.treatment(3, base = 3, contrasts = TRUE))
> summary(fit3)

However, the outputs of these three summary calls are identical. Why? That option should have changed the base, but apparently this is not the case.

Another analysis I found online (at https://stats.stackexchange.com/questions/60352/comparing-levels-of-factors-after-a-glm-in-r) for understanding the differences between the 3 levels is to use glht with Tukey contrasts. I performed the following:

> library(multcomp)
> summary(glht(fit, mcp(System = "Tukey")))

         Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: glm(formula = Response ~ System, family = "binomial", data = scrd)

Linear Hypotheses:
           Estimate Std. Error z value Pr(>|z|)
B - A == 0  -1.2715     0.3379  -3.763 0.000445 ***
C - A == 0   0.8588     0.4990   1.721 0.192472
C - B == 0   2.1303     0.4512   4.722  < 1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

Is this Tukey analysis correct? I am a bit confused about which analysis I should do. I am doing my very best to study all the resources I can find, but I would really need some help from experts, especially in using R.

Best wishes

FJ

On Mon, Nov 12, 2018 at 1:46 PM peter dalgaard wrote:
> Yes, only one of the pairwise comparisons (B vs. C) is right. Also, the overall test has 3 degrees of freedom, whereas a comparison of 3 groups should have 2. You (meaning Frodo) are testing that _all 3_ regression coefficients are zero, intercept included. That would imply that all three systems have response probabilities of 0.5, which is not likely what you want.
>
> This all suggests that you are struggling with the interpretation of the regression coefficients and their role in the linear predictor. This should be covered by any good book on logistic regression.
>
> -pd
>
>> On 12 Nov 2018, at 14:15, Eik Vettorazzi wrote:
>>
>> Dear Jedi,
>> please use the source carefully. A and C are not statistically different at the 5% level, which can be inferred from the glm output. Your last two wald.tests don't test what you want them to, since your model contains an intercept term. You specified contrasts which test A vs. B-A, i.e. A - (B-A) == 0 <-> 2*A - B == 0, which is not what you intended, I think. Have a look at ?contr.treatment and re-read your source doc to get an idea of what dummy coding and indicator variables are about.
>>
>> Cheers
>>
>> [original post quoted in full; snipped]
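A note on why the three summaries were identical: glm's contrasts argument expects a named list with one entry per factor, so a bare matrix is ignored (R warns about a non-list contrasts argument). A hedged sketch of two working alternatives, assuming the scrd data from the thread:

# Sketch: two ways to change the baseline level of System (assumes scrd as above).

# 1. Pass the contrasts as a named list:
fitB <- glm(Response ~ System, data = scrd, family = "binomial",
            contrasts = list(System = contr.treatment(3, base = 2)))

# 2. Or re-level the factor itself so that "B" becomes the reference:
scrd$System <- relevel(scrd$System, ref = "B")
fitB2 <- glm(Response ~ System, data = scrd, family = "binomial")
summary(fitB2)  # coefficients now estimate A vs. B and C vs. B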
Re: [R] Reporting binomial logistic regression from R results
Dear Bert,

I understand, and thanks for your recommendation. Unfortunately I have no possibility to consult a statistical expert at the moment, so this forum's expert advice is crucial for me to understand how R works in relation to my question. I hope that someone can reply to my last questions.

Best regards

FJ

On Mon, Nov 12, 2018 at 7:48 PM Bert Gunter wrote:
> Generally speaking, this list is about questions on R programming, not statistical issues. However, I grant you that your queries are in something of a gray area intersecting both.
>
> Nevertheless, based on your admitted confusion, I would recommend that you find a local statistical expert with whom you can consult one-on-one if at all possible. As others have already noted, your statistical understanding is muddy, and it can be quite difficult to resolve such confusion in online forums like this, which cannot provide the close back-and-forth that may be required (as well as further appropriate study).
>
> Best,
> Bert
>
> On Mon, Nov 12, 2018 at 11:09 AM Frodo Jedi <frodojedi.mailingl...@gmail.com> wrote:
>> [previous message quoted in full; snipped]
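Following up on Peter Dalgaard's point that a comparison of 3 groups should have 2 degrees of freedom: a likelihood-ratio test of the overall System effect is one standard way to get exactly that. A sketch, assuming the fit object from the original post:

# Sketch: 2-df likelihood-ratio test of the overall System effect.
# Assumes fit <- glm(Response ~ System, data = scrd, family = "binomial").
fit0 <- glm(Response ~ 1, data = scrd, family = "binomial")  # intercept-only model
anova(fit0, fit, test = "Chisq")  # deviance difference on 2 df
drop1(fit, test = "LRT")          # equivalent single-call version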
[R] t-test or ANOVA...who wins? Help please!
Dear all,

I need help because I don't know how to perform the analysis in the right way, as I get different behaviors from a t-test and a two-way ANOVA. Below I post the table, my goal, and the strange results I got. I kindly ask for your help because I really don't know how to solve this problem.

So the table is this:

number  stimulus                    condition  response
1       flat_550_W_realism          A          3
2       flat_550_W_realism          A          3
3       flat_550_W_realism          A          5
4       flat_550_W_realism          A          3
5       flat_550_W_realism          A          3
6       flat_550_W_realism          A          3
7       flat_550_W_realism          A          3
8       flat_550_W_realism          A          5
9       flat_550_W_realism          A          3
10      flat_550_W_realism          A          3
11      flat_550_W_realism          A          5
12      flat_550_W_realism          A          7
13      flat_550_W_realism          A          5
14      flat_550_W_realism          A          2
15      flat_550_W_realism          A          3
16      flat_550_W_realism          AH         7
17      flat_550_W_realism          AH         4
18      flat_550_W_realism          AH         5
19      flat_550_W_realism          AH         3
20      flat_550_W_realism          AH         6
21      flat_550_W_realism          AH         5
22      flat_550_W_realism          AH         3
23      flat_550_W_realism          AH         5
24      flat_550_W_realism          AH         5
25      flat_550_W_realism          AH         7
26      flat_550_W_realism          AH         2
27      flat_550_W_realism          AH         7
28      flat_550_W_realism          AH         5
29      flat_550_W_realism          AH         5
30      bump_2_step_W_realism       A          1
31      bump_2_step_W_realism       A          3
32      bump_2_step_W_realism       A          5
33      bump_2_step_W_realism       A          1
34      bump_2_step_W_realism       A          3
35      bump_2_step_W_realism       A          2
36      bump_2_step_W_realism       A          5
37      bump_2_step_W_realism       A          4
38      bump_2_step_W_realism       A          4
39      bump_2_step_W_realism       A          4
40      bump_2_step_W_realism       A          4
41      bump_2_step_W_realism       AH         3
42      bump_2_step_W_realism       AH         5
43      bump_2_step_W_realism       AH         1
44      bump_2_step_W_realism       AH         5
45      bump_2_step_W_realism       AH         4
46      bump_2_step_W_realism       AH         4
47      bump_2_step_W_realism       AH         5
48      bump_2_step_W_realism       AH         4
49      bump_2_step_W_realism       AH         3
50      bump_2_step_W_realism       AH         4
51      bump_2_step_W_realism       AH         5
52      bump_2_step_W_realism       AH         4
53      hole_2_step_W_realism       A          3
54      hole_2_step_W_realism       A          3
55      hole_2_step_W_realism       A          4
56      hole_2_step_W_realism       A          1
57      hole_2_step_W_realism       A          4
58      hole_2_step_W_realism       A          3
59      hole_2_step_W_realism       A          5
60      hole_2_step_W_realism       A          4
61      hole_2_step_W_realism       A          3
62      hole_2_step_W_realism       A          4
63      hole_2_step_W_realism       A          7
64      hole_2_step_W_realism       A          5
65      hole_2_step_W_realism       A          1
66      hole_2_step_W_realism       A          4
67      hole_2_step_W_realism       AH         7
68      hole_2_step_W_realism       AH         5
69      hole_2_step_W_realism       AH         5
70      hole_2_step_W_realism       AH         1
71      hole_2_step_W_realism       AH         5
72      hole_2_step_W_realism       AH         5
73      hole_2_step_W_realism       AH         5
74      hole_2_step_W_realism       AH         2
75      hole_2_step_W_realism       AH         6
76      hole_2_step_W_realism       AH         5
77      hole_2_step_W_realism       AH         5
78      hole_2_step_W_realism       AH         6
79      bump_2_heel_toe_W_realism   A          3
80      bump_2_heel_toe_W_realism   A          3
81      bump_2_heel_toe_W_realism   A          3
82      bump_2_heel_toe_W_realism   A          2
83      bump_2_heel_toe_W_realism   A          3
84      bump_2_heel_toe_W_realism   A          3
85      bump_2_heel_toe_W_realism   A          4
86      bump_2_heel_toe_W_realism   A          3
87      bump_2_heel_toe_W_realism   A          4
88      bump_2_heel_toe_W_realism   A          4
89      bump_2_heel_toe_W_realism
Re: [R] t-test or ANOVA...who wins? Help please!
Dear Tal Galili,

thanks a lot for your answer! I agree with you: the t-test compares the 2 conditions at one level of stimulus, while the ANOVA table tests the significance of the interaction between condition and stimulus; the two tests are testing two different things. But I still don't understand which is the right way to perform the analysis in order to solve my problem.

Let's consider now only the table I posted before. The same stimuli in the table have been presented to subjects in two conditions, A and AH, where AH is condition A plus something else (let's call it "H"). I want to know whether, AT A GLOBAL LEVEL, adding "H" leads to better results in the participants' evaluations of the stimuli than presenting the stimuli in condition A alone. The data in the "response" column are evaluations of the realism of the stimulus on a 7-point scale.

If I calculate the mean for each stimulus in each condition, the results show that for each stimulus the AH mean is always greater than the A mean. However, doing a t-test to compare the stimuli pair by pair (e.g. flat_550_W_realism in condition A vs. flat_550_W_realism in condition AH), I find that only some of the differences are statistically significant. I am asking whether there is a way to say that condition AH is better than condition A at a global level.

Attached you find the table in .txt and also in .csv format. Would it be possible for you to make an example in R, including the R results, so that I know what to look for in the console to see whether my problem is solved? For example, I was checking the stimulus:condition line in the ANOVA results, but I don't know whether my ANOVA analysis was correct. I am not an expert in R, nor in statistics ;-( Anyway, I am doing my best to study and understand.

Please enlighten me. Thanks in advance.

Best regards

________
From: Tal Galili
To: Frodo Jedi
Cc: r-help@r-project.org
Sent: Wed, January 5, 2011 10:15:41 AM
Subject: Re: [R] t-test or ANOVA...who wins? Help please!

Hello Frodo,
Some of the basics of your analysis are not clear to me from your questions. If you only have two levels of a factor and one response, why do you use more factors (and their interactions) in the ANOVA? In that sense, it is obvious that your results would differ from the t-test. In either case, I am not sure any of these methods are valid, since your data don't seem to be normal.

Here is example code showing how to get the same results from aov and t.test, and also a nonparametric option (that might be more fitting):

flat_550_W_realism    <- c(3,3,5,3,3,3,3,5,3,3,5,7,5,2,3)
flat_550_W_realism_AH <- c(7,4,5,3,6,5,3,5,5,7,2,7,5,5)
x <- c(rep(1, length(flat_550_W_realism)), rep(2, length(flat_550_W_realism_AH)))
y <- c(flat_550_W_realism, flat_550_W_realism_AH)

# equal results between t test and anova
t.test(y ~ x, var.equal = TRUE)
summary(aov(y ~ x))

# plotting the data:
boxplot(y ~ x)   # group 1 is not at all symmetrical...

wilcox.test(y ~ x)   # a more fitting test

Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Wed, Jan 5, 2011 at 12:37 AM, Frodo Jedi wrote:
> I kindly ask for your help because I really don't know how to solve this problem.
> [data table quoted; snipped]
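On the "global level" question: one standard way to test an overall A vs. AH effect while accounting for the different stimuli is to fit condition as a main effect alongside stimulus. A sketch, assuming the posted table has been read into scrd:

# Sketch: global test of condition across all stimuli (assumes scrd as posted).
fit <- aov(response ~ stimulus + condition, data = scrd)
summary(fit)  # the "condition" line tests A vs. AH pooled over all stimuli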
[R] Assumptions for ANOVA: the right way to check the normality
Dear all,

I would like to know the right way to check the normality assumption before performing an ANOVA. How would you check normality for the following example?

I did an experiment where people had to evaluate, on a 7-point scale, the degree of realism of some stimuli presented in 2 conditions. The problem is that if I check normality with the Shapiro test, I find that the data are not normally distributed. Someone suggested that I don't have to check the normality of the data themselves, but rather the normality of the residuals I get after fitting the linear model. I really ask you to help me understand this point, as I cannot find enough material online to resolve it.

If the data are not normally distributed I have to use the Kruskal-Wallis test and not ANOVA, so please help me understand.

I give a numerical example: could you please tell me whether the data in this table are normally distributed or not?

Help!

number  stimulus                condition  response
1       flat_550_W_realism      A          3
2       flat_550_W_realism      A          3
3       flat_550_W_realism      A          5
[same data table as in the previous thread; remaining rows snipped]
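For reference, the Kruskal-Wallis test mentioned above is a one-liner in R; a sketch, assuming the table has been read into scrd:

# Sketch: rank-based one-way test of condition (assumes scrd as posted).
kruskal.test(response ~ condition, data = scrd)
# With only two groups (A, AH) this is closely related to
# wilcox.test(response ~ condition, data = scrd)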
[R] Comparing fitting models
Dear all,

I have 3 models (from simple to complex) and I want to compare them in order to see whether they fit equally well or not. From the R output I am not able to see where I can get this information. Let's do an example:

fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data = scrd)
# equivalent to lm(response ~ stimulus*condition, data = scrd)

fit2 <- lm(response ~ stimulus + condition, data = scrd)

fit3 <- lm(response ~ condition, data = scrd)

> anova(fit2, fit1)   # compare models
Analysis of Variance Table

Model 1: response ~ stimulus + condition
Model 2: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    165 364.13
2    159 362.67  6    1.4650 0.1071 0.9955

> anova(fit3, fit2, fit1)   # compare models
Analysis of Variance Table

Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955

How can I tell whether the simple model fits as well as the complex model (the one with the interaction)?

Thanks in advance

All the best
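Besides the F-tests from anova(), nested fits like these are often compared with an information criterion, where smaller is better and near-ties favor the simpler model. A sketch, assuming fit1 to fit3 above:

# Sketch: compare the three fits by AIC (assumes fit1, fit2, fit3 from above).
AIC(fit1, fit2, fit3)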
Re: [R] Comparing fitting models
Dear Greg,

thanks for your answer, but that is not the point. My question is different: from the output at the R prompt, how can I compare the models? I don't understand the output of anova(fit1, fit2, fit3). What do I have to look at to understand something about the comparison of the models? Look at the output in R:

> anova(fit3, fit2, fit1)   # compare models
Analysis of Variance Table

Model 1: response ~ condition
Model 2: response ~ stimulus + condition
Model 3: response ~ stimulus + condition + stimulus:condition
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    171 382.78
2    165 364.13  6    18.650 1.3628 0.2328
3    159 362.67  6     1.465 0.1071 0.9955

I don't understand these data. What do they actually say about the comparison between the three models? Please enlighten me.

Thanks a lot

From: Greg Snow
Sent: Wed, January 5, 2011 10:34:15 PM
Subject: RE: [R] Comparing fitting models

Just do

anova(fit3, fit1)

This compares those 2 models directly.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
> Sent: Wednesday, January 05, 2011 10:10 AM
> To: r-help@r-project.org
> Subject: [R] Comparing fitting models
>
> [original post quoted in full; snipped]
Re: [R] Assumptions for ANOVA: the right way to check the normality
Dear Robert,

thanks so much!!! Now I understand! So you also think that I have to check only the residuals and not the data directly. Just out of curiosity, I ran the Shapiro test on the residuals. The problem is that for fit3 the test does not say the residuals are normally distributed. Why? Here are the results:

> shapiro.test(residuals(fit1))

        Shapiro-Wilk normality test

data:  residuals(fit1)
W = 0.9848, p-value = 0.05693

# Here the test is ok: it says the residuals are distributed normally (p-value greater than 0.05)

> shapiro.test(residuals(fit2))

        Shapiro-Wilk normality test

data:  residuals(fit2)
W = 0.9853, p-value = 0.06525

# Here the test is ok: it says the residuals are distributed normally (p-value greater than 0.05)

> shapiro.test(residuals(fit3))

        Shapiro-Wilk normality test

data:  residuals(fit3)
W = 0.9621, p-value = 0.0001206

Now the test gives a p-value lower than 0.05, so the residuals for fit3 are not normally distributed. Why do I get this behaviour? In the histogram and Q-Q plot for the fit3 residuals I see a roughly normal distribution.

From: Robert Baer
Sent: Wed, January 5, 2011 8:56:50 PM
Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality

> Someone suggested that I don't have to check the normality of the data, but rather the normality of the residuals I get after fitting the linear model. I really ask you to help me understand this point, as I cannot find enough material online to resolve it.

Try the following:

# using your scrd data and your proposed models
fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data = scrd)
fit2 <- lm(response ~ stimulus + condition, data = scrd)
fit3 <- lm(response ~ condition, data = scrd)

# Set up for 6 plots on 1 panel
op = par(mfrow = c(2, 3))

# The residuals() function extracts residuals
# Visual inspection is a good start for checking normality
# You get a much better feel than from some "magic number" statistic
hist(residuals(fit1))
hist(residuals(fit2))
hist(residuals(fit3))

# especially qqnorm() plots, which are linear for normal data
qqnorm(residuals(fit1))
qqnorm(residuals(fit2))
qqnorm(residuals(fit3))

# Restore plot parameters
par(op)

> If the data are not normally distributed I have to use the Kruskal-Wallis test and not ANOVA, so please help me understand.

Indeed, Kruskal-Wallis is a good test to use for one-factor data that are ordinal, so it is a good alternative to your fit3. Your "response" seems to be a discrete variable rather than a continuous one. You must decide whether it is reasonable to approximate it with a normal distribution, which is by definition continuous.

> I give a numerical example: could you please tell me whether the data in this table are normally distributed or not?
>
> Help!
>
> [data table quoted; snipped]
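One reason shapiro.test can reject while the histogram and Q-Q plot look acceptable is sample size: with enough observations the test detects departures too small to matter, and 7-point responses are discrete by construction. A self-contained illustration with simulated data (not the scrd set):

# Illustration: shapiro.test flags mild, practically harmless non-normality.
set.seed(42)
x <- round(rnorm(180, mean = 4, sd = 1.5))  # normal data rounded to integers
shapiro.test(x)        # often rejects: rounding alone breaks exact normality
qqnorm(x); qqline(x)   # yet the Q-Q plot can still look roughly linear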
Re: [R] Comparing fitting models
Dear Greg,

thanks so much. Now I understand!!

Thanks!!

From: Greg Snow
Sent: Wed, January 5, 2011 11:27:53 PM
Subject: RE: [R] Comparing fitting models

The output with all three fits gives you 2 comparisons: fit1 vs. fit2, and fit2 vs. fit3. So, using an alpha of 0.05, the 0.99 p-value compares model 2 (fit2) and model 3 (fit1), testing the null hypothesis that they fit equally well, with the differences being due to random chance. The p-value is large, so we cannot reject that fit2 fits as well as fit1; the interaction is not significant. Then the 0.23 p-value compares the models with and without stimulus (the interaction not being considered); again the p-value is larger than alpha, so we don't have the evidence to conclude that stimulus helps.

If you want to test stimulus and the interaction in a single step, then do anova(fit3, fit1); giving all three fits to anova does the sequential tests.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

Sent: Wednesday, January 05, 2011 2:46 PM
To: Greg Snow; r-help@r-project.org
Subject: Re: [R] Comparing fitting models

[previous message quoted in full; snipped]
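For completeness, the single-step comparison Greg suggests looks like this; a sketch assuming fit1 and fit3 from the thread:

# Sketch: jointly test stimulus and the interaction (assumes fit1, fit3 above).
anova(fit3, fit1)  # one F-test on 12 df: 6 for stimulus + 6 for the interaction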
[R] Problem with 2-ways ANOVA interactions
Dear All,

I have a problem in understanding how the interactions in a two-way ANOVA work, because I get conflicting results from a t-test and an ANOVA. For most of you my problem is surely very simple. I need help with an example, looking at one table I am analyzing.

The table is attached and can be imported into R by means of this command:

scrd <- read.table('/Users/luca/Documents/Analisi_passi/Codice_R/Statistics_results_bump_hole_Audio_Haptic/tables_for_R/table_realism_wood.txt',
                   header = TRUE,
                   colClasses = c('numeric', 'factor', 'factor', 'numeric'))

This table is the result of a simple experiment. Subjects were exposed to some stimuli and were asked to evaluate the degree of realism of each stimulus on a 7-point scale (the data in the "response" column). Each stimulus was presented in two conditions, "A" and "AH", where AH is condition A plus another thing (let's call it "H").

Now, what exactly does the interaction stimulus:condition mean in my table? I think that if I do the analysis anova(response ~ stimulus*condition) I will get the comparison between the same stimulus in condition A and in condition AH. Am I wrong? For instance, the comparison of stimulus flat_550_W_realism presented in condition A with the same stimulus, flat_550_W_realism, presented in condition AH.

The problem is that if I do a t-test between the values of this stimulus in the A and AH conditions I get a significant difference, while with the two-way ANOVA I don't get any difference. How is this possible?

Here are the results of the analysis:

# Here the result of ANOVA:
> fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data = scrd)
> # equivalent to lm(response ~ stimulus*condition, data = scrd)
> anova(fit1)
Analysis of Variance Table

Response: response
                    Df Sum Sq Mean Sq F value   Pr(>F)
stimulus             6  15.05   2.509  1.1000   0.3647
condition            1  36.51  36.515 16.0089 9.64e-05 ***
stimulus:condition   6   1.47   0.244  0.1071   0.9955
Residuals          159 362.67   2.281
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# As you can see, the p-value for stimulus:condition is high.

# Now I do the t-test with the same values from the table for the stimulus presented in the A and AH conditions:

flat_550_W_realism    <- c(3,3,5,3,3,3,3,5,3,3,5,7,5,2,3)
flat_550_W_realism_AH <- c(7,4,5,3,6,5,3,5,5,7,2,7,5,5)

> t.test(flat_550_W_realism, flat_550_W_realism_AH, var.equal = TRUE)

        Two Sample t-test

data:  flat_550_W_realism and flat_550_W_realism_AH
t = -2.2361, df = 27, p-value = 0.03381
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.29198603 -0.09849016
sample estimates:
mean of x mean of y
 3.733333  4.928571

# Now we have a significant difference between these two stimuli (p-value = 0.03381)

Why do I get this behaviour? Moreover, how could I use ANOVA to track the significant differences between the stimuli presented in the A and AH conditions, without doing the t-test?

Please help!

Thanks in advance

number  stimulus                condition  response
1       flat_550_W_realism      A          3
2       flat_550_W_realism      A          3
3       flat_550_W_realism      A          5
[same data table as in the earlier threads; remaining rows snipped]
Re: [R] Problem with 2-ways ANOVA interactions
Dear Greg,

thanks so much, I think that now I have understood. Please confirm my understanding by reading what follows ;-)

To summarize from the beginning: the table I analyzed is the result of a simple experiment. Subjects were exposed to some stimuli and were asked to evaluate the degree of realism of each stimulus on a 7-point scale (the data in the "response" column). Each stimulus was presented in two conditions, "A" and "AH", where AH is condition A plus another thing (let's call it "H").

Before, I wrongly thought that if I did the analysis anova(response ~ stimulus*condition) I would get the comparison between the same stimulus in condition A and in condition AH (e.g. stimulus_1_A vs. stimulus_1_AH). Instead, apparently, the interaction stimulus:condition captures the differences between the stimuli keeping the condition fixed!! If this is true, then is doing the ANOVA with the interaction stimulus:condition equivalent to doing a ONE-WAY ANOVA first on the subset where all the conditions are A and then on the subset where all the conditions are AH? Right?

So, if all the above is correct, my final question is: how can I use ANOVA to track the significant differences between the stimuli presented in the A and AH conditions, without resorting to the t-test? Indeed my goal was to find out, on the one hand, whether globally condition AH leads to better results than condition A, and on the other hand, for which stimuli condition AH brings better results than condition A.

Finally, I am burning with curiosity to know the answers to the following two questions:

1. Is there a difference between anova(response ~ stimulus*condition) and anova(response ~ condition*stimulus) concerning the interaction part?

2. Does anova(response ~ stimulus + condition) give the same results as the two ONE-WAY ANOVAs anova(response ~ stimulus) and anova(response ~ condition), with the advantage that they are presented together in one single output?

Looking forward to your response!

Best regards

From: Greg Snow
Sent: Thu, January 6, 2011 6:41:49 PM
Subject: RE: [R] Problem with 2-ways ANOVA interactions

You really need to spend more time with a good AOV textbook, and probably a consultant who can explain things to you face to face. But here is a basic explanation to get you pointed in the right direction:

Consider a simple 2x2 example with factors A and B, each with 2 levels (1 and 2). Draw a 2x2 grid to represent this; there are 4 groups, and the theory would be that they have means mu11, mu12, mu21, and mu22 (mu12 is for the group with A at level 1 and B at level 2, etc.). Now you fit the full model with 2 main effects and 1 interaction. If we assume treatment contrasts (the default in R; the coefficients/tests will be different for different contrasts, but the general idea is the same), then the intercept/mean/constant piece will correspond to mu11; the coefficient (only seen if treated as an lm rather than aov object) for testing A will be (mu21 - mu11), and for testing B it will be (mu12 - mu11).

Now the interaction piece gets a bit more complex: it is (mu11 - mu12 - mu21 + mu22). This makes a bit more sense if we rearrange it as one of ((mu22 - mu21) - (mu12 - mu11)) or ((mu22 - mu12) - (mu21 - mu11)); it represents the difference in the differences. That is, we find how much going from A1 to A2 changes things when B is 1, then we find how much going from A1 to A2 changes things when B is 2, then we find the difference in these changes: that is the interaction (and if it is 0, then the effects of A and B are additive and independent, i.e. the amount A changes things does not depend on the value of B, and vice versa). So testing the interaction term is asking whether the effect of a change in A depends on the value of B.

This is very different from comparing mu11 to mu12 (or mu21 to mu22), which is what I think you did in the t-test; that is asking a very different question and using different base assumptions (ignoring any effect of B, additional data, etc.). Note that your test on condition is very significant; this would be more similar to your t-test, but still not match exactly, because of the differences.

Now your case is more complicated, since stimulus has 7 levels (6 df), so the interaction is a combination of 6 different differences of differences, which is why you need to spend some time with a good textbook/class to really understand what model(s) you are fitting.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
> Sent: Wednesday, January 05, 2011 4:10 PM
> To: r-help@r-project.org
> Subject: [R] Problem with 2-ways ANOVA interactions
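A quick way to see the "difference in differences" Greg describes is an interaction plot: roughly parallel profiles are what a near-zero interaction term looks like. A sketch, assuming the scrd data frame from the post:

# Sketch: visualize the stimulus-by-condition interaction (assumes scrd as posted).
with(scrd, interaction.plot(condition, stimulus, response))
# Cell means behind the plot, for reference:
with(scrd, tapply(response, list(stimulus, condition), mean))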
Re: [R] Assumptions for ANOVA: the right way to check the normality
Ok, I see ;-)

Let's put it this way, then: when do I have to use the Kruskal-Wallis test? I mean, when can I be sure that I have to use it instead of ANOVA?

Thanks

Best regards

P.S. In addition, which is the nonparametric method corresponding to a two-way ANOVA? Or do I have to repeat the Kruskal-Wallis test many times?

From: Greg Snow
To: "r-help@r-project.org"
Sent: Thu, January 6, 2011 7:07:17 PM
Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality

Remember that a non-significant result (especially one that is still near alpha, like yours) does not give evidence that the null is true. The reason that the first 2 tests below don't show significance is more due to lack of power than to the residuals being normal. The only test that I would trust for this is SnowsPenultimateNormalityTest (TeachingDemos package; the help page is more useful than the function itself).

But I think that you are mixing up 2 different concepts (a very common misunderstanding). What is important, if we want to do normal-theory inference, is that the coefficients/effects/estimates are normally distributed. Now, since these coefficients can be shown to be linear combinations of the error terms, if the errors are iid normal then the coefficients are also normally distributed. So many people want to show that the residuals come from a perfectly normal distribution. But it is the theoretical errors, not the observed residuals, that are important (the observed residuals are not iid). You need to think about the source of your data to see whether this is a reasonable assumption.

Now, I cannot fathom any universe (theoretical or real) in which normally distributed errors added to means that they are independent of will result in a finite set of integers, so an assumption of exact normality is not reasonable (some may want to argue this, but convincing me will be very difficult). But looking for exact normality is a bit of a red herring, because we also have the Central Limit Theorem, which says that if the errors are not normal (but still iid) then the distribution of the coefficients will approach normality as the sample size increases. This is what makes statistics doable (because no real dataset entered into the computer is exactly normal). The more important question is: are the residuals "normal enough"? There is no definitive test for that (experience and plots help).

But this all depends on another assumption that I don't think you have even considered. Yes, we can use normal theory even when the random part of the data is not normally distributed, but this still assumes that the data are at least interval data, i.e. that we firmly believe that the difference between a response of 1 and a response of 2 is exactly the same as the difference between a 6 and a 7, and that the difference from 4 to 6 is exactly twice that of 1 vs. 2. From your data and other descriptions, I don't think that is a reasonable assumption. If you are not willing to make that assumption (like me), then means and normal-theory tests are meaningless and you should use other approaches. One possibility is to use nonparametric methods (which I believe Frank has already suggested you use); another is to use proportional odds logistic regression.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
> Sent: Wednesday, January 05, 2011 3:22 PM
> To: Robert Baer; r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
>
> [previous message quoted in full; snipped]
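Greg's closing suggestion, proportional odds logistic regression, is available as polr() in the MASS package; the Friedman test is the usual rank-based analogue of a two-way repeated-measures design. A hedged sketch, assuming the scrd data from these threads:

# Sketch: proportional odds model for the 7-point response (assumes scrd as posted).
library(MASS)
scrd$response.ord <- factor(scrd$response, ordered = TRUE)
pfit <- polr(response.ord ~ stimulus + condition, data = scrd, Hess = TRUE)
summary(pfit)  # reports t values; see ?polr for obtaining p-values
# For a rank-based two-way analogue with one observation per cell, see ?friedman.test.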
[R] anova vs aov commands for anova with repeated measures
Dear all,

I need to understand the difference in behaviour between the two functions aov and anova in the following case, involving an ANOVA with repeated measures.

If I use the following command I don't get any problem:

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
> summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

> fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data = scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards.

I want to use lm with this model because I want to analyze the residuals in order to check normality, and see whether the ANOVA assumption of normality still holds.

Could you please help me understand how to do this?

Thanks in advance

Best regards
Re: [R] anova vs aov commands for anova with repeated measures
Dear Bill, many thanks for your answer. I get the point, but I still need help with a numeric example, as I haven't fully understood how this works in R (I had a look at proj() but I am not sure how it can be used correctly). Could you please give an example?

For example, attached you will find a .csv table resulting from an experiment; you can read it with this command:

> scrd <- read.csv(file='/Users/../tables_for_R/table_quality_wood.csv', sep=',', header=T)

The data are from an experiment where participants had to evaluate on a seven-point Likert scale the realism of some stimuli, which are presented both in condition "A" and in condition "AH". I need to perform the ANOVA by means of this command:

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

but the problem is that I cannot, as I usually do, run qqnorm on the residuals of the fit, because lm does not support the Error term present in aov. I normally check normality through a plot (or the shapiro.test function). Could you please show me how you would determine from my data whether they are normally distributed? Please enlighten me. Best regards

From: "bill.venab...@csiro.au" To: frodo.j...@yahoo.com; r-help@r-project.org Sent: Sat, January 8, 2011 4:01:41 AM Subject: RE: [R] anova vs aov commands for anova with repeated measures

lm() and aov() are not fully equivalent. They both fit linear models, but they use different algorithms, and this allows aov, for example, to handle some simple multistratum models. The algorithm used by lm does not allow this, but it has other advantages for simpler models. If you want to fit a multistratum model, such as a repeated measures model, you need to use aov.

When it comes to finding the residuals, you have to be explicit about which residuals you mean, too. You get residuals for each stratum in a multistratum model. Using plain old resid() will not work, as that function can only be used when there is only one kind of residuals vector defined. (It could be made to do something sensible, but that's another issue; right now, it doesn't.)

The function proj() can be used on a fitted model object to obtain the projections, at each stratum, onto the subspaces defined by the terms in the model, and includes the residuals (the projection onto the orthogonal complement of the model space in R^n) as the final column of the matrix of projections. These are easy to dig out and you can analyse away. See ?proj for an example of its use, but you will need to dig out the appropriate column yourself.

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of Frodo Jedi [frodo.j...@yahoo.com] Sent: 08 January 2011 01:51 To: r-help@r-project.org Subject: [R] anova vs aov commands for anova with repeated measures

Dear all, I need to understand the behaviour of the two functions aov and anova in the following case, involving an ANOVA with repeated measures. If I use the following command I don't get any problem:

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
> summary(aov1)

Instead, if I try to fit the same model for the regression I get an error:

> fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
Error in eval(expr, envir, enclos) : could not find function "Error"

so I cannot run the command anova(fit1) afterwards.
I want to use fit1 <- lm(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd) because I want to analyse the residuals in order to check normality, and see if the ANOVA assumption of normality still holds. Could you please help me understand how to do this? Thanks in advance. Best regards

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
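To make Bill's proj() suggestion concrete, a minimal sketch (not from the thread; it assumes the scrd data and aov1 model above, and the stratum name shown is an assumption that depends on the Error() term, so check names(pr) on the actual fit):

# Extracting residuals from a multistratum aov fit via proj()
aov1 <- aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
pr <- proj(aov1)   # one projection matrix per stratum
names(pr)          # e.g. "(Intercept)", "subject", ..., "subject:stimulus:condition"

# The residuals are the final column of the deepest stratum
res <- pr[["subject:stimulus:condition"]][, "Residuals"]
qqnorm(res); qqline(res)
shapiro.test(res)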
Re: [R] Assumptions for ANOVA: the right way to check the normality
ned to P, resampling and confidence intervals, and looking at sums of squares terms add to the explanation of the model. I've tried the plot() function to help graphically evaluate a model, and I want to make sure I understand what it's showing me. I think the first plot shows the model's fitted values vs. the residuals, and ideally, I think, the closer the points are to the red line the better. The next plot is a Q-Q plot: the closer the points are to the line, the more normal the model coefficients (or perhaps the data). I'm not sure about the next two plots. One is titled Scale-Location, and it looks to have the square root of the standardized residuals on y and the fitted model values on x. Might this be similar to the first plot? The final one is titled Residuals vs Leverage, which has standardized residuals on y and leverage on x, and something called Cook's distance is plotted as well.
>
> Thanks,
> Ben. W
>
> Whether to use ANOVA and other normality-based tests is really a matter of what assumptions you are willing to live with and what level of "close enough" you are comfortable with. Consulting a local consultant with experience in these areas is useful if you don't have enough experience to decide what you are comfortable with.
>
> For your description, I would try the proportional odds logistic regression, but again, you should probably consult someone who has experience rather than trying that on your own until you have more training and experience.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s...@imail.org
> 801.408.8111
>
> From: Frodo Jedi [mailto:frodo.j...@yahoo.com]
> Sent: Thursday, January 06, 2011 12:57 PM
> To: Greg Snow; r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
>
> Ok, I see ;-)
>
> Let's put it this way then. When do I have to use the Kruskal-Wallis test? I mean, when can I be very sure that I have to use it instead of ANOVA?
>
> Thanks
>
> Best regards
>
> P.S. In addition, what is the non-parametric method corresponding to a two-way ANOVA? Or do I have to repeat the Kruskal-Wallis test many times?
>
> From: Greg Snow
> To: Frodo Jedi; Robert Baer; "r-help@r-project.org"
> Sent: Thu, January 6, 2011 7:07:17 PM
> Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality
>
> Remember that a non-significant result (especially one that is still near alpha, like yours) does not give evidence that the null is true. The reason that the first 2 tests below don't show significance is more due to lack of power than to some of the residuals being normal. The only test that I would trust for this is SnowsPenultimateNormalityTest (TeachingDemos package; the help page is more useful than the function itself).
>
> But I think that you are mixing up 2 different concepts (a very common misunderstanding). What is important, if we want to do normal-theory inference, is that the coefficients/effects/estimates are normally distributed. Since these coefficients can be shown to be linear combinations of the error terms, if the errors are iid normal then the coefficients are also normally distributed. So many people want to show that the residuals come from a perfectly normal distribution.
> But it is the theoretical errors, not the observed residuals, that are important (the observed residuals are not iid). You need to think about the source of your data to see if this is a reasonable assumption. Now, I cannot fathom any universe (theoretical or real) in which normally distributed errors added to means that they are independent of will result in a finite set of integers, so an assumption of exact normality is not reasonable (some may want to argue this, but convincing me will be very difficult). But looking for exact normality is a bit of a red herring, because we also have the Central Limit Theorem, which says that if the errors are not normal (but still iid) then the distribution of the coefficients will approach normality as the sample size increases. This is what makes statistics doable (because no real dataset entered into the computer is exactly normal). The more important question is: are the residuals "normal enough"? For this there is no definitive test (experience and plots help).
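For reference, the diagnostic panel Ben describes is what the standard plot() method for lm objects produces; a minimal sketch, assuming the fit1 model from earlier in the thread:

# The four standard lm diagnostics: Residuals vs Fitted, Normal Q-Q,
# Scale-Location, and Residuals vs Leverage (with Cook's distance contours)
op <- par(mfrow = c(2, 2))
plot(fit1)
par(op)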
Re: [R] anova vs aov commands for anova with repeated measures
Dear Dieter, thanks a lot for your answer. Would you mind giving an example of the use of the command? For example, attached you will find a .csv table resulting from an experiment; you can read it with this command:

> scrd <- read.csv(file='/Users/../tables_for_R/table_quality_wood.csv', sep=',', header=T)

The data are from an experiment where participants had to evaluate on a seven-point Likert scale the realism of some stimuli, which are presented both in condition "A" and in condition "AH". I need to perform the ANOVA by means of this command:

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

All the best

From: Dieter Menne To: r-help@r-project.org Sent: Sat, January 8, 2011 3:50:23 PM Subject: Re: [R] anova vs aov commands for anova with repeated measures

Bill.Venables wrote:
>
> ...
> If you want to fit a multistratum model, such as a repeated measures model, you need to use aov.
>

... Or lme in nlme / lmer in lme4.

Dieter

--
View this message in context: http://r.789695.n4.nabble.com/anova-vs-aov-commands-for-anova-with-repeated-measures-tp3179343p3205018.html Sent from the R help mailing list archive at Nabble.com.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
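A minimal sketch of Dieter's suggestion applied to the residual check Frodo is after (not from the thread; it assumes the scrd data above and a random intercept per subject):

library(nlme)

# Mixed-model counterpart of the aov() call; unlike a multistratum
# aov fit, an lme fit supports resid() directly
fit.lme <- lme(response ~ stimulus*condition, random = ~1|subject, data = scrd)

qqnorm(resid(fit.lme)); qqline(resid(fit.lme))
shapiro.test(resid(fit.lme))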
[R] Anova with repeated measures for unbalanced design
Dear all, I need some help because I am really not able to find on the internet a good example in R of analyzing an unbalanced table with a repeated-measures ANOVA. By an unbalanced table I mean that the questions are not all answered by the same number of subjects.

For a balanced case I would use the command

aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

Does the same code still work for an unbalanced design? Can anyone provide an R code example for getting the same analysis? Thanks in advance for any suggestion. Best regards

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Anova with repeated measures for unbalanced design
Hi Ben, many many thanks!!

From: Ben Bolker To: r-h...@stat.math.ethz.ch Sent: Sat, January 8, 2011 9:39:20 PM Subject: Re: [R] Anova with repeated measures for unbalanced design

Frodo Jedi yahoo.com> writes:
>
> Dear all, I need some help because I am really not able to find on the internet a good example in R of analyzing an unbalanced table with a repeated-measures ANOVA. By an unbalanced table I mean that the questions are not all answered by the same number of subjects.
>
> For a balanced case I would use the command
>
> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

I recommend that you find a copy of Pinheiro and Bates 2000 ...

> Does the same code still work for an unbalanced design?

No, it doesn't.

> Can anyone provide an R code example for getting the same analysis?

Something like

library(nlme)
lme1 <- lme(response~stimulus*condition,random=~1|subject,data=scrd)

or possibly

lme1 <- lme(response~stimulus*condition,random=~stimulus*condition|subject, data=scrd)

if your experimental design would allow you to estimate stimulus and condition effects for each subject. Further questions along these lines should probably go to the r-sig-mixed-models mailing list.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Post hoc analysis for ANOVA with repeated measures
Dear all, how can I perform a post hoc analysis for an ANOVA with repeated measures (with a balanced design)? I am not able to find a good example in R on the internet... would someone be so kind as to give me a hint with an R example, please?

For example, the aov result of my analysis says that there is a statistical difference between stimuli (there are 7 different stimuli)... I would like to know which stimuli are involved.

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)
> summary(aov1)

Error: subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 14 227.57  16.255

Error: subject:stimulus
          Df Sum Sq Mean Sq F value  Pr(>F)
stimulus   6 11.695 1.94921  2.3009 0.04179 *
Residuals 84 71.162 0.84717
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: subject:condition
          Df Sum Sq Mean Sq F value   Pr(>F)
condition  1 42.076  42.076  12.403 0.003386 **
Residuals 14 47.495   3.393
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Error: subject:stimulus:condition
                   Df Sum Sq Mean Sq F value Pr(>F)
stimulus:condition  6  9.124  1.5206  1.4465 0.2068
Residuals          84 88.305  1.0513

Attached you will find the table I am using; you can access it by means of:

scrd <- read.csv(file='/Users/../tables_for_R/table_quality_gravel.csv', sep=',', header=T)

The data are from an experiment where participants had to evaluate on a seven-point Likert scale the realism of some stimuli, which are presented both in condition "A" and in condition "AH".

Thanks in advance. Best regards

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
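Not an answer given in the thread, but one simple and common option is paired pairwise t tests between stimuli with a multiplicity correction; a sketch, assuming the balanced scrd data above (every subject rates every stimulus) with rows ordered consistently by subject:

# All pairwise comparisons between the 7 stimuli, Holm-adjusted;
# paired = TRUE relies on a complete design sorted consistently by subject
with(scrd, pairwise.t.test(response, stimulus, paired = TRUE, p.adjust.method = "holm"))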
Re: [R] Anova with repeated measures for unbalanced design
Dear list members, I searched the previous threads of this mailing list extensively for an example that is easy for me to understand and fits my problem, but I did not succeed in finding a solution I feel certain about. At the moment I am stuck with this solution for my unbalanced design, and I don't know whether it is correct or not:

library(nlme)
scrd.lme <- lme(response~stimulus*condition,random=~1|subject,data=scrd)

Now, at this point is it correct to simply run the command anova(scrd.lme)? Or should I do something different using aov? As I said, for a balanced case I would use the command

> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

At the R prompt I get this output, which is very different from the aov output listed for a balanced case:

> anova(scrd.lme)
                   numDF denDF   F-value p-value
(Intercept)            1   182 178.56833  <.0001
stimulus               6   182   1.57851  0.1557
condition              1   182  39.68822  <.0001
stimulus:condition     6   182   0.67992  0.6660

Now (if the previous point is correct) I am burning with curiosity to solve another problem. I found that there are significant differences for condition. Condition has only 2 levels, so there is no need for a post hoc analysis; but if it had 4 levels, I would need one. Now, what is the right approach to a post hoc test for a repeated-measures ANOVA with an UNBALANCED design? Is there anyone who can provide an R code example solving this problem, so that I can understand it better?

I know, I should read some books to understand the subject better. I am doing my best. Thanks for any suggestion.

From: Ben Bolker To: r-h...@stat.math.ethz.ch Sent: Sat, January 8, 2011 9:39:20 PM Subject: Re: [R] Anova with repeated measures for unbalanced design

Frodo Jedi yahoo.com> writes:
>
> Dear all, I need some help because I am really not able to find on the internet a good example in R of analyzing an unbalanced table with a repeated-measures ANOVA. By an unbalanced table I mean that the questions are not all answered by the same number of subjects.
>
> For a balanced case I would use the command
>
> aov1 = aov(response ~ stimulus*condition + Error(subject/(stimulus*condition)), data=scrd)

I recommend that you find a copy of Pinheiro and Bates 2000 ...

> Does the same code still work for an unbalanced design?

No, it doesn't.

> Can anyone provide an R code example for getting the same analysis?

Something like

library(nlme)
lme1 <- lme(response~stimulus*condition,random=~1|subject,data=scrd)

or possibly

lme1 <- lme(response~stimulus*condition,random=~stimulus*condition|subject, data=scrd)

if your experimental design would allow you to estimate stimulus and condition effects for each subject. Further questions along these lines should probably go to the r-sig-mixed-models mailing list.

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
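One widely used possibility, not suggested in this thread, is glht() from the multcomp package, which accepts nlme::lme fits; a hedged sketch, assuming multcomp is installed and that the stimulus main effect is of interest (with the interaction in the model, mcp() will warn and interpret the contrasts as averaged over condition):

library(nlme)
library(multcomp)  # assumed installed; provides glht() and mcp()

scrd.lme <- lme(response ~ stimulus*condition, random = ~1|subject, data = scrd)

# Tukey-style all-pairwise comparisons between stimulus levels
summary(glht(scrd.lme, linfct = mcp(stimulus = "Tukey")))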
Re: [R] Assumptions for ANOVA: the right way to check the normality
normal enough"? For this there is no definitive test (experience and plots help).

But this all depends on another assumption that I don't think you have even considered. Yes, we can use normal theory even when the random part of the data is not normally distributed, but this still assumes that the data are at least interval data, i.e. that we firmly believe that the difference between a response of 1 and a response of 2 is exactly the same as the difference between a 6 and a 7, and that the difference from 4 to 6 is exactly twice that of 1 vs. 2. From your data and other descriptions, I don't think that is a reasonable assumption. If you are not willing to make that assumption (like me), then means and normal-theory tests are meaningless and you should use other approaches. One possibility is to use non-parametric methods (which I believe Frank has already suggested you use); another is to use proportional odds logistic regression.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Frodo Jedi
>> Sent: Wednesday, January 05, 2011 3:22 PM
>> To: Robert Baer; r-help@r-project.org
>> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
>>
>> Dear Robert, [[elided Yahoo spam]]
>> So you also think that I have to check only the residuals and not the data directly.
>> Now, just out of curiosity, I ran the Shapiro test on the residuals. The problem is that for fit3 the test does not tell me that the residuals are normally distributed. Why? Here are the results:
>>
>>> shapiro.test(residuals(fit1))
>>> Shapiro-Wilk normality test
>>> data: residuals(fit1)
>>> W = 0.9848, p-value = 0.05693
>>
>> # Here the test is fine: the residuals are normally distributed (p-value greater than 0.05)
>>
>>> shapiro.test(residuals(fit2))
>>> Shapiro-Wilk normality test
>>> data: residuals(fit2)
>>> W = 0.9853, p-value = 0.06525
>>
>> # Here the test is fine as well (p-value greater than 0.05)
>>
>>> shapiro.test(residuals(fit3))
>>> Shapiro-Wilk normality test
>>> data: residuals(fit3)
>>> W = 0.9621, p-value = 0.0001206
>>
>> Now the test gives a p-value lower than 0.05, so the residuals for fit3 are not normally distributed.
>> Why do I get this behaviour? In the histogram and Q-Q plot of the fit3 residuals the distribution looks normal.
>>
>> From: Robert Baer <rb...@atsu.edu>
>> Sent: Wed, January 5, 2011 8:56:50 PM
>> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
>>
>>> Someone suggested that I don't have to check the normality of the data, but the normality of the residuals I get after fitting the linear model.
>>> Please help me to understand this point, as I cannot find enough material online to resolve it.
>> Try the following:
>> # using your scrd data and your proposed models
>> fit1 <- lm(response ~ stimulus + condition + stimulus:condition, data=scrd)
>> fit2 <- lm(response ~ stimulus + condition, data=scrd)
>> fit3 <- lm(response ~ condition, data=scrd)
>>
>> # Set up for 6 plots on 1 panel
>> op = par(mfrow=c(2,3))
>>
>> # the residuals() function extracts residuals
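As an illustration of the proportional odds alternative Greg recommends (code not from the thread), a minimal sketch using polr() from MASS; it treats the 1-7 response as ordered categories and, for simplicity, ignores the repeated-measures structure:

library(MASS)  # provides polr()

# Proportional odds logistic regression on the ordinal response;
# Hess = TRUE keeps the Hessian so summary() can compute standard errors
pom <- polr(ordered(response) ~ stimulus*condition, data = scrd, Hess = TRUE)
summary(pom)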
Re: [R] Assumptions for ANOVA: the right way to check the normality
Many many thanks for your feedback Greg. You have been very enlightening for me. Now it is time for me to study the material you kindly provided. Thanks.

From: Greg Snow Cc: "r-help@r-project.org" Sent: Tue, January 11, 2011 10:13:34 PM Subject: RE: [R] Assumptions for ANOVA: the right way to check the normality

> Sent: Monday, January 10, 2011 5:44 PM
> To: Greg Snow
> Cc: r-help@r-project.org
> Subject: Re: [R] Assumptions for ANOVA: the right way to check the normality
>
> Dear Greg, first of all thanks for your reply. And I also add many thanks to all of you guys who are helping me; sorry for the amount of questions I recently posted ;-)
>
> I don't have a solid statistics background (I am not a statistician) and I am basically learning everything by myself. So my first goal is TO UNDERSTAND. I need general guidelines, because for my PhD I am doing, and will do, several psychophysics experiments. I am totally alone in this challenge, so I am asking you guys for some help, as I think this is the best place to acquire the thing that I am missing and that will never be found in any book: experience.

Isn't there a single statistician anywhere in the University? Does your committee have any experience with any of this?

> > What is the question you are really trying to find the answer for? Knowing that may help us give more meaningful answers.
>
> Concerning your question, I thought I had been clear. I want to understand which analysis I have to use in order to find out whether the differences I am observing are statistically significant or not. Now, as all the books I have read state that to apply ANOVA I must respect the assumption of normality, I am trying to find a way to check this.

A general run of ANOVA procedures will produce multiple p-values addressing multiple null hypotheses and many different questions (often many of which are uninteresting). Which terms are you really trying to test, and which are included because you already know that they have an effect? Are you including interactions because you find them actually interesting, or just because that is what everyone else does?

[snip]

> > Also remember that the normality of the data/residuals/etc. is not as important as the CLT for your sample size. The main things that make the CLT not work (for samples that are not large enough) are outliers and strong skewness; since your outcome is limited to the numbers 1-7, I don't see outliers or skewness being a real problem. So you are probably fine for fixed-effects style models (though checking with experts in your area or doing simulations can support/counter this).
>
> As far as I have seen, everyone in my field does ANOVA.

[imagine best Mom voice] ... and if everyone in your field jumped off a cliff . . . Do you want to do what everyone else is doing, or something new and different? What does your committee chair say about this?

> > But when you add in random effects then there is a lot of uncertainty about whether the normal theory still holds; the latest lme code uses MCMC sampling rather than depending on normal theory and is still being developed.
>
> By "random effects" you mean the repeated measures, right? So why did statisticians develop the ANOVA with repeated measures if there is so much uncertainty?

Repeated measures are one type of random-effects analysis, but random and mixed effects are more general than just repeated measures.
Statisticians developed those methods because they worked for simple cases, made some sense for more complicated cases, and they did not have anything that was both better and practical. Now, with modern computers, we can see when those methods do work (unfortunately not as often as had been hoped), and what was once impractical is now much simpler (but the inertia is to do it the old way, even though the people who developed the old way would have preferred to do it our way).

The article:

Why Permutation Tests Are Superior to t and F Tests in Biomedical Research. John Ludbrook and Hugh Dudley. The American Statistician, Vol. 52, No. 2 (May 1998), pp. 127-132.

may be enlightening here (and give possible alternatives). Also see:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001819.html

for some simulations involving mixed models. One shows that the normal theory works fine for that particular case; the next one shows a case where the normal theory does not work, then shows how to use simulation (parametric bootstrap) to get a more appropriate p-value. You can adapt those examples to your own situation.

> > This now comes back to my first question: what are you trying to find out?
>
> My ultimate goal is to find the p-values in order to understand whether my results are significant or not, so I can write them in the paper ;-)

There is a function in the TeachingDemos package that will produ
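For completeness, a minimal sketch of the permutation idea in the Ludbrook and Dudley article (not from the thread; it assumes the scrd data and, for simplicity, permutes responses freely, ignoring the repeated-measures structure; permuting within subject would be more faithful):

set.seed(1)  # make the resampling reproducible

# Observed F statistic for condition from the simple fixed-effects fit
obs <- anova(lm(response ~ condition, data = scrd))[1, "F value"]

# Null distribution: refit after randomly permuting the responses
perm <- replicate(2000, anova(lm(sample(response) ~ condition, data = scrd))[1, "F value"])

mean(perm >= obs)  # permutation p-value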