[R] - dunn.test gives strange output
Dear all, I have a question concerning my output of the dunn.test function in R. I like to compare different datasets, which are not distributed normally, so I use the Dunn.test to do pairwise comparison. I have 2 questions concerning the output: - why are my groupnames changed, the output gives 4 times swirskii, although my groupnames are longer (see my dataset) - secondly when I check the p-values, I see something very odd: I see that p values where the same groups are compared sometimes indicate that they are different, why are not all those p-values = 1 ? This is my dataset: structure(list(controle = c(111, 88, 216, 169), chemie = c(47, 31, 35, 30), IPM = c(0, 0, 0, 1), gallicus = c(102, 152, 102, 75), swirskii3 = c(1, 0, 0, 0), swirskiiA = c(0, 0, 1, 3), swirskiiP = c(0, 0, 1, 6), swirskii1x = c(12, 2, 75, 46)), .Names = c("controle", "chemie", "IPM", "gallicus", "swirskii3", "swirskiiA", "swirskiiP", "swirskii1x"), row.names = c(NA, 4L), class = "data.frame") I get the following output: DUNN <- dunn.test(value, variable, method = "none",kw=TRUE) Kruskal-Wallis rank sum test data: value and variable Kruskal-Wallis chi-squared = 26.8894, df = 7, p-value = 0 Comparison of value by variable (No adjustment) Col Mean-| Row Mean | controle chemieIPM gallicus swirskii swirskii -+-- chemie | -1.341044 -3.410083 -2.069039 -0.325682 1.015361 3.084401 | 0.0900 0.0003 0.0193 0.3723 0.1550 0.0010 | IPM | -3.410083 -2.069039 -0.325682 1.015361 3.084401 -3.410083 | 0.0003 0.0193 0.3723 0.1550 0.0010 0.0003 | gallicus | -0.325682 1.015361 3.084401 -3.410083 -2.069039 0.00 | 0.3723 0.1550 0.0010 0.0003 0.0193 0.5000 | swirskii | -3.410083 -2.069039 0.00 -3.084401 -3.007770 -1.666726 | 0.0003 0.0193 0.5000 0.0010 0.0013 0.0478 | swirskii | -3.007770 -1.666726 0.402313 -2.682088 0.402313 -2.969454 | 0.0013 0.0478 0.3437 0.0037 0.3437 0.0015 | swirskii | -2.969454 -1.628410 0.440628 -2.643772 0.440628 0.038315 | 0.0015 0.0517 0.3297 0.0041 0.3297 0.4847 | swirskii | -1.475148 -0.134104 1.934935 -1.149466 1.934935 1.532621 | 0.0701 0.4467 0.0265 0.1252 0.0265 0.0627 Col Mean-| Row Mean | swirskii -+--- chemie | -3.410083 | 0.0003 | IPM | -2.069039 | 0.0193 | gallicus | -3.084401 | 0.0010 | swirskii | 0.402313 | 0.3437 | swirskii | -1.628410 | 0.0517 | swirskii | -1.475148 | 0.0701 | swirskii | 1.494306 | 0.0675 Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] - Obtaining superscripts to affix to means that are not significantly different from each other with R
Hello all, It is often time consuming to interpret p-values of multiple pairwise comparisons of groups and assign them a letter code for publication purposes. So I found this interesting link to a program that does this for you. http://www.jerrydallal.com/lhsp/similar.htm I was wondering if something similar exists in R? Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R
Is there also a version for non parametric tests like: pairwise.wilcox.test {stats} Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, België T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be From: David L Carlson To: Joachim Audenaert , "r-help@r-project.org" Date: 23/04/2015 14:51 Subject:RE: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R The function cld() in package multcomp generates compact letter displays, but does not format them as exponents of the group names. - David L Carlson Department of Anthropology Texas A&M University College Station, TX 77840-4352 -Original Message- From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Joachim Audenaert Sent: Thursday, April 23, 2015 4:58 AM To: r-help@r-project.org Subject: [R] - Obtaining superscripts to affix to means that are not significantly different from each other with R Hello all, It is often time consuming to interpret p-values of multiple pairwise comparisons of groups and assign them a letter code for publication purposes. So I found this interesting link to a program that does this for you. http://www.jerrydallal.com/lhsp/similar.htm I was wondering if something similar exists in R? Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] : automated levene test and other tests for variable datasets
Hello all, I am writing a script for statistical comparison of means. I'm doing many field trials with plants, where we have to compare the efficacy of different treatments on, different groups of plants. Therefore I would like to automate this script so it can be used for different datasets of different experiments (which will have different dimensions). An example dataset is given here under, I would like to compare if the data of 5 columns (A,B,C,D,E) are statistically different from each other, where A, B, C, D and A are different treatments of my plants and I have 5 replications for this experiment dataset <- structure(list(A = c(62, 55, 57, 103, 59), B = c(36, 24, 61, 19, 79), C = c(33, 97, 54, 48, 166), D = c(106, 82, 116, 85, 94), E = c(32, 16, 9, 7, 46)), .Names = c("A", "B", "C", "D","E"), row.names = c(NA, 5L), class = "data.frame") 1) First I would like to do a levene test to check the equality of variances of my datasets. Currently I do this as follows: library("car") attach(dataset) y <- c(A,B,C,D,E) group <- as.factor(c(rep(1, length(A)), rep(2, length(B)),rep(3, length(C)), rep(4, length(D)),rep(5, length(E leveneTest(y, group) Is there a way to automate this for all types of datasets, so that I can use the same script for a datasets with any number of columns of data to compare? My above script only works for a dataset with 5 columns to compare 2) For my boxplots I use boxplot(dataset) which gives me all the boxplots of each dataset, so this is how I want it 3) To check normality I currently use the kolmogorov smirnov test as follows ks.test(A,pnorm) ks.test(B,pnorm) ks.test(C,pnorm) ks.test(D,pnorm) ks.test(E,pnorm) Is there a way to replace the A, B, C, ... on the five lines into one line of entry so that the kolmogorov smirnov test is done on all columns of my dataset at once? 4) if data is normally distributed and the variances are equal I want to do a t-test and do pairwise comparison, currently like this pairwise.t.test(y,group,p.adjust.method = "none") if data is not normally distributed or variances are unequal I do a pairwise comparison with the wilcoxon test pairwise.wilcox.test(y,group,p.adjust.method = "none") But again I would like to make this easier, is there a way to replace the y and group in my datalineby something so it works for any size of dataset? 5) Once I have my paiwise comparison results I know which groups are statistically different from others, so I can add a and b and c to different groups in my graph. Currently I do this on a sheet of paper by comparing them one by one. Is there also a way to automate this? So R gives me for example something like this A: a B: a C: b D: ab E: c All help and commentys are welcome. I'm quite new to R and not a statistical genious, so if I'm overseeing things or thinking in a wrong way please let me know how I can improve my way of working. In short I would like to build a script that can compare the means of different groups of data and check if they are statistically diiferent Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] : automated levene test and other tests for variable datasets
Thank you very much for the reply Thierry, It was very useful for me, currently I updated my script as follows, to be able to use the same script for different datasets: adapting my dataset : y <- melt(dataset, na.rm=TRUE) where "na.rm = true" ommits missing data points variable <- y[,1] value <- y[,2] and then for the tests leveneTest(value~variable,y) apply(dataset,MARGIN=2,FUN=function(x) ks.test(x,pnorm)$p.value) pairwise.t.test(value,variable,p.adjust.method = "none") pairwise.wilcox.test(value,variable,p.adjust.method = "none") Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be From: Thierry Onkelinx To: Joachim Audenaert Cc: "r-help@r-project.org" Date: 15/04/2015 13:31 Subject:Re: [R] : automated levene test and other tests for variable datasets Dear Joachim, Storing your data in a long format will make this a lot easier. library(reshape2) long.data <- melt(dataset, measure.var = c("A", "B", "C", "D", "E")) library(car) leveneTest(value ~ variable, data = long.data) library(plyr) ddply(long.data, "variable", function(x){ks.test(x$value}) Best regards, ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-04-14 10:07 GMT+02:00 Joachim Audenaert < joachim.audena...@pcsierteelt.be>: Hello all, I am writing a script for statistical comparison of means. I'm doing many field trials with plants, where we have to compare the efficacy of different treatments on, different groups of plants. Therefore I would like to automate this script so it can be used for different datasets of different experiments (which will have different dimensions). An example dataset is given here under, I would like to compare if the data of 5 columns (A,B,C,D,E) are statistically different from each other, where A, B, C, D and A are different treatments of my plants and I have 5 replications for this experiment dataset <- structure(list(A = c(62, 55, 57, 103, 59), B = c(36, 24, 61, 19, 79), C = c(33, 97, 54, 48, 166), D = c(106, 82, 116, 85, 94), E = c(32, 16, 9, 7, 46)), .Names = c("A", "B", "C", "D","E"), row.names = c(NA, 5L), class = "data.frame") 1) First I would like to do a levene test to check the equality of variances of my datasets. Currently I do this as follows: library("car") attach(dataset) y <- c(A,B,C,D,E) group <- as.factor(c(rep(1, length(A)), rep(2, length(B)),rep(3, length(C)), rep(4, length(D)),rep(5, length(E leveneTest(y, group) Is there a way to automate this for all types of datasets, so that I can use the same script for a datasets with any number of columns of data to compare? My above script only works for a dataset with 5 columns to compare 2) For my boxplots I use boxplot(dataset) which gives me all the boxplots of each dataset, so this is how I want it 3) To check normality I currently use the kolmogorov smirnov test as follows ks.test(A,pnorm) ks.test(B,pnorm) ks.test(C,pnorm) ks.test(D,pnorm) ks.test(E,pnorm) Is there a way to replace the A, B, C, ... on the five lines into one line of entry so that the kolmogorov smirnov test is done on all columns of my dataset at once? 4) if data is normally distributed and the variances are equal I want to do a t-test and do pairwise comparison, currently like this pairwise.t.test(y,group,p.adjust.method = "none") if data is not normally distributed or variances are unequal I do a pairwise comparison with the wilcoxon test pairwise.wilcox.test(y,group,p.adjust.method = "none") But again I would like to make this easier, is there a way to replace the y and group in my datalineby something so it works for any size of dataset? 5) Once I have my paiwise comparison results I know which groups are statistically different from others, so I can add a and b and c to different groups in my graph. Currently I do this on a sheet of paper by comparing them one by one. Is there also a way to automate this? So
Re: [R] : automated levene test and other tests for variable datasets
Hello Michael, thank you for the reply, it realy helped me to simplify my script. Basically all my questions are a bit the same, but with your hint I could solve most of my problems. Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, België T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be From: Michael Dewey To: Joachim Audenaert , r-help@r-project.org Date: 14/04/2015 18:17 Subject:Re: [R] : automated levene test and other tests for variable datasets You ask quite a lot of questions, I have given some hints about your first example inline On 14/04/2015 09:07, Joachim Audenaert wrote: > Hello all, > > I am writing a script for statistical comparison of means. I'm doing many > field trials with plants, where we have to compare the efficacy of > different treatments on, different groups of plants. Therefore I would > like to automate this script so it can be used for different datasets of > different experiments (which will have different dimensions). An example > dataset is given here under, I would like to compare if the data of 5 > columns (A,B,C,D,E) are statistically different from each other, where A, > B, C, D and A are different treatments of my plants and I have 5 > replications for this experiment > > dataset <- structure(list(A = c(62, 55, 57, 103, 59), B = c(36, 24, 61, > 19, 79), C = c(33, 97, 54, 48, 166), D = c(106, 82, 116, 85, 94), E = > c(32, 16, 9, 7, 46)), .Names = c("A", "B", "C", "D","E"), row.names = > c(NA, 5L), class = "data.frame") > > 1) First I would like to do a levene test to check the equality of > variances of my datasets. Currently I do this as follows: > > library("car") > attach(dataset) Usually best to avoid this and use the data=parameter or with or within > y <- c(A,B,C,D,E) you could use unlist( ) here > group <- as.factor(c(rep(1, length(A)), rep(2, length(B)),rep(3, > length(C)), rep(4, length(D)),rep(5, length(E you can get the lengths which you need with lengtha <- lapply(dataset, length) or lengths <- sapply(dataset, length) depending then rep(letters[1:length(lengths)], lengths) should get you the group variable you want. I have just typed all those in so there may be typos but at least you know where to look. I am not suggesting that I think automating all statistical analyses is necessarily a good idea either. > leveneTest(y, group) > > Is there a way to automate this for all types of datasets, so that I can > use the same script for a datasets with any number of columns of data to > compare? My above script only works for a dataset with 5 columns to > compare > > 2) For my boxplots I use > > boxplot(dataset) > > which gives me all the boxplots of each dataset, so this is how I want it > > 3) To check normality I currently use the kolmogorov smirnov test as > follows > > ks.test(A,pnorm) > ks.test(B,pnorm) > ks.test(C,pnorm) > ks.test(D,pnorm) > ks.test(E,pnorm) > > Is there a way to replace the A, B, C, ... on the five lines into one line > of entry so that the kolmogorov smirnov test is done on all columns of my > dataset at once? > > 4) if data is normally distributed and the variances are equal I want to > do a t-test and do pairwise comparison, currently like this > > pairwise.t.test(y,group,p.adjust.method = "none") > > if data is not normally distributed or variances are unequal I do a > pairwise comparison with the wilcoxon test > > pairwise.wilcox.test(y,group,p.adjust.method = "none") > > But again I would like to make this easier, is there a way to replace the > y and group in my datalineby something so it works for any size of > dataset? > > 5) Once I have my paiwise comparison results I know which groups are > statistically different from others, so I can add a and b and c to > different groups in my graph. Currently I do this on a sheet of paper by > comparing them one by one. Is there also a way to automate this? So R > gives me for example something like this > > A: a > B: a > C: b > D: ab > E: c > > All help and commentys are welcome. I'm quite new to R and not a > statistical genious, so if I'm overseeing things or thinking in a wrong > way please let me know how I can improve my way of working. In short I > would like to build a script that can compare the means of different > groups of data and check if they are statistically diiferent > > Met vriendelijke groeten - With kind regards, >
[R] melt function chooses wrong id variable with large datasets
Hello all, I'm using a large dataset consisting of 2 groups of data, 2 columns in excel with a header (group name) and 15 000 rows of data. I would like like to compare this data, so I transform my dataset with the melt function to get 1 column of data and 1 column of ID variables, then I can apply different statistical tests. With small datasets this works great, the melt function automatically chooses the name in row 1 as ID variable and melts the data, thus giving me a matrix with all ID variables in column one and the data accordingly in column 2. With this big dataset however it chooses the whole first column as ID variables in stead of the first row. Is there a reason why this happens and how can I make sure the first row is chosen as ID variabele and the lower rows as data? If I specify that I want the first row to be the id variable I also get error. melt(dataset,id.vars=dataset[1,], na.rm=TRUE) Error: id variables not found in data: norm, jaar Are there alternative ways to create a good reshaped dataset? Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be Heb je je individuele begeleiding bemesting (CVBB) al aangevraagd? | Het PCS op LinkedIn Disclaimer | Please consider the environment before printing. Think green, keep it on the screen! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] melt function chooses wrong id variable with large datasets
Hello, This is a part of my dataset: structure(list(januari = c(38.1, 32.4, 34.5, 20.7, 21.5, 23.1, 29.7, 36.6, 36.1, 20.6, 20.4, 30.1, 38.7, 41.4, 37, 36, 37, 38, 23, 26.7), februari = c(31.5, 36.2, 38.2, 26.4, 20.9, 21.5, 30.2, 33.4, 32.6, 22.2, 21.7, 30, 35.7, 32.8, 39.3, 25.5, 23, 19.9, 21.3, 20.8), maart = c(34.2, 27, 24.2, 19.9, 19.7, 21.5, 30.6, 30, 19, 19.6, 20.6, 23.6, 17.9, 17.3, 21.4, 24.1, 20.9, 30.1, 32.6, 21.3), april = c(26.3, 29.6, 30.3, 23.6, 28.4, 20.7, 24.1, 27.3, 23.2, 18.3, 24.6, 27.4, 20.4, 18.1, 25.2, 19.8, 21, 23.7, 19.6, 18.1), mei = c(23.7, 24, 17.2, 23.2, 25.2, 17.2, 16, 15.6, 13.4, 16, 16.8, 14.6, 19.4, 21, 19.5, 18.5, 13.3, 13.7, 14.3, 14.1), juni = c(17.7, 14.2, 16.6, 15.7, 13.7, 14.7, 13.1, 12.9, 15.4, 11.9, 15.2, 15.3, 16.5, 16.1, 11.7, 11.2, 11.5, 10.8, 16.1, 14.8), juli = c(15.7, 14.5, 10.8, 10.5, 13.4, 12.2, 13.2, 13, 12.4, 13.1, 9.8, 10.5, 13.4, 11, 13.1, 15, 16.7, 16.1, 18.2, 15.7), augustus = c(12.9, 12.8, 15.2, 14.5, 17.2, 14.5, 14.4, 11, 13.1, 13.6, 14.6, 12.7, 13.6, 12.7, 15.5, 17.4, 15.2, 14.2, 17.7, 19.2), september = c(15.6, 15.5, 15.9, 15.1, 16, 19.4, 21.5, 23.7, 18.7, 23.8, 18, 16.2, 18.5, 20.6, 18.3, 22.5, 26.9, 19.4, 15.9, 20.5), oktober = c(21.4, 20.8, 14, 17, 23, 26.4, 19.6, 22.7, 26.9, 14.7, 15.2, 19.8, 26.9, 20.2, 14.3, 14.8, 18.5, 21.7, 21.4, 21.8), november = c(24.7, 26.2, 29, 21.6, 17.1, 16.9, 19.1, 24.7, 25.4, 19.8, 18.2, 16.3, 17, 17.7, 15.5, 14.7, 15.8, 19.9, 20.4, 23.3), december = c(19.8, 27, 21, 33, 22.6, 28.3, 21.1, 19, 17.3, 27, 30.2, 24.8, 17.9, 17.9, 20.7, 30.9, 36.2, 21, 20.2, 21.3), norm = c("45.8713463281901", "24.047250681782984", "3.7533684144746324", "38.594241119279324", "26.391897460120358", "61.746470001194638", "6.8321020448487992", "11.933109250115226", "51.951891096493924", "37.424611852237945", "5.1587836676942374", "36.552835044409434", "31.781209673851027", "29.09146215582853", "4.856812959269508", "5.3982910143166514", "46.553976273304215", "17.566272518985429", "20.552451905814117", "61.894775704479279" )), .Names = c("januari", "februari", "maart", "april", "mei", "juni", "juli", "augustus", "september", "oktober", "november", "december", "norm"), row.names = c(NA, 20L), class = "data.frame") I transform my dataset with the following script: y <- melt(dataset,na.rm=TRUE) variable <- y[,1] value <- y[,2] and can then perform a levene test as follows: LEVENE <- leveneTest(value~variable,y) When the dataset is small, lets say less than 100 values per column everything works great. I get the message: No id variables; using all as measure variables When the dataset is much bigger I get the following message Using norm as id variables, why does this function pick norm as id variable? and how can I tell R that each column title is my variable Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be From: PIKAL Petr To: Joachim Audenaert , "r-help@r-project.org" Date: 16/04/2015 12:13 Subject:RE: [R] melt function chooses wrong id variable with large datasets Hi There is something weird with your data and melt function. AFAIK melt does not use first row as id.variables. What is result of str(dataset) Instead of melt(dataset,id.vars=dataset[1,], na.rm=TRUE) melt expects something like melt(dataset, id.vars=c("norm, "jaar"), na.rm=TRUE) If you want more specific answer you shall show us part of your data, preferably copy output of dput(dataset[1:20,]) into your mail. Cheers Petr > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Joachim > Audenaert > Sent: Thursday, April 16, 2015 11:37 AM > To: r-help@r-project.org > Subject: [R] melt function chooses wrong id variable with large > datasets > > Hello all, > > I'm using a large dataset consisting of 2 groups of data, 2 columns in > excel with a header (group name) and 15 000 rows of data. I would like > like to compare this data, so I transform my dataset with the melt > function to get 1 column of data and 1 column of ID variables, then I > can apply different statistical tests. With small datasets this works > great, the melt function automatically chooses the name in row 1 as ID > variable and melts the data, thus giving me a matrix with all ID &
Re: [R] melt function chooses wrong id variable with large datasets
Thanks, indeed norm should be in the same group as as the months. everything works fine when the number of data is quite small, but with big datasets (15 000 values) things seem to go wrong and I can't explain why. It puts norm as an individual column in stead of in the group of months as it does when the dataset is small. Met vriendelijke groeten - With kind regards, Joachim Audenaert onderzoeker gewasbescherming - crop protection researcher PCS | proefcentrum voor sierteelt - ornamental plant research Schaessestraat 18, 9070 Destelbergen, Belgi� T: +32 (0)9 353 94 71 | F: +32 (0)9 353 94 95 E: joachim.audena...@pcsierteelt.be | W: www.pcsierteelt.be From: PIKAL Petr To: Joachim Audenaert Cc: "r-help@r-project.org" Date: 16/04/2015 13:41 Subject:RE: [R] melt function chooses wrong id variable with large datasets Hi With this dataset I get > dd.m0<-melt(dataset, na.rm=T) Using norm as id variables > head(dd.m0) norm variable value 1 45.8713463281901 januari 38.1 2 24.047250681782984 januari 32.4 3 3.7533684144746324 januari 34.5 4 38.594241119279324 januari 20.7 5 26.391897460120358 januari 21.5 6 61.746470001194638 januari 23.1 > or dd.m<-melt(dataset, id.vars=NULL, na.rm=T) > head(dd.m) variable value 1 januari 38.1 2 januari 32.4 3 januari 34.5 4 januari 20.7 5 januari 21.5 6 januari 23.1 > tail(dd.m) variable value 255 norm 4.856812959269508 256 norm 5.3982910143166514 257 norm 46.553976273304215 258 norm 17.566272518985429 259 norm 20.552451905814117 260 norm 61.894775704479279 The latter will put norm to the same column as months. Is it intended? Maybe you want > dd.m1<-melt(dataset[,-13], na.rm=T) No id variables; using all as measure variables > head(dd.m1) variable value 1 januari 38.1 2 januari 32.4 3 januari 34.5 4 januari 20.7 5 januari 21.5 6 januari 23.1 > tail(dd.m1) variable value 235 december 20.7 236 december 30.9 237 december 36.2 238 december 21.0 239 december 20.2 240 december 21.3 Cheers Petr From: Joachim Audenaert [mailto:joachim.audena...@pcsierteelt.be] Sent: Thursday, April 16, 2015 1:13 PM To: PIKAL Petr Cc: r-help@r-project.org Subject: RE: [R] melt function chooses wrong id variable with large datasets Hello, This is a part of my dataset: structure(list(januari = c(38.1, 32.4, 34.5, 20.7, 21.5, 23.1, 29.7, 36.6, 36.1, 20.6, 20.4, 30.1, 38.7, 41.4, 37, 36, 37, 38, 23, 26.7), februari = c(31.5, 36.2, 38.2, 26.4, 20.9, 21.5, 30.2, 33.4, 32.6, 22.2, 21.7, 30, 35.7, 32.8, 39.3, 25.5, 23, 19.9, 21.3, 20.8), maart = c(34.2, 27, 24.2, 19.9, 19.7, 21.5, 30.6, 30, 19, 19.6, 20.6, 23.6, 17.9, 17.3, 21.4, 24.1, 20.9, 30.1, 32.6, 21.3), april = c(26.3, 29.6, 30.3, 23.6, 28.4, 20.7, 24.1, 27.3, 23.2, 18.3, 24.6, 27.4, 20.4, 18.1, 25.2, 19.8, 21, 23.7, 19.6, 18.1), mei = c(23.7, 24, 17.2, 23.2, 25.2, 17.2, 16, 15.6, 13.4, 16, 16.8, 14.6, 19.4, 21, 19.5, 18.5, 13.3, 13.7, 14.3, 14.1), juni = c(17.7, 14.2, 16.6, 15.7, 13.7, 14.7, 13.1, 12.9, 15.4, 11.9, 15.2, 15.3, 16.5, 16.1, 11.7, 11.2, 11.5, 10.8, 16.1, 14.8), juli = c(15.7, 14.5, 10.8, 10.5, 13.4, 12.2, 13.2, 13, 12.4, 13.1, 9.8, 10.5, 13.4, 11, 13.1, 15, 16.7, 16.1, 18.2, 15.7), augustus = c(12.9, 12.8, 15.2, 14.5, 17.2, 14.5, 14.4, 11, 13.1, 13.6, 14.6, 12.7, 13.6, 12.7, 15.5, 17.4, 15.2, 14.2, 17.7, 19.2), september = c(15.6, 15.5, 15.9, 15.1, 16, 19.4, 21.5, 23.7, 18.7, 23.8, 18, 16.2, 18.5, 20.6, 18.3, 22.5, 26.9, 19.4, 15.9, 20.5), oktober = c(21.4, 20.8, 14, 17, 23, 26.4, 19.6, 22.7, 26.9, 14.7, 15.2, 19.8, 26.9, 20.2, 14.3, 14.8, 18.5, 21.7, 21.4, 21.8), november = c(24.7, 26.2, 29, 21.6, 17.1, 16.9, 19.1, 24.7, 25.4, 19.8, 18.2, 16.3, 17, 17.7, 15.5, 14.7, 15.8, 19.9, 20.4, 23.3), december = c(19.8, 27, 21, 33, 22.6, 28.3, 21.1, 19, 17.3, 27, 30.2, 24.8, 17.9, 17.9, 20.7, 30.9, 36.2, 21, 20.2, 21.3), norm = c("45.8713463281901", "24.047250681782984", "3.7533684144746324", "38.594241119279324", "26.391897460120358", "61.746470001194638", "6.8321020448487992", "11.933109250115226", "51.951891096493924", "37.424611852237945", "5.1587836676942374", "36.552835044409434", "31.781209673851027", "29.09146215582853", "4.856812959269508", "5.3982910143166514", "46.553976273304215", "17.566272518985429", "20.552451905814117", "61.894775704479279" )), .Names = c("januari", "februari", "maart", "april", "mei", "juni", "juli", "augustus", "september", "oktober", "november", "december", "norm"), row.names = c(NA, 20L), class = "data.frame") I transfo
[R] - help with the predict function
Hi all, I would like to predict some values for an nls regression function (functional response model Rogers type II). This is an asymptotic function of which I would like to predict the asymptotic value I estimated the paramters with nls, but can't seem to get predictions for values of m choice.. This is my script: RogersII_N <- nls(FR~N0-lambertW(attackR3_N*Th3_N*N0*exp(-attackR3_N*(24-Th3_N*N0)))/(attackR3_N*Th3_N),start=list(attackR3_N=0.04,Th3_N=1.46),control=list(maxiter=1)) lista <- c(1,2,100,1000) predict(RogersII_N,newdata=lista) I created a list (lista) with some values of which I would like the predict function to give me function values What am I doing wrong? Kind regards, Met vriendelijke groeten, Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Belgium Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] - detecting outliers
Hello all, I am estimating parameters for regression functions on experimental data. Functional response of Rogers type II. I would like to know which points of my dataset are outliers. What is the best method to do this with R? I found a method via R help, but would like to know if there are better methods for my purpose. Here is the script I us now: library("mvoutlier") dat <- read.delim("C:/data.txt") uni.plot(dat) My data looks like the following (copied into a txt file): (N0 is the initial number of eggs fed to the predator, FR is the number of eggs eaten by the predator during 24h) N0 FR 37 30 27 15 36 14 37 13 45 8 25 0 47 20 34 6 25 8 21 7 24 24 34 17 23 10 29 5 38 38 24 24 20 17 14 8 18 15 15 10 26 5 33 5 22 21 38 3 22 20 23 19 20 6 20 4 21 18 25 5 13 13 9 8 8 4 7 7 8 5 11 9 Kind regards, Met vriendelijke groeten, Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Belgium Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] gnm and gnlr3
Hi, I am quite new to R and would like to do nonlinear regressions with Poisson distributed data. I would like to estimate paramters of an equation of this type: FR = [c*NO * exp(a+b*NO)] / [(c+NO)*(1+exp(a+b*NO))] a,b and c are parameters, NO are input values I found both the gnm and gnlr3 function which should be able to do this regression but I can't manage to make it work. How can I write my equation to fit into these functions? If I understand it correct I have to split my equation in a "mu", a "shape" and a "family" part for the gnlr function, but don't have a clue how to do this. Is there maybe a different function that is better for my purpose? A version of nls where I can choose a different distribution? Kind Regards Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] regression for poisson distributed data
Hello all, I would like to get parameter estimates for different models. For one of them I give the code in example. I am estimating the parameters (i,j and k) with the nls function, which sees the error distribution as normal, I would however like to do the same as nls with the assumption that the errors are poisson distributed. Is there a way to do this with R? Are there packages designed for this? I tried with the gnm package, but don't understand how to transform my equation to a generalised equation. Is there an option for nls to choose family = poisson? Lower in the mail the code with the model and visualisations I use to check my results. I also copied the test dataset from my txt file. I am using R 2.15 and Rstudio to visualise it. plot(FR~N0) x <- nls(FR~(exp(i+j*N0)/(1+exp(i+j*N0)))*(k*N0/(k+N0)),start=list(i=0.02,j=0.002,k=6)) summary(x) hatx <- predict(x) lines(spline(N0,hatx)) N0 FR 10 2 10 3 10 2 10 4 10 2 10 2 10 1 10 2 10 2 10 2 20 2 20 3 20 3 20 3 20 4 20 2 20 4 20 2 20 3 20 2 30 1 30 2 30 3 30 4 30 5 30 6 30 2 30 3 30 2 30 2 40 2 40 3 40 3 40 6 40 5 40 4 40 3 40 3 40 2 40 3 50 4 50 5 50 2 50 3 50 7 50 5 50 4 50 3 50 4 50 5 60 5 60 6 60 8 60 4 60 4 60 3 60 2 60 2 60 5 60 4 Kind Regards Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] automatically scan multiple starting values
Hi all, I am doing nls regression of the Rogers type III equation. However I can't find good starting values and keep getting the error messages: "Missing value or an infinity produced when evaluating the model" OR "singular gradient matrix at initial parameter estimates" Is there a way to automatically check different starting values and give a list of values for each parameter? In my model only Th3 is a parameter with a good estimate (2.5). for v,w and x I would like to check a broad range of values. Is it possible to do this in R? Under my script for plots and nls regression and a copy of the values of my data file. plot(FR~N0,main="Rogers III") RogersIII <- nls(FR~N0*(1-exp(((v+w*FR)/(1+x*FR))*(Th3*FR-24))),start=list(v=0.002,w=0.00075,x=-0.1,Th3=2.5),control=list(maxiter=100,minFactor=0.1)) # 24 is tijd in uren hatac <- predict(RogersIII) lines(spline(N0,hatac)) summary(RogersIII) N0 FR 5 4 3 2 5 3 5 4 6 4 5 4 4 2 5 4 6 5 5 3 10 5 14 7 12 6 17 9 10 4 16 8 15 8 14 5 15 6 15 5 19 10 20 11 22 10 21 12 25 12 26 11 25 10 25 12 25 11 24 9 35 12 33 12 35 13 36 12 31 12 37 12 39 13 36 14 35 12 35 10 46 11 44 10 41 12 43 14 45 12 48 11 45 13 45 9 49 12 45 11 56 12 59 15 52 11 54 10 51 12 55 12 61 13 55 12 54 10 55 12 Kind regards, Met vriendelijke groeten, Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Belgium Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] error estimating parameters with mle2
Hi all, When I try to estimate the functional response of the Rogers type I equation (for the mle2 you need the package bbmle): > RogersIbinom <- function(N0,attackR2_B,u_B) {attackR2_B+u_B*N0} > RogersI_B <- mle2(FR~dbinom(size=N0,prob=RogersIbinom(N0,attackR2_B,u_B)/N0),start=list(attackR2_B=4.5,u_B=0.16),method="Nelder-Mead",data=data5) I get following error message Error in optim(par = c(4.5, 0.16), fn = function (p) : function cannot be evaluated at initial parameters Can someone tell me what I'm doing wrong? I used estimate starting values which were predicted with the nls function RogersI_N <- nls(FR~attackR2_N+u_N*N0,start=list(attackR2_N=1,u_N=4),control=list(maxiter=1)) For some other models, I do the exact same thing and it works well, so I don't understand why this one doesn't work Kind regards, Met vriendelijke groeten, Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Belgium Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] non-numeric argument in mle2
Hi all, I have some problems with the mle2 function > RogersIIbinom <- function(N0,attackR3_B,Th3_B) {N0-lambertW(attackR3_B*Th3_B*N0*exp(-attackR3_B*(24-Th3_B*N0)))/(attackR3_B*Th3_B)} > RogersII_B <- mle2(FR~dbinom(size=N0,prob=RogersIIbinom(N0,attackR3_B,Th3_B)/N0),start=list(attackR3_B=1.5,Th3_B=0.04),method="Nelder-Mead",data=dat) Error in dbinom(x, size, prob, log) : Non-numeric argument to mathematical function Can somenone explain met what this error means? All my parameters and data are defined, so I don't understand what the non-numeric argument is??? With different equations the script does work... For this function to run you need packages: bbmle and emdbook. Testdata I pasted here N0 FR 5 4 3 2 5 3 5 4 6 4 5 4 4 2 5 4 6 5 5 3 10 5 14 7 12 6 17 9 10 4 16 8 15 8 14 5 15 6 15 5 19 10 20 11 22 10 21 12 25 12 26 11 25 10 25 12 25 11 24 9 35 12 33 12 35 13 36 12 31 12 37 12 39 13 36 14 35 12 35 10 46 11 44 10 41 12 43 14 45 12 48 11 45 13 45 9 49 12 45 11 56 12 59 15 52 11 54 10 51 12 55 12 61 13 55 12 54 10 55 12 Kind regards, Met vriendelijke groeten, Joachim Don't waste paper! Think about the environment before printing this e-mail ______ Joachim Audenaert Adviesdienst Gewasbescherming Proefcentrum voor Sierteelt Schaessestraat 18 B-9070 Destelbergen Belgium Tel. +32 9 353 94 71 Fax +32 9 353 94 95 E-mail: joachim.audena...@pcsierteelt.be www.pcsierteelt.be __ [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.