Re: [R] plotting multiple figures on one page
On 03/17/2011 07:46 AM, scarlet wrote:
> I am new to the R language. I am trying to plot multiple figures on one page
> through a loop, but the code just produces one graph per page. Can someone
> shed some light on what's wrong? Here is my code:
>
> library("quantreg")
> tcdata <- read.table("mydata.txt", header=TRUE)
> postscript("myfigure.ps")
> basins <- paste(c("b1","b2","b3","b4","b5","b6"), sep="")
> par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))
> for (k in 1:6) {
>   data0 = subset(tcdata, basin==basins[k])
>   attach(data0)
>   model = rq(rain~year, tau=seq(0.1,0.9,0.1))
>   plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19, cex=1.2,
>        mar=c(5,5,5,5), ylab=" ", xlab=" ")
> }
> dev.off()

Hi scarlet,
First, you don't really have to paste the values for "basin" together; the "c" is all you need. Trying this code:

tcdata <- data.frame(rain=sample(400:1500,360),
                     year=rep(1927:1986,each=6), basin=rep(1:6,60))
par(mfrow = c(2,3), mar=c(0,4,0,2), oma=c(5,0,3,0))
for (k in unique(tcdata$basin)) {
  data0 = subset(tcdata, basin==k)
  model = rq(rain~year, data=data0, tau=seq(0.1,0.9,0.1))
  plot(summary(model, alpha=0.05, se="iid"), parm=2, pch=19,
       cex=1.2, mar=c(5,5,5,5), ylab=" ", xlab=" ")
}

does exactly what you say, that is, it ignores the division of the figure area into six parts. As ordinary plots work okay, I assume that there is something in the plot method for an rq model that overrides the call to par(mfrow=...).

Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
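A possible workaround (a sketch only, not from Jim's reply, and not tested against quantreg's plot method) is to skip the summary plot method altogether and draw the coefficient-versus-tau curve by hand, so that par(mfrow=...) is respected:

taus <- seq(0.1, 0.9, 0.1)
par(mfrow = c(2, 3), mar = c(0, 4, 0, 2), oma = c(5, 0, 3, 0))
for (k in unique(tcdata$basin)) {
  data0 <- subset(tcdata, basin == k)
  model <- rq(rain ~ year, data = data0, tau = taus)
  # coef() on a multi-tau rq fit returns a matrix with one column per tau
  plot(taus, coef(model)["year", ], type = "b", pch = 19, xlab = "", ylab = "")
}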
Re: [R] Regex query (Apache logs)
Some comments: 1. [^\s] matches everything up to a literal 's', unless perl=TRUE. 2. The (.*) is greedy, so you'll need (.*?)"\s"(.*?)"\s"(.*?)"$ or similar at the end of the expression With those changes (and removing a space inserted by the newsgroup posting) the expression works for me. > (pat <- readLines("/tmp/b.txt")[1]) [1] "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$" > regexpr(pat, test, perl=TRUE) [1] 1 attr(,"match.length") [1] 436 3. Consider a different approach, e.g. scan(textConnection(test), what=character(0)) Hope this helps Allan On 16/03/11 22:18, Saptarshi Guha wrote: Hello R users, I have this regex see [1] for apache log lines. I tried using R to parse some data (only because I wanted to stay in R). A sample line is [2] (a) I saved the line in [1] into "~/tmp/a.txt" and [2] into "/tmp/a.txt" pat<- readLines("~/tmp/a.txt") test<- readLines("/tmp/a.txt") test grep(pat,test) returns integer(0) The same query works in python via re.match() (i.e does return groups) Using readLines, the regex is escaped for me. Does Python and R use different regex styles? Cheers Saptarshi [1] ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$ [2] 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] "GET /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux% 202.6.32.9/default/default/6/6/1/ HTTP/1.1" 200 3243 "-" "Mozilla/5.0 (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre Fennec/4.0b5" "BLOCKLIST_v3=110.163.217.169.1299218425.9706" [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
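A minimal illustration of point 1, using a made-up string rather than the full log line (behaviour as described above; treat it as a sketch):

x <- "foo bar"
regexpr("[^\\s]+", x)               # default engine: the class is the literal characters \ and s,
                                    #   so the match runs across the space (length 7)
regexpr("[^\\s]+", x, perl = TRUE)  # PCRE: \s means whitespace, so the match stops at "foo" (length 3)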
[R] Flexible rbind
Dear All, I am trying to create a empty structure that I want to fill gradually through the code. I want to use something like rbind to create the basic structure first. I am looking for a possibility to do an rbind where the columns names dont match fully (but the missing columns can be defaulted to either zero or n/a) (my actual data has a lot of columns). Please see the data frames below I have the following data frame that is filled > dput(d) structure(list(type = structure(1:9, .Label = c("test1", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9" ), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421, 94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819, 20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152, 83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213, 97.83710088)), .Names = c("type", "meter", "degree"), class = "data.frame", row.names = c(NA, -9L)) To the above data I want to append the following data to create an empty structure for the new data > dput(dNewTests) structure(list(type = structure(1:9, .Label = c("test14", "test15", "test16", "test17", "test18", "test19", "test20", "test21", "test22" ), class = "factor")), .Names = "type", class = "data.frame", row.names = c(NA, -9L)) rbind obviously throws the following error: rbind(d, dNewTests) Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match Any way that I can default the missing columns to NA or some default value like zero? Also, I tried to go through the archives and tried merge_all. What am I doing wrong wih the code below ? require(reshape) merge_all(d, dNewTests,by="type") Error in fix.by(by.x, x) : 'by' must specify column(s) as numbers, names or logical Thanks, Santosh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
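Two common answers to this (sketches, assuming the two data frames d and dNewTests above): pad the smaller frame with NA columns by hand before rbind(), or use plyr's rbind.fill(), which does the padding for you.

# manual padding with NA
missing <- setdiff(names(d), names(dNewTests))
dNewTests[missing] <- NA
rbind(d, dNewTests)

# or, in one step
library(plyr)
rbind.fill(d, dNewTests)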
Re: [R] table() reading problem
On 03/16/2011 08:20 PM, fre wrote: I have the following problem: I have some string with numbers like k. I want to have a table like the function table() gives. However I am not able to call the first row, the 1, 2, 3, 5, 6 or 9. I tried to do that by a data.frame, but that doesn't seem to work either. The levels keep bothering me. This is an example of the code: k<-c(1,1,1,1,1,3,5,6,2,1,3,9,2,3,1,1,1) table(k) k 1 2 3 5 6 9 9 2 3 1 1 1 x<-table(k) dim(x) [1] 6 x[1] #But i only want the one 1 9 x<-data.frame(x) x[1,1] #You are not allowed to use this one for example 3*x[1,1] is impossible [1] 1 Levels: 1 2 3 5 6 9 I hope anyone has an idea of using the table function without this inconvenience. I thought about writing a counter myself, but that seems complicated. Because I have to examine very large examples later on, I don't want to slow the calculations down if possible. Thanks for your help in advance. Hi Frederique, It seems to me that you want the names of the table rather than the counts, as Sarah has already suggested. So: as.numeric(names(x)) should give you what you want. However, if all you want to do is determine the unique elements of k, then: unique(k) might be what you are seeking. Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
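Spelling that out with the example vector from the question:

k <- c(1,1,1,1,1,3,5,6,2,1,3,9,2,3,1,1,1)
x <- table(k)
as.numeric(names(x))          # the distinct values:  1 2 3 5 6 9
as.numeric(x)                 # the counts:           9 2 3 1 1 1
3 * as.numeric(names(x))[1]   # arithmetic works here, unlike on the factor column of data.frame(x)
unique(k)                     # just the distinct values, in order of first appearance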
Re: [R] Subset using grepl
Ok thank you! On Mar 17, 12:12 pm, wrote: > subset(data,grepl("[1-5]", section) & !grepl("0", section)) > > BTW > > grepl("[1:5]", section) > > does work. It checks for the characters 1, :, or 5. > > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On > Behalf Of Kang Min > Sent: Thursday, 17 March 2011 1:29 PM > To: r-h...@r-project.org > Subject: Re: [R]Subsetusinggrepl > > I have a new question, also regardinggrepl. > I would like tosubsetrows with numbers from 1 to 5 in the section > column, so I used > > subset(data,grepl("[1:5]", section)) > > but this gave me rows with 10, 1 and 5. (Why is this so?) So I tried > > subset(data,grepl("[1,2,3,4,5]", section)) > > which worked. But I also got 10 in the dataframe as well. How can I > exclude 10? > > >data > section piece LTc1 LTc2 > 10a 10-1 0.729095368 NA > 10a 10-2 59.53292189 13.95612454 > 10h 10-3 0.213756661 NA > 10i 10-4 NA NA > 10b NA NA NA > 10c NA NA NA > 10d NA NA NA > 10e NA NA NA > 10f NA NA NA > 10g NA NA NA > 10h NA NA NA > 10j NA NA NA > 1b 1-1 NA NA > 1d 1-2 29.37971303 12.79688209 > 1g 1-6 NA 7.607911603 > 1h 1-3 0.298059164 27.09896941 > 1i 1-4 25.11261782 19.87149991 > 1j 1-5 36.66969601 42.28507923 > 1a NA NA NA > 1c NA NA NA > 1e NA NA NA > 1f NA NA NA > 2a 2-1 15.98582117 10.58696146 > 2a 2-2 0.557308341 41.52650718 > 2c 2-3 14.99499024 10.0896793 > 2e 2-4 148.4530636 56.45493191 > 2f 2-5 25.27493551 12.98808577 > 2i 2-6 20.32857108 22.76075728 > 2b NA NA NA > 2d NA NA NA > 2g NA NA NA > 2h NA NA NA > 2j NA NA NA > 3a 3-1 13.36602867 11.47541439 > 3a 3-7 NA 111.9007822 > 3c 3-2 10.57406701 5.58567 > 3d 3-3 11.73240891 10.73833651 > 3e 3-8 NA 14.54214165 > 3h 3-4 21.56072089 21.59748884 > 3i 3-5 15.42846935 16.62715409 > 3i 3-6 129.7367193 121.8206045 > 3b NA NA NA > 3f NA NA NA > 3g NA NA NA > 3j NA NA NA > 5b 5-1 18.61733498 18.13545293 > 5d 5-3 NA 7.81018526 > 5f 5-2 12.5158971 14.37884817 > 5a NA NA NA > 5c NA NA NA > 5e NA NA NA > 5g NA NA NA > 5h NA NA NA > 5i NA NA NA > 5j NA NA NA > 9h 9-1 NA NA > 9a NA NA NA > 9b NA NA NA > 9c NA NA NA > 9d NA NA NA > 9e NA NA NA > 9f NA NA NA > 9g NA NA NA > 9i NA NA NA > 9j NA NA NA > 8a 8-1 14.29712852 12.83178905 > 8e 8-2 23.46594953 9.097377872 > 8f 8-3 NA NA > 8f 8-4 22.20001584 20.39646766 > 8h 8-5 50.54497551 56.93752065 > 8b NA NA NA > 8c NA NA NA > 8d NA NA NA > 8g NA NA NA > 8i NA NA NA > 8j NA NA NA > 4b 4-1 40.83468857 35.99017683 > 4f 4-3 NA 182.8060799 > 4f 4-4 NA 36.81401955 > 4h 4-2 17.13625062 NA > 4a NA NA NA > 4c NA NA NA > 4d NA NA NA > 4e NA NA NA > 4g NA NA NA > 4i NA NA NA > 4j NA NA NA > 7b 7-1 8.217605633 8.565035083 > 7a NA NA NA > 7c NA NA NA > 7d NA NA NA > 7e NA NA NA > 7f NA NA NA > 7g NA NA NA > 7h NA NA NA > 7i NA NA NA > 7j NA NA NA > 6b 6-6 NA 11.57887288 > 6c 6-1 27.32608984 17.17778959 > 6c 6-2 78.21988783 61.80558768 > 6d 6-7 NA 3.599685625 > 6f 6-3 26.78838281 23.33258286 > 6h 6-4 NA NA > 6h 6-5 NA NA > 6a NA NA NA > 6e NA NA NA > 6g NA NA NA > 6i NA NA NA > 6j NA NA NA > > On Jan 29, 10:43 pm, Prof Brian Ripley wrote: > > On Sat, 29 Jan 2011, Kang Min wrote: > > > Thanks Prof Ripley, the condition worked! > > > Btw I tried to search ?repl but I don't have documentation for it. Is > > > it in a non-basic package? > > > I meantgrepl: the edit messed up (but not on my screen, as sometimes > > happens when working remotely). The point is that 'perl=TRUE' > > guarantees that
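For anyone puzzled by the original behaviour, a small illustration (values modelled on the section column above) of why "[1:5]" and an unanchored "[1-5]" both pick up "10":

section <- c("10a", "1b", "2a", "5d", "9h")
grepl("[1:5]", section)          # the class is the three characters 1, :, 5 -> TRUE TRUE FALSE TRUE FALSE
grepl("[1-5]", section)          # the range 1-5, but "10a" still matches via its leading "1"
grepl("^[1-5][^0-9]", section)   # anchored, and the next character must not be a digit -> drops "10a"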
Re: [R] NaNs in Nested Mixed Model
Thank you Thierry for your kind answer! If you don't mind I would like to ask a follow-up question. In your suggestions I get P-values for "Species". However, I am really not interested in that factor per se. Would it make sense to use this model instead if I am only interested in "Genotype"? > model<-glmer(Nymphs~Genotype+(1|Species/Genotype),family=poisson) All the best, Johan 2011/3/9 ONKELINX, Thierry : > Dear Johan, > > A few remarks. > > - R-sig-mixed models is a better list for asking questions about mixed model. > - I presume that Nymphs is the number of insects? In that case you need a > generalised linear (mixed) model with poisson family > - What are you interessed in? The variability among genotypes or the effect > of each genotype. > You can achieve the first with a glmm like glmer(Nymphs ~ Species + > (1|Genotype), family = "poisson"). Genotype will be implicitly nested in > Species. Note that since you have only 4 genotypes, you will not get very > reliable estimates of the genotype variance. > For the latter you cannot use a mixed model so you need a simple > glm(Nymphs ~ Species/Genotype, family = "poisson"). Note that several > coefficients will be NaN, because you cannot estimate them. > > Best regards, > > Thierry > > > ir. Thierry Onkelinx > Instituut voor natuur- en bosonderzoek > team Biometrie & Kwaliteitszorg > Gaverstraat 4 > 9500 Geraardsbergen > Belgium > > Research Institute for Nature and Forest > team Biometrics & Quality Assurance > Gaverstraat 4 > 9500 Geraardsbergen > Belgium > > tel. + 32 54/436 185 > thierry.onkel...@inbo.be > www.inbo.be > > To call in the statistician after the experiment is done may be no more than > asking him to perform a post-mortem examination: he may be able to say what > the experiment died of. > ~ Sir Ronald Aylmer Fisher > > The plural of anecdote is not data. > ~ Roger Brinner > > The combination of some data and an aching desire for an answer does not > ensure that a reasonable answer can be extracted from a given body of data. > ~ John Tukey > > >> -Oorspronkelijk bericht- >> Van: r-help-boun...@r-project.org >> [mailto:r-help-boun...@r-project.org] Namens Johan Stenberg >> Verzonden: dinsdag 8 maart 2011 16:52 >> Aan: r-help@r-project.org >> Onderwerp: [R] NaNs in Nested Mixed Model >> >> Dear R users, >> >> I have a problem with something called "NaNs" in a nested mixed model. >> >> The background is that I have studied the number of insect >> nymphs emerging from replicated Willow genotypes in the >> field. I have 15 replicates each of 4 Willow genotypes >> belonging two 2 Willow species. >> Now I want to elucidate the effect of Willow genotype on the >> number of emerging nymphs. Previously I performed a simple >> one-way anova with "genotype" as explanatory factor and >> "number of nymphs emerging" as dependent variable, but the >> editor of the journal I've submitted this piece to wants me >> to nest Willow genotype within Willow species before he >> accepts the paper for publication [Species*Genotype(Species)]. >> >> The fact that I didn't include "Willow species" as a factor >> in my initial analysis reflects that I am not very interested >> in the species factor per se - I am just interested in if >> genetic variation in the host plant is important, but >> "species" is of course a factor that structures genetic diversity. 
>> >> I thought the below model would be appropriate: >> >> > model<-lme(Nymphs~Species*Genotype,random=~1|Species/Genotype) >> >> ...but I then get the error message "Error in MEEM(object, conLin, >> control$niterEM) : Singularity in backsolve at level 0, block 1" >> >> I then tried to remove "Genotype" from the fixed factors, but >> then I get the error message "NaNs produced". >> >> > model<-lme(Nymphs~Species,random=~1|Species/Genotype) >> > summary(model) >> Linear mixed-effects model fit by REML >> Data: NULL >> AIC BIC logLik >> 259.5054 269.8077 -124.7527 >> >> Random effects: >> Formula: ~1 | Species >> (Intercept) >> StdDev: 0.9481812 >> >> Formula: ~1 | Genotype %in% Species >> (Intercept) Residual >> StdDev: 0.3486937 1.947526 >> >> Fixed effects: Nymphs ~ Species >> Value Std.Error DF t-value p-value >> (Intercept) 2.67 1.042243 56 2.558585 0.0132 >> Speciesviminalis -2.03 1.473954 0 -1.379510 NaN >> Correlation: >> (Intr) >> Speciesviminalis -0.707 >> >> Standardized Within-Group Residuals: >> Min Q1 Med Q3 Max >> -1.4581821 -0.3892233 -0.2751795 0.3439871 3.1630658 >> >> Number of Observations: 60 >> Number of Groups: >> Species Genotype %in% Species >> 2 4 >> Warning message: >> In pt(q, df, lower.tail, log.p) : NaNs produced >> *** >> >> Do you have any idea what these error messages mean in my >> case and how I can
Re: [R] Specify feature weights in model prediction (CARET)
It was a general question to find out if the caret framework supported passing feature weights to any predictive model. But I guess, it may only make sense in the context of the model. The ones I am interested in would be svm and knn. Thanks, Kendric On Wed, Mar 16, 2011 at 6:20 PM, Max Kuhn wrote: > > Using the 'CARET' package, is it possible to specify weights for features > > used in model prediction? > > For what model? > > > And for the 'knn' implementation, is there a way > > to choose a distance metric (i.e. Mahalanobis distance)? > > > > No, sorry. > > Max > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] flow map lines between point pairs (latitude/longitude)
I am working on a similar problem. I have to add two columns: one containing the US state to which the origin belongs and another one to add the state in to which destination belongs. All I have is the latitude and the longitude of the origin and destination. Are there any packages in R that can do this??? Thank you. Ravi -- View this message in context: http://r.789695.n4.nabble.com/flow-map-lines-between-point-pairs-latitude-longitude-tp860009p3383842.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
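One possible starting point (a sketch, untested, with made-up coordinates): the maps package has map.where(), which returns the name of the map region containing each point; with the "state" database, the part before any ":" is the US state.

library(maps)
lon <- c(-74.0, -87.6)                      # hypothetical origin/destination longitudes
lat <- c( 40.7,  41.9)                      # and latitudes
region <- map.where(database = "state", x = lon, y = lat)
state  <- sub(":.*", "", region)            # strip any sub-region suffix, e.g. "new york:manhattan"
state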
Re: [R] fetch uneven
Thanks ! You all made my day ! -- View this message in context: http://r.789695.n4.nabble.com/fetch-uneven-tp3381949p3383956.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, Verteilungsannahme) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
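As a small concrete illustration of the point about constants (a sketch, not from Rubén's message): with MASS::fitdistr() the full log-likelihoods are kept, so AIC() can be used to compare fits from different distributional families to the same data.

library(MASS)
set.seed(1)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)
fit_norm  <- fitdistr(x, "normal")
fit_lnorm <- fitdistr(x, "lognormal")
AIC(fit_norm, fit_lnorm)   # the lognormal fit should come out with the lower AIC here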
Re: [R] rJava software
Hello,
it seems that rJava tries to detect the path to the Java Virtual Machine from a registry key that is not present. I guess "HLM" means HKEY_LOCAL_MACHINE, where the path to the installed Java is normally stored. On my Windows system, for example, the key for the Java installation path is

HKEY_LOCAL_MACHINE\Software\JavaSoft\Java Runtime Environment\1.6.0_24

Try to open regedit from your Windows console and convince yourself that the key is not present. A reinstallation of the JRE or JDK may help (it recreates the registry key). Alternatively, put your Java installation directory on the OS system path (open a Windows shell and type "java -version" to check whether an installation is already on the path); that may help, too.

Also, does it matter where Java is installed? Yes, it does, because the OS has to find the executables and native libraries of the JRE, and rJava has to find them too because it relies on them.

I hope this helps.

--
View this message in context: http://r.789695.n4.nabble.com/rJava-software-tp3383366p3383977.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
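If reinstalling does not restore the registry entry, one workaround that is sometimes suggested (an assumption on my part, with a hypothetical install path) is to tell R where Java lives before loading rJava:

Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jre6")   # adjust to the actual JRE location
library(rJava)
.jinit()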
Re: [R] adding linear regression data to plot
On 03/17/2011 02:47 AM, derek wrote: I know I can add line to graph with abline(), but I would like to print R-squared, F-test value, Residuals and other statistics from lm() to a graph. I don't know how to access the values from summary(), so that I can use them in a following code or print them in a graph. Hi derek, As Peter has already replied, you can use str() to find out the components of the summary. Try something like this: plot(1:10) library(plotrix) addtable2plot(2,8,coef(result)) where "result" is what has been returned by the call to "lm". Jim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
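To make the "access the values from summary()" part concrete, a small sketch (simulated data, illustrative names only) that pulls R-squared and the F statistic out of the summary object and writes them onto the plot:

x <- 1:20
y <- 2 * x + rnorm(20)
result <- lm(y ~ x)
s <- summary(result)            # str(s) lists all the available components
plot(x, y)
abline(result)
legend("topleft", bty = "n",
       legend = c(sprintf("R-squared = %.3f", s$r.squared),
                  sprintf("F = %.1f on %.0f and %.0f df",
                          s$fstatistic[1], s$fstatistic[2], s$fstatistic[3])))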
Re: [R] Why doesn't this work ?
The first line of this reply is a definite candidate for the fortunes package! best i --- On Thu, 17/3/11, bill.venab...@csiro.au wrote: > From: bill.venab...@csiro.au > Subject: Re: [R] Why doesn't this work ? > To: ericst...@aol.com, r-help@r-project.org > Date: Thursday, 17 March, 2011, 3:54 > It doesn't work (in R) because it is > not written in R. It's written in some other language > that looks a bit like R. > > > t <- 3 > > z <- t %in% 1:3 > > z > [1] TRUE > > t <- 4 > > z <- t %in% 1:3 > > z > [1] FALSE > > > > > > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of eric > Sent: Thursday, 17 March 2011 1:26 PM > To: r-help@r-project.org > Subject: [R] Why doesn't this work ? > > Why doesn't this work and is there a better way ? > > z <-ifelse(t==1 || 2 || 3, 1,0) > t <-3 > z > [1] 1 > t <-4 > z > [1] 1 > > trying to say ...if t == 1 or if t== 2 or if t ==3 then > true, otherwise > false > > -- > View this message in context: > http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > > __ > R-help@r-project.org > mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, > reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] calculating the occurrences of distinct observations in the subsets of a dataframe
Hello everybody, I have a data frame in R which is similar to the follows. Actually my real 'df' dataframe is much bigger than this one here but I really do not want to confuse anybody so that is why I try to simplify things as much as possible. So here's the data frame. id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3) a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3) b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2) c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2) d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2) e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3) df <-data.frame(id,a,b,c,d,e) df Basically what I would like to do is to get the occurrences of numbers for each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter grouping see my column 'id'). So, for column 'a' and for id number '1' (for the latter see column 'id') the code would be something like this: as.numeric(table(df[1:10,2])) The results are: [1] 3 7 Just to briefly explain my results: in column 'a' (and regarding only those records which have number '1' in column 'id') we can say that: number 1 occured 3 times, and number 3 occured 7 times. Again, just to show you another example. For column 'a' and for id number '2' (for the latter grouping see again column 'id'): as.numeric(table(df[11:20,2])) After running the codes the results are: [1] 4 3 3 Let me explain a little again: in column 'a' and regarding only those observations which have number '2' in column 'id') we can say that number 1 occured 4 times number 2 occured 3 times and number 3 occured 3 times. Last example: for column 'e' and for id number '3' the code would be: as.numeric(table(df[21:30,6])) With the results: [1] 1 4 5 ...meaning that number '1' occured once, number '2' occured four times and number '3' occured 5 times. So this is what I would like to do. Calculating the occurrences of numbers for each custom-defined subsets (and then collecting these values into a data frame). I know it is NOT a difficult task but the PROBLEM is that I'm gonna have to change the input 'df' dataframe on a regular basis and hence both the overall number of rows and columns might CHANGE over time... What I have done so far is that I have separated the 'df' dataframe by columns, like this: for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z]) So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c etc. But I'm really stuck now and I don't know how to move forward, you know, getting the occurrences for each column and each group of ids. Do you have any ideas? Best regards, Laszlo Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos és/vagy jogilag, szakmailag vagy más módon védett információt tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél tartalmának közlése, reprodukálása, másolása, vagy egyéb más úton történÅ terjesztése, felhasználása szigorúan tilos. Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat megszakadásból eredÅ hibáért, vagy az információ felhasználásából vagy annak megbÃzhatatlanságából eredÅ kárért. 
Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet az EBH folytonos munkamenetének biztosÃtása érdekében. This e-mail and any attached files are confidential and/...{{dropped:19}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why doesn't this work ?
And, just to make it really clear (I hope!): Your original expression z <-ifelse(t==1 || 2 || 3, 1,0) looks like a transcription into "R" of the words "If t equals 1 or 2 or 3 then z is 1 else z is 0" However, your "t==1 || 2 || 3" has to be parsed in the correct order according to operator precedence. If you look at '?Syntax' you will see that operator '==' has precedence over the "or" operator '||'. Hence the expression "t==1 || 2 || 3" will be parsed as (t==1) || 2 || 3 So, whatever the value of 't' ("t==1" may be either TRUE or FALSE), the result will be either TRUE || 2 || 3 or FALSE || 2 || 3 When numeric values (like "2" or "3") occur in a logical expression, they are coerced to a logical TRUE (with the one exception that a numerical value of "0" is coerced to FALSE). Hence, whatever the outcome of "t==1" the result will be either TRUE || TRUE || TRUE or FALSE || TRUE || TRUE which is always TRUE. This explains your results. Bill and Phil have given you an alternative which works: "t %in% (1:3)". A way of writing it (though longer) which is closer to your original "wording" could be if( (t==1) || (t==2) || (t==3) , 1, 0) which really spells out how to force the parsing. Hoping this helps, Ted. On 17-Mar-11 03:54:32, bill.venab...@csiro.au wrote: > It doesn't work (in R) because it is not written in R. It's written in > some other language that looks a bit like R. > >> t <- 3 >> z <- t %in% 1:3 >> z > [1] TRUE >> t <- 4 >> z <- t %in% 1:3 >> z > [1] FALSE > > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of eric > Sent: Thursday, 17 March 2011 1:26 PM > To: r-help@r-project.org > Subject: [R] Why doesn't this work ? > > Why doesn't this work and is there a better way ? > > z <-ifelse(t==1 || 2 || 3, 1,0) > t <-3 > z > [1] 1 > t <-4 > z > [1] 1 > > trying to say ...if t == 1 or if t== 2 or if t ==3 then true, otherwise > false > > -- > View this message in context: > http://r.789695.n4.nabble.com/Why-doesn-t-this-work-tp3383656p3383656.ht > ml > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 17-Mar-11 Time: 10:13:53 -- XFMail -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
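The coercion described above, spelled out line by line (a short sketch):

TRUE || 2                  # TRUE  : 2 is coerced to TRUE
FALSE || 0                 # FALSE : 0 is coerced to FALSE
(4 == 1) || 2 || 3         # TRUE whatever the left-hand side, which was the original bug
ifelse(4 %in% 1:3, 1, 0)   # 0 -- the vectorisable test the poster wanted
ifelse(3 %in% 1:3, 1, 0)   # 1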
Re: [R] create data set from selection of rows
On 15/03/2011, Francisco Gochez wrote: > Hi, > > What you are after is: > > datasubset <- dataset[ dataset[,3] == "text3", ] Thank you. For the set text1,23,text2,45 text1,23,text3,78 text1,23,text3,56 text1,23,text2,45 Is it possible to write a function that selects rows containing 'text3' and applies the function 'sum' to values '78' and '56'? The control statements described in the document 'an introduction to r' (Venables and Smith, 2010) suggest that the if statement would return 'true', to prevent a sum function being applied to 'true' results. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Retrieve an index of nested lists | Changing name delimiter in 'unlist()'
Dear list, I have to problems that are connected: PROBLEM 1 I wonder if it is somehow possible to patch the function 'unlist(use.names=TRUE)' such that you can specify an arbitrary name delimiter, e.g. "/" or "_". As I often name my variables "var.x.y", the default delimiter makes it hard to distinguish the distinct layers of a nested list after unlisting. As 'unlist()' it is called by '.Internal()', I didn't have a clue of how to find the section where the names are pasted. I'd like to use such a patched version in order to create a customized index (class: data.frame) of nested lists which should look like this name index is.top is.bottom degree a1 TRUEFALSE 1 a/a.11-1FALSE FALSE 2 a/a.1/a.1.1 1-1-1 FALSE TRUE3 a/a.21-2FALSE FALSE 2 a/a.2/a.2.1 1-2-1 FALSE TRUE3 b2 TRUE FALSE 1 ... Of course I could get such an index by recursively traversing the list layers, but this takes much too long. Maybe you also have another idea on how to create such an index. PROBLEM 2 I'd also like to retrieve such an index for nested environment structures. Therefore, I would either need something similar to 'unlist()' that works on environments or find something that coerces nested environments to nested lists. Again, I solved this by recursively looping over the respective environment layers, but it's too inefficient. Is there something out there that does the job via a C routine somehow? Thanks for any suggestions, Janko __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
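For problem 1, a small recursive helper illustrates the naming scheme being asked for (a sketch only -- it is the same slow recursive traversal the poster wants to avoid, so treat it as a specification of the output rather than a fast solution):

pathnames <- function(x, sep = "/", prefix = character(0)) {
  if (!is.list(x)) return(paste(prefix, collapse = sep))
  unlist(lapply(seq_along(x), function(i)
    pathnames(x[[i]], sep, c(prefix, names(x)[i]))), use.names = FALSE)
}
x <- list(a = list(a.1 = list(a.1.1 = 1), a.2 = list(a.2.1 = 2)), b = 3)
pathnames(x)
# "a/a.1/a.1.1" "a/a.2/a.2.1" "b"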
Re: [R] calculating the occurrences of distinct observations in the subsets of a dataframe
Hi! Sorry, I made an error in the previous e-mail. So try this: by(df[,-1],df$id,function(x) apply(x,2,tabulate)) This gives you a list. You can rearrange it into a data frame or a 3d array if you wish. Regards, Denes > Hello everybody, > > I have a data frame in R which is similar to the follows. Actually my real > 'df' dataframe is much bigger than this one here but I really do not want > to confuse anybody so that is why I try to simplify things as much as > possible. > > So here's the data frame. > > id <-c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3) > a <-c(3,1,3,3,1,3,3,3,3,1,3,2,1,2,1,3,3,2,1,1,1,3,1,3,3,3,2,1,1,3) > b <-c(3,2,1,1,1,1,1,1,1,1,1,2,1,3,2,1,1,1,2,1,3,1,2,2,1,3,3,2,3,2) > c <-c(1,3,2,3,2,1,2,3,3,2,2,3,1,2,3,3,3,1,1,2,3,3,1,2,2,3,2,2,3,2) > d <-c(3,3,3,1,3,2,2,1,2,3,2,2,2,1,3,1,2,2,3,2,3,2,3,2,1,1,1,1,1,2) > e <-c(2,3,1,2,1,2,3,3,1,1,2,1,1,3,3,2,1,1,3,3,2,2,3,3,3,2,3,2,1,3) > df <-data.frame(id,a,b,c,d,e) > df > > Basically what I would like to do is to get the occurrences of numbers for > each column (a,b,c,d,e) and for each id group (1,2,3) (for this latter > grouping see my column 'id'). > > So, for column 'a' and for id number '1' (for the latter see column 'id') > the code would be something like this: > as.numeric(table(df[1:10,2])) > > The results are: > [1] 3 7 > > Just to briefly explain my results: in column 'a' (and regarding only > those records which have number '1' in column 'id') we can say that: > number 1 occured 3 times, and > number 3 occured 7 times. > > Again, just to show you another example. For column 'a' and for id number > '2' (for the latter grouping see again column 'id'): > as.numeric(table(df[11:20,2])) > > After running the codes the results are: [1] 4 3 3 > > Let me explain a little again: in column 'a' and regarding only those > observations which have number '2' in column 'id') we can say that > number 1 occured 4 times > number 2 occured 3 times and > number 3 occured 3 times. > > Last example: for column 'e' and for id number '3' the code would be: > as.numeric(table(df[21:30,6])) > > With the results: > [1] 1 4 5 > > ...meaning that number '1' occured once, number '2' occured four times and > number '3' occured 5 times. > > So this is what I would like to do. Calculating the occurrences of numbers > for each custom-defined subsets (and then collecting these values into a > data frame). I know it is NOT a difficult task but the PROBLEM is that I'm > gonna have to change the input 'df' dataframe on a regular basis and hence > both the overall number of rows and columns might CHANGE over time... > > What I have done so far is that I have separated the 'df' dataframe by > columns, like this: > for (z in (2:ncol(df))) assign(paste("df",z,sep="."),df[,z]) > > So df.2 will refer to df$a, df.3 will equal df$b, df.4 will equal df$c > etc. But I'm really stuck now and I don't know how to move forward, you > know, getting the occurrences for each column and each group of ids. > > Do you have any ideas? > Best regards, > Laszlo > > > Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos > és/vagy jogilag, szakmailag vagy más módon védett információt > tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél > tartalmának közlése, reprodukálása, másolása, vagy egyéb más > úton történÅ terjesztése, felhasználása szigorúan tilos. > Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal > értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. 
(EBH) nem > vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez > történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat > megszakadásból eredÅ hibáért, vagy az információ > felhasználásából vagy annak megbÃzhatatlanságából eredÅ > kárért. > > Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és > hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet > az EBH folytonos munkamenetének biztosÃtása érdekében. > > > This e-mail and any attached files are confidential and/...{{dropped:19}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
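A short sketch of the suggestion above, plus one way to flatten the result (the nbins argument and the reshaping are additions, not part of the original reply; nbins forces every column to report counts for values 1 to 3 so the pieces line up):

counts <- by(df[, -1], df$id, function(x) apply(x, 2, tabulate, nbins = 3))
counts[["1"]]        # a 3 x 5 matrix: rows = values 1-3, columns = a..e
do.call(rbind, lapply(names(counts), function(g)
  data.frame(id = g, value = 1:3, counts[[g]])))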
[R] changing the dimensions of a matrix in a real specific way
Hi again, I'd like to ask you a question again. I have a matrix like this: a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) a [,1] [1,]1 [2,]2 [3,]3 [4,]4 [5,]5 [6,]6 [7,]7 [8,]8 [9,]9 [10,] 10 [11,] 11 [12,] 12 Is there a proper way to change the dimensions of this matrix so that I'll get this as a result: [,1] [,2] [,3] [1,] 123 [2,] 456 [3,] 789 [4,] 10 11 12 Thank you very much and have a pleasant day, Laszlo Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos és/vagy jogilag, szakmailag vagy más módon védett információt tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél tartalmának közlése, reprodukálása, másolása, vagy egyéb más úton történÅ terjesztése, felhasználása szigorúan tilos. Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat megszakadásból eredÅ hibáért, vagy az információ felhasználásából vagy annak megbÃzhatatlanságából eredÅ kárért. Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet az EBH folytonos munkamenetének biztosÃtása érdekében. This e-mail and any attached files are confidential and/...{{dropped:19}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create data set from selection of rows
Is this what you want: > x V1 V2V3 V4 1 text1 23 text2 45 2 text1 23 text3 78 3 text1 23 text3 56 4 text1 23 text2 45 > str(x) 'data.frame': 4 obs. of 4 variables: $ V1: Factor w/ 1 level "text1": 1 1 1 1 $ V2: int 23 23 23 23 $ V3: Factor w/ 2 levels "text2","text3": 1 2 2 1 $ V4: int 45 78 56 45 > tapply(x$V4, x$V3, sum) text2 text3 90 134 > On Thu, Mar 17, 2011 at 6:50 AM, e-letter wrote: > On 15/03/2011, Francisco Gochez wrote: >> Hi, >> >> What you are after is: >> >> datasubset <- dataset[ dataset[,3] == "text3", ] > > Thank you. For the set > > text1,23,text2,45 > text1,23,text3,78 > text1,23,text3,56 > text1,23,text2,45 > > Is it possible to write a function that selects rows containing > 'text3' and applies the function 'sum' to values '78' and '56'? The > control statements described in the document 'an introduction to r' > (Venables and Smith, 2010) suggest that the if statement would return > 'true', to prevent a sum function being applied to 'true' results. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] create data set from selection of rows
Using the summarise function in package plyr is one way; taking df to be your data frame with variable names V1-V4,

library(plyr)
summarise(subset(df, V3 == 'text3'), sum = sum(V4))
  sum
1 134

Another is to use the data.table package:

library(data.table)
dt <- data.table(df)
dt[V3 == 'text3', sum(V4)]
[1] 134

A third route would be with package sqldf, something like

library(sqldf)
sqldf("select sum(V4) from df where V3 = 'text3'")

but sqldf is messed up on my system (probably because I didn't install it properly) so the code is untested.

HTH,
Dennis

On Thu, Mar 17, 2011 at 3:50 AM, e-letter wrote:
> On 15/03/2011, Francisco Gochez wrote:
> > Hi,
> >
> > What you are after is:
> >
> > datasubset <- dataset[ dataset[,3] == "text3", ]
>
> Thank you. For the set
>
> text1,23,text2,45
> text1,23,text3,78
> text1,23,text3,56
> text1,23,text2,45
>
> Is it possible to write a function that selects rows containing
> 'text3' and applies the function 'sum' to values '78' and '56'? The
> control statements described in the document 'an introduction to r'
> (Venables and Smith, 2010) suggest that the if statement would return
> 'true', to prevent a sum function being applied to 'true' results.
>
> __
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Extracting columns from a class
Hi list,
I am not a frequent user of R. Recently I used R for a principal component analysis and got the result as a class which has information like the standard deviations and principal components 1 to 10. How is it possible to extract the column corresponding to the first principal component and write it to a file? The output from the prcomp command is something like this:

Standard deviations:
 [1] 3.325801e+00 7.669837e-01 6.625773e-01 4.990732e-01 3.470071e-01
 [6] 2.946679e-01 2.206289e-01 1.645828e-01 1.570887e-01 4.741294e-16

Rotation:
             PC1            PC2          PC3          PC4          PC5
[1,] -0.07900624  -0.0824864352 0.1208419434 0.1763425845 0.089545020
[2,] -0.09114708  -0.0901675110 0.1377608881 0.2224127252 0.076620976
[3,] -0.10510742  -0.0935434206 0.1113586044 0.2513993555 0.029783117

I want to extract PC1 and the first value in the standard deviations.

Thanks
--
Nuncio.M
Research Scientist
National Center for Antarctic and Ocean research
Head land Sada
Vasco da Gamma
Goa-403804

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
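A sketch of one way to do this (the object name 'pca' and the USArrests data are only there to make the example self-contained):

pca <- prcomp(USArrests, scale. = TRUE)
pc1 <- pca$rotation[, 1]   # loadings of the first principal component
sd1 <- pca$sdev[1]         # standard deviation of the first component
write.csv(data.frame(PC1 = pc1), file = "pc1_loadings.csv")
sd1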
[R] assigning to list element within target environment
I would like to assign an value to an element of a list contained in an environment. The list will contain vectors and matrices. Here's a simple example: # create toy environment testEnv = new.env(parent = emptyenv()) # create list that will be in the environment, then assign() it x = list(a=1,b=2) assign("xList",x,testEnv) # create new element, to be inserted into xList c = 5:7 Now, what I'd like to do is something like this: assign("xList[[3]]",c,testEnv) But the assign() help says: "assign does not dispatch assignment methods, so it cannot be used to set elements of vectors, names, attributes, etc." So that's no good. Also, this fails: attach(testEnv) xList[[3]] <<- c detach(testEnv) because, as the attach() help says: "If you use <<- or assign to assign to an attached database, you only alter the attached copy, not the original object." What I've been doing is making a copy of xList in the local environment, changing it, then using assign() to make the change in the testEnv environment. This is unsatisfactory due to the double use of memory making a copy of testEnv (the actual list I will use will be quite large). Another option would be this: # Assign the new element to a variable in the target environment assign("temp",c,testEnv) # "Assign" it within the target environment, from the temp variable eval(parse(text="xList[[3]]=temp"),env=testEnv) # remove the unneeded variable rm("temp",envir=testEnv) But I figure there must be a more elegant way. Anyone have any ideas? Thanks, Richard -- Richard D. Morey Assistant Professor Psychometrics and Statistics Rijksuniversiteit Groningen / University of Groningen http://drsmorey.org/research/rdmorey __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] changing the dimensions of a matrix in a real specific way
t(matrix(a,3,4)) for more complex arrays, see ?aperm > Hi again, > > I'd like to ask you a question again. > > I have a matrix like this: > a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) > a > > [,1] > [1,]1 > [2,]2 > [3,]3 > [4,]4 > [5,]5 > [6,]6 > [7,]7 > [8,]8 > [9,]9 > [10,] 10 > [11,] 11 > [12,] 12 > > Is there a proper way to change the dimensions of this matrix so that I'll > get this as a result: > [,1] [,2] [,3] > [1,] 123 > [2,] 456 > [3,] 789 > [4,] 10 11 12 > > Thank you very much and have a pleasant day, > Laszlo > > > Ez az e-mail és az összes hozzá tartozó csatolt melléklet titkos > és/vagy jogilag, szakmailag vagy más módon védett információt > tartalmazhat. Amennyiben nem Ãn a levél cÃmzettje akkor a levél > tartalmának közlése, reprodukálása, másolása, vagy egyéb más > úton történÅ terjesztése, felhasználása szigorúan tilos. > Amennyiben tévedésbÅl kapta meg ezt az üzenetet kérjük azonnal > értesÃtse az üzenet küldÅjét. Az Erste Bank Hungary Zrt. (EBH) nem > vállal felelÅsséget az információ teljes és pontos - cÃmzett(ek)hez > történÅ - eljuttatásáért, valamint semmilyen késésért, kapcsolat > megszakadásból eredÅ hibáért, vagy az információ > felhasználásából vagy annak megbÃzhatatlanságából eredÅ > kárért. > > Az üzenetek EBH-n kÃvüli küldÅje vagy cÃmzettje tudomásul veszi és > hozzájárul, hogy az üzenetekhez más banki alkalmazott is hozzáférhet > az EBH folytonos munkamenetének biztosÃtása érdekében. > > > This e-mail and any attached files are confidential and/...{{dropped:19}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Flexible rbind
Hi Santosh, May be you looking at something like this merge(d,dNewTests,by="type",all=TRUE) On Thu, Mar 17, 2011 at 1:03 PM, Santosh Srinivas < santosh.srini...@gmail.com> wrote: > Dear All, > > I am trying to create a empty structure that I want to fill gradually > through the code. > I want to use something like rbind to create the basic structure first. > > I am looking for a possibility to do an rbind where the columns names > dont match fully (but the missing columns can be defaulted to either > zero or n/a) (my actual data has a lot of columns). > Please see the data frames below > > I have the following data frame that is filled > > dput(d) > structure(list(type = structure(1:9, .Label = c("test1", "test2", > "test3", "test4", "test5", "test6", "test7", "test8", "test9" > ), class = "factor"), meter = c(37.25122438, 58.65845732, 63.47108421, > 94.76085162, 13.75013867, 8.520664878, 79.74623167, 62.16109819, > 20.47806657), degree = c(2.884440333, 74.94067898, 32.64251152, > 83.24820274, 58.36084794, 12.64490368, 4.428796741, 32.12824213, > 97.83710088)), .Names = c("type", "meter", "degree"), class = > "data.frame", row.names = c(NA, > -9L)) > > To the above data I want to append the following data to create an > empty structure for the new data > > dput(dNewTests) > structure(list(type = structure(1:9, .Label = c("test14", "test15", > "test16", "test17", "test18", "test19", "test20", "test21", "test22" > ), class = "factor")), .Names = "type", class = "data.frame", row.names = > c(NA, > -9L)) > > rbind obviously throws the following error: > > rbind(d, dNewTests) > Error in rbind(deparse.level, ...) : > numbers of columns of arguments do not match > > Any way that I can default the missing columns to NA or some default > value like zero? > > Also, I tried to go through the archives and tried merge_all. What am > I doing wrong wih the code below ? > > require(reshape) > merge_all(d, dNewTests,by="type") > > Error in fix.by(by.x, x) : > 'by' must specify column(s) as numbers, names or logical > > > Thanks, > Santosh > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] assigning to list element within target environment
On Thu, Mar 17, 2011 at 7:25 AM, Richard D. Morey wrote: > I would like to assign an value to an element of a list contained in an > environment. The list will contain vectors and matrices. Here's a simple > example: > > # create toy environment > testEnv = new.env(parent = emptyenv()) > > # create list that will be in the environment, then assign() it > x = list(a=1,b=2) > assign("xList",x,testEnv) > > # create new element, to be inserted into xList > c = 5:7 > > Now, what I'd like to do is something like this: > > assign("xList[[3]]",c,testEnv) testEnv$x[[3]] <- c ? Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
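Since the list in the original post was stored under the name "xList", the same idiom would target that name; a small sketch (newElem is a made-up name standing in for the original's "c", which is best avoided as a variable name):

testEnv <- new.env(parent = emptyenv())
assign("xList", list(a = 1, b = 2), envir = testEnv)
newElem <- 5:7
testEnv$xList[[3]] <- newElem     # replacement functions work through env$name
get("xList", envir = testEnv)     # now a three-element list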
Re: [R] Why doesn't this work ?
On Thu, Mar 17, 2011 at 3:54 AM, wrote: > It doesn't work (in R) because it is not written in R. It's written in some > other language that looks a bit like R. It parses in R, so I would say it was written in R. To paraphrase Obi-wan, it's just not the R you are looking for. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] changing the dimensions of a matrix in a real specific way
matrix(a, ncol=3, nrow=4, byrow=TRUE) or > dim(a) <- c(3,4) > a <- t(a) > a [,1] [,2] [,3] [1,] 1 2 3 [2,] 4 5 6 [3,] 7 8 9 [4,] 10 11 12 Depending on the context of the problem. Sarah On Thu, Mar 17, 2011 at 7:59 AM, Bodnar Laszlo EB_HU wrote: > Hi again, > > I'd like to ask you a question again. > > I have a matrix like this: > a <-matrix(c(1,2,3,4,5,6,7,8,9,10,11,12)) > a > > [,1] > [1,] 1 > [2,] 2 > [3,] 3 > [4,] 4 > [5,] 5 > [6,] 6 > [7,] 7 > [8,] 8 > [9,] 9 > [10,] 10 > [11,] 11 > [12,] 12 > > Is there a proper way to change the dimensions of this matrix so that I'll > get this as a result: > [,1] [,2] [,3] > [1,] 1 2 3 > [2,] 4 5 6 > [3,] 7 8 9 > [4,] 10 11 12 > > Thank you very much and have a pleasant day, > Laszlo > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Incorrect degrees of freedom in SEM model using lavaan
I have been trying to use lavaan (version 0.4-7) for a simple path model, but the program seems to be computing far fewer degrees of freedom for my model than it should have. I have 7 variables, which should give (7)(8)/2 = 28 covariances, and hence 28 DF. The model seems to only think I have 13 DF. The code to reproduce the problem is below. Have I done something wrong, or is this something I should take to the developer? library(lavaan) myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", "HHincome", "Race", "Age", "Gender", "Stress1"))) model = ' Internet ~ HHincome + Race + Age + Gender + Stress1 Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 ' fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) #check the number of parameters being estimated inspect(fit, what="free") #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 summary(fit) Andrew Miles [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Autocorrelation in linear models
Arni Magnusson hafro.is> writes: > > I have been reading about autocorrelation in linear models over the last > couple of days, and I have to say the more I read, the more confused I > get. Beyond confusion lies enlightenment, so I'm tempted to ask R-Help for > guidance. > > Most authors are mainly worried about autocorrelation in the residuals, > but some authors are also worried about autocorrelation within Y and > within X vectors before any model is fitted. Would you test for > autocorrelation both in the data and in the residuals? My immediate reaction is that autocorrelation in the raw data (marginal autocorrelation) is not relevant. (There are exceptions, of course -- in many ecological systems the marginal autocorrelation tells us something about the processes driving the system, so we may want to quantify/estimate it -- but I wouldn't generally think that *testing* it (e.g. trying to reject a null hypothesis of ACF=0) makes sense.) > > If we limit our worries to the residuals, it looks like we have a variety > of tests for lag=1: > >stats::cor.test(residuals(fm)[-n], residuals(fm)[-1]) >stats::Box.test(residuals(fm)) >lmtest::dwtest(fm, alternative="two.sided") >lmtest::bgtest(fm, type="F") Note that (I think) all of these tests are based on lag-1 autocorrelation only (I see you mention this later). Have you looked at nlme:::ACF ? It is possible to get non-significant autocorrelation at lag 1 with sig. autocorrelation at higher lags. > > In my model, a simple lm(y~x1+x2) with n=20 annual measurements, I have > significant _positive_ autocorrelation within Y and within both X vectors, > but _negative_ autocorrelation in the residuals. That's plausible. Again, I think the residual autocorrelation is what you should worry about. > The residual > autocorrelation is not quite significant, with the p-values > >0.070 >0.064 >0.125 >0.077 > > from the tests above. I seem to remember some authors saying that the > Durbin-Watson test has less power than some alternative tests, as > reflected here. The difference in p-values is substantial, ?? I wouldn't necessarily say so -- I would guess you could get this range of p-values from a single test statistic if you had multiple simulated data sets from the same underlying model and parameters ... Have you tried running such simulations? > so choosing > which test to use could in many cases make a big difference for the > subsequent analysis and conclusions. Most of them (cor.test, Box.test, > bgtest) can also test lags>1. Which test would you recommend? I imagine > the basic cor.test is somehow inappropriate for this; the other tests > wouldn't have been invented otherwise, right? I don't know the details (it's been a while since I did time series analysis, and it wasn't in this particular vein.) > The car::dwt(fm) has p-values fluctuating by a factor of 2, unless I run a > very long simulation, which results in a p-value similar to > lmtest::dwtest, at least in my case. > > Finally, one question regarding remedies. If there was significant > _positive_ autocorrelation in the residuals, some authors suggest > remedying this by deflating the df (fewer effective df in the data) and > redo the t-tests of the regression coefficients, rejecting fewer null > hypotheses. Does that mean if the residuals are _negatively_ correlated > then I should inflate the df (more effective df in the data) and reject > more null hypotheses? My personal taste is that these df adjustments are bit cheesy. 
Most of the time I would prefer to fit a model that incorporated autocorrelation (i.e. nlme::gls(y~x1+x2,correlation=corAR1()) [or pick another choice of time-series model from ?corClasses]. More generally, this whole approach falls into the category of "test for presence of XX; if XX is not statistically significant then ignore it", which is worrisome (if your test for XX is very powerful then you will be concerned about dealing with XX even when its effect on your results would be trivial; if your test for XX is weak or you have very little data then you won't detect XX even when it is present). I would say that if you're really concerned about autocorrelation you should just automatically use a modeling approach (see above) that incorporates it. > > That's four question marks. I'd greatly appreciate guidance on any of > them. > > Thanks in advance, > cheers Ben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
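A minimal sketch of that suggestion, with dat, y, x1, x2 and a yearly index year standing in for the poster's annual data (these names are illustrative, not taken from the original post):

library(nlme)
fm.iid <- gls(y ~ x1 + x2, data = dat)                                        # independent errors
fm.ar1 <- gls(y ~ x1 + x2, data = dat, correlation = corAR1(form = ~ year))   # AR(1) errors
anova(fm.iid, fm.ar1)             # same fixed effects, so the two REML fits are comparable
plot(ACF(fm.iid), alpha = 0.05)   # residual autocorrelation at all lags, not just lag 1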
Re: [R] Bug in lattice auto.key argument
On Wed, Mar 16, 2011 at 12:52 AM, wrote: > The Lattice auto.key argument has a bug in R.12.2. > > R version 2.12.2 (2011-02-25) > Platform: i386-pc-mingw32/i386 (32-bit) > > other attached packages: > [1] lattice_0.19-17 > > loaded via a namespace (and not attached): > [1] grid_2.12.2 > > If I set up my plot parameters as > > require(lattice) > superpose.line.settings <- trellis.par.get("superpose.line") > str(superpose.line.settings) > superpose.line.settings$col <- 1 > superpose.line.settings$lty <- c(1,2,5) * > superpose.line.settings$lwd <- 2 > trellis.par.set("superpose.line",superpose.line.settings) > > and then set up my key list as > > my.key <- list(space="top", points=FALSE, lines=TRUE, columns=3) > > and then run xyplot with the argument auto.key=my.key, > I do NOT get the proper legend at the top of the plot. > Instead of the lines in the legend having the characteristics of > lty=c(1,2,5), they have the characteristics of lwd=c(1,2,5). I think Felix has fixed this bug a couple of days ago. Please check the r-forge version from https://r-forge.r-project.org/R/?group_id=638 There should be a new release sometime next week. -Deepayan > > The auto.key argument works fine in R.12.1 > > R version 2.12.1 (2010-12-16) > Platform: i386-pc-mingw32/i386 (32-bit) > . > other attached packages: > [1] lattice_0.19-13 > > loaded via a namespace (and not attached): > [1] grid_2.12.1 > > > Joe > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] setting up a R working group
Dear R users, we are currently trying to set up a R working group at Swansea University. I would be very grateful to get some information and feedback from R users that have done something similar, in particular regarding: - help with giving a general overview of what R is capable of - help with what to present or teach (did you organise your own training programme or did you get someone from outside?) - are there people out there who would like to present their work to us? Any help will be highly appreciated! Thanks Joanne -- Joanne Demmler Ph.D. Research Assistant College of Medicine Swansea University Singleton Park Swansea SA2 8PP UK tel:+44 (0)1792 295674 fax:+44 (0)1792 513430 email: j.demm...@swansea.ac.uk DECIPHer: www.decipher.uk.net __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Using barplot() with zoo -- names.arg not permitted?
I've used barplot(), including the names.arg parameter, on data frames successfully, but I'm even newer to using zoo than I am to R. :-} I am working on a function that accepts a data frame ("df") as its primary argument, extracts information from it to create a zoo, then generates a plot based on that. The data frame has a column called "time" which is a standard UNIX time -- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a POSIXct. (The types of the other columns in the data frame vary greatly, and the sequence in which they occur is not expected to be constant. Thus, I find it easier to let R make a reasonable guess as to the data type, and override special cases -- such as "time" -- on an individual basis.) The data frame contains certain columns whose values are counters, so I generate a zoo of the diff(): oid = "kern.cp_time_" pat = paste("^", oid, sep = "") class(df$time) <- "POSIXct" df_d <- diff(zoo(df[, grep(pat, names(df))], order.by = df$time)) I then save the start & end timestamps for future reference, and generate another zoo, this one containing only the percentages from df_d: cpu_states <- c("sys", "intr", "user", "nice", "idle") d_range <- range(index(df_d)) df_pct <- sweep(df_d, 1, apply(df_d, 1, sum), "/")[, paste(oid, cpu_states[1:4], sep = "")] * 100 Well, that is, I save the percentages in which I'm interested. So far, so good. df_pct is a zoo (as expected), and the content looks reasonable. I then plot it, e.g.: barplot(df_pct, border = NA, col = colvec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time") But the X-axis labels show up as "large integers" -- the POSIXct values are apparently treated as numeric quantities for this purpose. And if I try barplot(df_pct, border = NA, col = colvec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", names.arg = index(df_pct)) I am told: Warning messages: 1: In plot.window(xlim, ylim, log = log, ...) : "names" is not a graphical parameter 2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = axis.lty, : "names" is not a graphical parameter 3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) : "names" is not a graphical parameter 4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) : "names" is not a graphical parameter which I find a bit perplexing. If I plot via: barplot(df_pct, border = NA, col = col_vec, space = 0, main = title, sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", axisnames = FALSE) that works OK, but then I'm unable to generate a human-readable set of X-axis labels: At best, I'm able to get a single label at the origin.
Here's an excerpt from the generated df_d: Browse[2]> df_d[1:10] kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice 1297278017 275 0 0 1297278018 231 0 0 1297278019 266 1 0 1297278020 230 2 0 1297278021 191 0 0 1297278022 114 0 0 1297278023 30 0 0 1297278024 0 0 0 1297278025 0 0 0 1297278026 0 0 0 kern.cp_time_sys kern.cp_time_user 1297278017 8 1 1297278018 17 35 1297278019 10 6 1297278020 28 24 1297278021 21 72 1297278022 27 147 1297278023 43 219 1297278024 47 249 1297278025 41 259 1297278026 8 15 Here's an excerpt from the generated df_pct: Browse[2]> df_pct[1:10] kern.cp_time_sys kern.cp_time_intr kern.cp_time_user 1297278017 2.816901 0.000 0.3521127 1297278018 6.007067 0.000 12.3674912 1297278019 3.533569 0.3533569 2.1201413 1297278020 9.859155 0.7042254 8.4507042 1297278021 7.394366 0.000 25.3521127 1297278022 9.375000 0.000 51.0416667 1297278023 14.726027 0.000 75.000 1297278024 15.878378 0.000 84.1216216 1297278025 13.67 0.000 86.333 1297278026 34.782609 0.000 65.2173913 kern.cp_time_nice 1297278017 0 1297278018 0 1297278019 0 1297278020 0 1297278021 0 1297278022
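One way to get readable time labels, sketched against the objects named in this post (untested here, since the original data frame is not available): suppress barplot()'s own names and add a time axis afterwards, using the bar midpoints that barplot() returns.

m  <- t(coredata(df_pct))            # plain matrix; one column (= one stacked bar) per time point
tt <- index(df_pct)
bp <- barplot(m, border = NA, col = colvec, space = 0, ylab = "CPU %",
              ylim = c(0, 100), xlab = "Time", axisnames = FALSE)
keep <- pretty(seq_along(tt))        # a handful of evenly spaced tick positions
keep <- keep[keep >= 1 & keep <= length(tt)]
axis(1, at = bp[keep], labels = format(tt[keep], "%H:%M:%S"))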
Re: [R] Help- Fitting a Thin Plate Spline
library(mgcv) b <- gam(arcsine.success ~ s(date.num,clutch.size,k=50)) vis.gam(b,theta=30) will fit a thin plate spline and plot it. k is an upper limit on the number of degrees of freedom for the TPS, but the actual degrees of freedom are chosen automatically (use the 'method' argument of gam to choose exactly how). Increase k if 50 is overly restrictive here. gam will also let you fit binary/binomial data directly, so you could probably do away with the arcsine transformation (if I'm guessing right about what you are doing here). There are also several other libraries that will let you fit thin plate splines: gss, for example. Simon On 14/03/11 09:55, kehoe wrote: Hi Everyone, I'm a pretty useless r-er but have data that SPSS etc doesn't like. I've managed to do GLMs for my data, but now need to fit a thin plate spline for my data (arcsine.success~date.num:clutch.size) If anyone has a bit of spare time and could come up with a bit of code I'd be very grateful- I just don't get R language! Thanks Rach -- View this message in context: http://r.789695.n4.nabble.com/Help-Fitting-a-Thin-Plate-Spline-tp3353485p3353485.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- --- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
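Picking up Simon's last point, a sketch of the binomial route that skips the arcsine transform entirely; the data frame and column names here (nestdata, successes, failures) are guesses for illustration, not taken from the original post:

library(mgcv)
b2 <- gam(cbind(successes, failures) ~ s(date.num, clutch.size, k = 50),
          family = binomial, data = nestdata, method = "REML")
summary(b2)                                 # effective degrees of freedom chosen automatically
vis.gam(b2, theta = 30, type = "response")  # fitted surface on the probability scale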
Re: [R] How to make sure R's g++ compiler uses certain C++ flags when making a package
> Date: Wed, 16 Mar 2011 13:50:37 -0700 > From: solomon.mess...@gmail.com > To: r-help@r-project.org > Subject: Re: [R] How to make sure R's g++ compiler uses certain C++ flags > when making a package > > Looks like the problem may be that R is automatically passing this flag to > the g++ compiler: "-arch x86_64" which appears to be causing trouble for > opencv. Does anyone know how to suppress this flag? Are you building a 32- or 64-bit app? I haven't looked, but offhand that may be what this flag is for (you can man g++ and check, I suppose). I was going to reply after doing some checking, as I have never actually made a complete R package, but I did make a test one and verified I could call the C++ code from R. I seem to remember that it was easy to just invoke g++ explicitly and R would use whatever binaries you happen to have left there. That is, if you just compile the code for your R modules yourself, there is no real reason to worry about how R does it. Mixing some flags will prevent linking or, worse, create mysterious run-time errors. Also, IIRC, your parameters were the output of foreign scripts (opencv, apparently) and not too helpful here (and in any case you should echo these to see what they are if they are an issue). > > -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 11:43am Baugh wrote: >> Question: can I simply substitute a dummy var (e.g. populated by zeros) >> for "ID" to run the model >> without the random factor? when I try this R returns values that seem >> reasonable, but I want to be sure >> this is appropriate. If you can fit the model using lme (and it looks like you easily can) then another check would be: ## Compare models with and without random effects fm <- lm(Reaction ~ Days, sleepstudy) fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) anova(fm1, fm) ## lme-fitted model must come first Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384072.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Excuse me, I don't claim R^2 can't be negative. What I am saying is that if I get a negative R^2, then the data are useless. I know that what Thomas said is true in the general case. But in my special case of data, using a nonzero intercept is nonsense, and getting R^2 less than 0.985 is considered a poor job (the standard is R^2 > 0.995). (R^2 given by R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) ) Because lm() uses two different formulas for computing R^2, it is confusing to get R^2 closer to 1 when the linear model with zero intercept y=a*x (a = slope) is used, rather than with the model y=a*x+b (a = slope, b = nonzero intercept). I think R^2 is only a measure of goodness of fit for least-squares optimization and it doesn't matter which formula is used: (R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) or R^2 = 1 - Sum(R[i]^2) / Sum((y[i]-y*)^2)), but using both is confusing. So I would like to know why two different formulas for R^2 are used? -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
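Those are the two definitions documented in ?summary.lm; a small sketch that reproduces both by hand for the fits from the original post in this thread (k = lm(y~x) with intercept, l = lm(y~x+0) through the origin):

1 - sum(residuals(k)^2) / sum((y - mean(y))^2)   # summary(k)$r.squared, intercept model
1 - sum(residuals(l)^2) / sum(y^2)               # summary(l)$r.squared, zero-intercept model
# the denominators differ, so the two numbers are not on the same scale and
# should not be compared with each other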
[R] lmm WITHOUT random factor (lme4)
LMM without random effect: I want to run an LMM both with and without the random factor (ID), and then extract the log-likelihood values from the two models in order to generate a p-value. With the random factor: lmer(y~x+(1|ID),data) Question: can I simply substitute a dummy var (e.g. populated by zeros) for "ID" to run the model without the random factor? When I try this, R returns values that seem reasonable, but I want to be sure this is appropriate. From other inquiries in the forum it seems that substituting ("ID-1") or some variant like that does not work. Thanks for your thoughts. Baugh -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384054.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
> 2) I don't want to fit data with a linear model of zero intercept. > 3) I don't know if I understand correctly. I'm 100% sure the model for my data > should have zero intercept. > The only coordinate which I'm 100% sure is correct. If I had measured quality > Y of the same sample X0 a number of times I would get E(Y(X0))=0. Are points 2) and 3) not contradictory? Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cv.lm syntax error
I rewrote my previous command, and CVlm works now. The problem was related to the data frame needing the same names as those in the formula passed to CVlm. Now I'd like to know if the seed is of any special importance, or whether I can just type seed=29 as in the example. Thanks in advance, u...@host.com -- View this message in context: http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3384198.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Segmentation fault when using "plot" function
I frequently get a segmentation fault error when using the "plot" command. It happens about half the time. We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. I did a quick search for this problem and didn't find anything. Apologies if I missed it. *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: 0x231e18f22 [ 0] /lib64/libc.so.6 [0x3c84a300a0] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa] [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30] [11] /usr/lib64/R/lib/libR.so [0x3659cff75e] [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88] [18] /usr/lib64/R/lib/libR.so [0x3659d25c99] [19] /usr/lib64/R/lib/libR.so [0x3659cff75e] [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da] [29] /usr/lib64/R/lib/libR.so [0x3659d0203d] *** End of error message *** Process R segmentation fault at Thu Mar 17 09:33:59 2011 -- View this message in context: http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] setting up a R working group
On Thu, Mar 17, 2011 at 2:42 PM, Joanne Demmler wrote: > Dear R users, Hi > > we are currently trying to set up a R working group at Swansea University. > I would be very grateful to get some information and feedback from R users > that have done something similar, in particular regarding: We started one, but it fell apart after I left the University - first lesson: you need a critical mass and at least two people who are prepared to drive it. > > - help with giving a general overview of what R is capable of > - help with what to present or teach (did you organise your own training > programme or did you get someone from outside?) We managed with organising it internally, but there were topics for which external lecturers were very useful - it depends on what expertise your R user base has. It is always better to do the training internally, as then there is always the possibility to go back to the person who taught the course. > - are there people out there who would like to present their work to us? I guess that depends on whether you are at least able/willing to provide travel expenses, and on what you expect. > > Any help will be highly appreciated! Welcome - and please feel free to ask more detailed questions. By the way - there is also a SIG, R-teach, which might be of help. Cheers, Rainer > Thanks Joanne > > -- > Joanne Demmler Ph.D. > Research Assistant > College of Medicine > Swansea University > Singleton Park > Swansea SA2 8PP > UK > tel: +44 (0)1792 295674 > fax: +44 (0)1792 513430 > email: j.demm...@swansea.ac.uk > DECIPHer: www.decipher.uk.net > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- NEW GERMAN FAX NUMBER!!! Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation Biology, UCT), Dipl. Phys. (Germany) Centre of Excellence for Invasion Biology Natural Sciences Building Office Suite 2039 Stellenbosch University Main Campus, Merriman Avenue Stellenbosch South Africa Cell: +27 - (0)83 9479 042 Fax: +27 - (0)86 516 2782 Fax: +49 - (0)321 2125 2244 email: rai...@krugs.de Skype: RMkrug Google: r.m.k...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Using barplot() with zoo -- names.arg not permitted?
On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill wrote: > I've used barplot(), including the anmes.arg parameter, on data frames > successfully, but I'm even newer to using zoo than I am to R. :-} > > I am working on a functon that accepts a data frame ("df") as its > primary argument, extracts information from it to create a zoo, then > generates a plot based on that. > > The data frame has a column called "time" which is a standard UNIX time > -- the number of seconds since 01 Jan 1970 (UTC), so I treat that as a > POSIXct. (The types of the other columns in the data frame varies > greatly, and the sequence in which they occur is not expected to be > constant. Thus, I find it easier to let R make a reasonable guess as to > the data type, and override special cases -- such as "time" -- on an > individual basis.) > > The data frame contains certain columns whose values are counters, so I > generate a zoo of the diff(): > > oid = "kern.cp_time_" > pat = paste("^", oid, sep = "") > class(df$time) <- "POSIXct" > df_d <- diff(zoo(df[, grep(pat, names(df))], order.by = df$time)) > > I then save the start & end timestamps for future reference, and > generate another zoo, this one containing only the percentages from > df_d: > > cpu_states <- c("sys", "intr", "user", "nice", "idle") > d_range <- range(index(df_d)) > df_pct <- sweep(df_d, 1, apply(df_d, 1, sum), "/")[, paste(oid, > cpu_states[1:4], sep = "")] * 100 > > Well, that is, I save the percentages in which I'm interested. > > So far, so good. df_pct is a zoo (as expected, and the content looks > reasonable. > > I then plot it, e.g.: > > barplot(df_pct, border = NA, col = colvec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time") > > But the X-axis labels show up as "large integers" -- the POSIXct values > are apparently treated as numeric quantities for this purpose. > > And if I try > > barplot(df_pct, border = NA, col = colvec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), xlab = "Time", > names.arg = index(df_pct)) > > I am told: > > Warning messages: > 1: In plot.window(xlim, ylim, log = log, ...) : > "names" is not a graphical parameter > 2: In axis(if (horiz) 2 else 1, at = at.l, labels = names.arg, lty = > axis.lty, : > "names" is not a graphical parameter > 3: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) : > "names" is not a graphical parameter > 4: In axis(if (horiz) 1 else 2, cex.axis = cex.axis, ...) : > "names" is not a graphical parameter > > which I find a bit perplexing. > > If I plot via: > > barplot(df_pct, border = NA, col = col_vec, space = 0, main = title, > sub = sub, ylab = "CPU %", ylim = c(0, 100), x lab = "Time", > axisnames = FALSE) > > that works OK, but then I'm unable to generate a human-readable set > of X-axis labels: At best, I'm able to get a single label at the origin. 
> > Here's an excerpt from the generated df_d: > Browse[2]> df_d[1:10] > kern.cp_time_idle kern.cp_time_intr kern.cp_time_nice > 1297278017 275 0 0 > 1297278018 231 0 0 > 1297278019 266 1 0 > 1297278020 230 2 0 > 1297278021 191 0 0 > 1297278022 114 0 0 > 1297278023 30 0 0 > 1297278024 0 0 0 > 1297278025 0 0 0 > 1297278026 0 0 0 > kern.cp_time_sys kern.cp_time_user > 1297278017 8 1 > 1297278018 17 35 > 1297278019 10 6 > 1297278020 28 24 > 1297278021 21 72 > 1297278022 27 147 > 1297278023 43 219 > 1297278024 47 249 > 1297278025 41 259 > 1297278026 8 15 > > > Here's a excerpt from the generated df_pct: > > Browse[2]> df_pct[1:10] > kern.cp_time_sys kern.cp_time_intr kern.cp_time_user > 1297278017 2.816901 0.000 0.3521127 > 1297278018 6.007067 0.000 12.3674912 > 1297278019 3.533569 0.3533569 2.1201413 > 1297278020 9.859155 0.7042254 8.4507042 > 1297278021 7.394366 0.000 25.3521127 > 1297278022 9.375000 0.000 51.0416667 > 1297278023 14.726027 0.000 75.000 > 1297278024 15.878378 0.000 84.1216216 > 1297278025 13.67 0.000 86.333 > 1297278026
Re: [R] lmm WITHOUT random factor (lme4)
Dear Mark, You cannot compare lm() with lme() because the likelihoods are not the same. Use gls() instead of lm() library(nlme) data("sleepstudy", package = "lme4") fm <- lm(Reaction ~ Days, sleepstudy) fm0 <- gls(Reaction ~ Days, sleepstudy) logLik(fm) logLik(fm0) fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) anova(fm0, fm1) Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 thierry.onkel...@inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey > -Oorspronkelijk bericht- > Van: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] Namens Mark Difford > Verzonden: donderdag 17 maart 2011 10:55 > Aan: r-help@r-project.org > Onderwerp: Re: [R] lmm WITHOUT random factor (lme4) > > On Mar 17, 2011; 11:43am Baugh wrote: > > >> Question: can I simply substitute a dummy var (e.g. populated by > >> zeros) for "ID" to run the model without the random factor? when I > >> try this R returns values that seem reasonable, but I want > to be sure > >> this is appropriate. > > If you can fit the model using lme (and it looks like you > easily can) then another check would be: > > ## Compare models with and without random effects fm <- > lm(Reaction ~ Days, sleepstudy) > fm1 <- lme(Reaction ~ Days, random= ~1|Subject, sleepstudy) > anova(fm1, fm) ## lme-fitted model must come first > > Regards, Mark. > > -- > View this message in context: > http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-t > p3384054p3384072.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
On 2011-03-17 02:08, derek wrote: Exuse me, I don't claim R^2 can't be negative. What I say if I get R^2 negative then the data are useless. I know, that what Thomas said is true in general case. But in my special case of data, using nonzero intercept is nonsense, and to get R^2 less than 0.985 is considered poor job (standard R^2>0.995). (R^2 given by R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) ) Because lm() uses two differrent formulas for computing R^2, it is confusing to get R^2 closer to 1 when linear model with zero intercept y=a*x (a = slope) is used, rather than in case with model y=a*x+b (a=slope, b= nonzero intercept). I think R^2 is only measure of good fit for least squares optimization and it doesn't matter which formula is used: (R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2) or R^2 = 1 - Sum(R[i]^2) / Sum((y[i])^2-y*)), but using both is confusing. So I would like to know why two different formulas for R^2 are used? The beauty of R is that you can study the code and if you don't like it, you can just write your own. I doubt that anyone here would be offended if you used whatever definition of R^2 you like best. Thomas has explained clearly what R does and why. If you find R's usage confusing then your understanding of regression is lacking the depth required to apply it. You would probably benefit considerably from consulting a text or two on regression. Peter Ehlers -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3383992.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Segmentation fault when using "plot" function
On Thu, Mar 17, 2011 at 6:58 AM, cchace wrote: > > I frequently get a segmentation fault error when using the "plot" command. > It happens about half the time. > > We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. > > I did a quick search for this problem and didn't find anything. Apologies if > I missed it. Eventually some kind person with lots of extra time on his/her hand will install R v2.8.0 on their machine, try to reproduce the problem, troubleshoot it, fix the bug, and provide you with a customized retroactive patched R v2.8.0 that you could install. While waiting for that to happen, you may want to try out R v2.12.2. /Henrik > > > *** Process received signal *** > Signal: Segmentation fault (11) > Signal code: Address not mapped (1) > Failing at address: 0x231e18f22 > [ 0] /lib64/libc.so.6 [0x3c84a300a0] > [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] > [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] > [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] > [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] > [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] > [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] > [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] > [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] > [ 9] /usr/lib64/R/modules//R_X11.so(X11DeviceDriver+0x19a) [0x2aaab8a336aa] > [10] /usr/lib64/R/modules//R_X11.so [0x2aaab8a33c30] > [11] /usr/lib64/R/lib/libR.so [0x3659cff75e] > [12] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [13] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [14] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [15] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [16] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] > [17] /usr/lib64/R/lib/libR.so(GEcurrentDevice+0x158) [0x3659c95a88] > [18] /usr/lib64/R/lib/libR.so [0x3659d25c99] > [19] /usr/lib64/R/lib/libR.so [0x3659cff75e] > [20] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [21] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [22] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [23] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [24] /usr/lib64/R/lib/libR.so(Rf_eval+0x2fb) [0x3659ccc1ab] > [25] /usr/lib64/R/lib/libR.so [0x3659ccd6a2] > [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x427) [0x3659ccc2d7] > [27] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x291) [0x3659ccfa41] > [28] /usr/lib64/R/lib/libR.so(Rf_usemethod+0x3fa) [0x3659d019da] > [29] /usr/lib64/R/lib/libR.so [0x3659d0203d] > *** End of error message *** > > Process R segmentation fault at Thu Mar 17 09:33:59 2011 > > -- > View this message in context: > http://r.789695.n4.nabble.com/Segmentation-fault-when-using-plot-function-tp3384603p3384603.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] possible problem with "endpoints"?
Hi Erin, On Thu, Mar 17, 2011 at 1:27 AM, Erin Hodgess wrote: > Dear R People: > > Hello again! > > I found something unusual in the behavior of the "endpoints" function > from the xts package: > >> x1 <- ts(1:24,start=2008,freq=12) >> dat <- seq(as.Date("2008/01/01"),length=24,by="months") >> library(zoo);library(xts) >> x2 <- zoo(1:24,order=dat) >> #Here is the surprise: >> endpoints(as.xts(x1),'quarters') > [1] 0 1 4 7 10 13 16 19 22 24 >> endpoints(as.xts(x2),'quarters') > [1] 0 3 6 9 12 15 18 21 24 > > > Shouldn't the endpoints be the same, please? What am I doing wrong, please? > Looks like a timezone issue. I'll investigate further. > endpoints(as.xts(x1,tz="GMT"),'quarters') [1] 0 3 6 9 12 15 18 21 24 > Thanks, > Erin > > > -- > Erin Hodgess > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: erinm.hodg...@gmail.com > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Ulrich | FOSS Trading: www.fosstrading.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
see inline. On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa wrote: > Hi Alexx, > > I don't see any problem in comparing models based on different distributions > for the same data using the AIC, as long as they have a different number of > parameters and all the constants are included. > For example, you can compare distribution mixture models with different > number of components using the AIC. > This is one example: > Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length > Frequency Data. Journal of Biological, Agricultural and Environmental > Statistics 15:416-429. > Here is another example: > www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf > Prof. Dayton writes above that one advantage of AIC over hypothesis testing > is: > "(d) Considerations related to underlying distributions for random > variables can be > incorporated into the decision-making process rather than being treated > as an assumption whose > robustness must be considered (e.g., models based on normal densities > and on log-normal > densities can be compared)." My reading of this is that AIC can be used to compare models with densities relative to the same dominating measure. Kjetil > Last, if you read Akaike's theorem you will see there is nothing precluding > comparing models built on different distributional models. Here it is: > " the expected (over the sample space and the space of parameter estimates) > maximum log-likelihood of some data on a working model overshoots the > expected (over the sample space only) maximum log-likelihood of the data > under the true model that > generated the data by exactly the number of parameters in the working model." > A remarkable result. > > Rubén > > -Original Message- > From: r-help-boun...@r-project.org on behalf of Alexx Hardt > Sent: Wed 3/16/2011 7:42 PM > To: r-help@r-project.org > Subject: Re: [R] R² for non-linear model > > Am 16.03.2011 19:34, schrieb Anna Gretschel: >> Am 16.03.2011 19:21, schrieb Alexx Hardt: >>> And to be on-topic: Anna, as far as I know anova's are only useful to >>> compare a submodel (e.g. with one less regressor) to another model. >>> >> thanks! i don't get it either what they mean by fortune... > > It's an R-package (and a pdf [1]) with collected quotes from the mailing > list. > Be careful with the suggestion to use AIC. If you wanted to compare two > models using AICs, you need the same distribution (that is, > Verteilungsannahme) in both models. > To my knowledge, there is no way to "compare" a gaussian model to an > exponential one (except common sense), but my knowledge is very limited. > > [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf > > -- > alexx@alexx-fett:~$ vi .emacs > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
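To make the "same data, different distributional families" point concrete, a small illustrative sketch (simulated data; plug-in parameter estimates are used purely for illustration):

set.seed(1)
x <- rlnorm(100, meanlog = 1, sdlog = 0.5)   # positive data
ll.norm  <- sum(dnorm(x, mean = mean(x), sd = sd(x), log = TRUE))
ll.lnorm <- sum(dlnorm(x, meanlog = mean(log(x)), sdlog = sd(log(x)), log = TRUE))
-2 * c(normal = ll.norm, lognormal = ll.lnorm) + 2 * 2   # AIC with two parameters each
# both densities are with respect to the same (Lebesgue) measure, so the AICs are comparable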
Re: [R] Incorrect degrees of freedom in SEM model using lavaan
Your model is saturated. I think lavaan calculates the number of degrees of freedom this way: DF = n*(n + 1)/2 - t - n.fix*(n.fix + 1)/2 n = number of variables t = number of free parameters n.fix = number of fixed exogenous variables So, if you fix the exogenous variables, as in mimic = "Mplus": DF = 28 - 13 - 15 = 0 If you don't, as in mimic = "EQS": DF = 28 - 28 - 0 = 0 You are probably not considering the variances and covariances of the exogenous variables to arrive at 15 degrees of freedom: model = ' Internet + Stress3 ~ HHincome + Race + Age + Gender + Stress1 Stress3 ~ Internet HHincome ~~ 0*Race + 0*Age + 0*Gender + 0*Stress1 Race ~~ 0*Age + 0*Gender + 0*Stress1 Age ~~ 0*Gender + 0*Stress1 Gender ~~ 0*Stress1 HHincome ~~ 1*HHincome Race ~~ 1*Race Age ~~ 1*Age Gender ~~ 1*Gender Stress1 ~~ 1*Stress1 ' fit = lavaan(model, sample.nobs=161, sample.cov=myCov, int.ov.free = T, int.lv.free = F, auto.fix.first = T, auto.fix.single = T, auto.resid.var = T, auto.cov.lv.x = T, auto.cov.y = T, mimic = "EQS") summary(fit) Cheers, Gustavo. On Thu, Mar 17, 2011 at 9:50 AM, Andrew Miles wrote: > I have been trying to use lavaan (version 0.4-7) for a simple path model, > but the program seems to be computing far less degrees of freedom for my > model then it should have. I have 7 variables, which should give (7)(8)/2 = > 28 covariances, and hence 28 DF. The model seems to only think I have 13 > DF. The code to reproduce the problem is below. Have I done something > wrong, or is this something I should take to the developer? > > > library(lavaan) > > myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - > .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, > .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 > , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"))) > > > model = ' > > Internet ~ HHincome + Race + Age + Gender + Stress1 > > Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 > > ' > > > fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) > > > #check the number of parameters being estimated > > inspect(fit, what="free") > > > #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 > > summary(fit) > > > Andrew Miles > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
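Plugging the numbers from this model into that formula (7 observed variables, 13 free parameters, and the 5 fixed exogenous predictors) reproduces the arithmetic:

n <- 7; t <- 13; n.fix <- 5
n * (n + 1) / 2 - t - n.fix * (n.fix + 1) / 2   # 28 - 13 - 15 = 0, i.e. a saturated model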
Re: [R] Incorrect degrees of freedom in SEM model using lavaan
Dear Andrew, The reported df in lavaan is 0 which is correct. It is because this path model is saturated. "28" is not the df, it is the no. of pieces of information. The no. of parameter estimates is also 28. Thus, the df is 0. However, you are correct that there are only 13, not 28, free parameters reported by lavaan. It is because "fixed.x=TRUE" was implicitly used in your example (see ?sem). Since HHincome + Race + Age + Gender + Stress1 are predictors, they are fixed at their sample values when "fixed.x=TRUE" was used. And their covariance elements are not involved in calculating the df. If you want to mirror the behaviors in other SEM packages, you may try: fit2 <- sem(model, sample.nobs=161, sample.cov=myCov, fixed.x=FALSE) inspect(fit2, what="free") summary(fit2) Hope it helps. Mike -- - Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - On Thu, Mar 17, 2011 at 8:50 PM, Andrew Miles wrote: > I have been trying to use lavaan (version 0.4-7) for a simple path model, > but the program seems to be computing far less degrees of freedom for my > model then it should have. I have 7 variables, which should give (7)(8)/2 = > 28 covariances, and hence 28 DF. The model seems to only think I have 13 > DF. The code to reproduce the problem is below. Have I done something > wrong, or is this something I should take to the developer? > > > library(lavaan) > > myCov = matrix(c(24.40, 0, 0, 0, 0, 0, 0, .03, .03, 0, 0, 0, 0, 0, 6.75, - > .08, 519.38, 0, 0, 0, 0, .36, .01, 2.74, .18, 0, 0, 0, .51, .0, -.31, .02, > .2, .0, 0, -.17, .0, -1.6, -.04, .01, .25, 0, -.1, .02, -.03, .0, -.01, .01 > , .02), nrow=7, byrow=TRUE, dimnames=list(c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"), c("Internet", "Stress3", > "HHincome", "Race", "Age", "Gender", "Stress1"))) > > > model = ' > > Internet ~ HHincome + Race + Age + Gender + Stress1 > > Stress3 ~ Internet + HHincome + Race + Age + Gender + Stress1 > > ' > > > fit=sem(model, sample.nobs=161, sample.cov=myCov, mimic.Mplus=FALSE) > > > #check the number of parameters being estimated > > inspect(fit, what="free") > > > #Note the DF for the Chi-square is 0, when it should be 28-13 = 15 > > summary(fit) > > > Andrew Miles > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] setting up a R working group
1. There is a CRAN mirror at Bristol (roughly 100km from Swansea?). They must have (or have had) some very active R users there. 2. Sundar Dorai-Raj and I developed a local R Archive Network and subversion repository internal to a company where we used to work. I believe that made it easier for us to collaborate and to share our results with others. (See "Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories", available from CRAN -> Documentation: Contributed.) This is not the first thing you should do, but I believe it can help build local collaboration and awareness of R. Hope this helps. Spencer On 3/17/2011 6:42 AM, Joanne Demmler wrote: Dear R users, we are currently trying to set up a R working group at Swansea University. I would be very grateful to get some information and feedback from R users that have done something similar, in particular regarding: - help with giving a general overview of what R is capable of - help with what to present or teach (did you organise your own training programme or did you get someone from outside?) - are there people out there who would like to present their work to us? Any help will be highly appreciated! Thanks Joanne __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Need to abstract changing name of column within loop
On Wed, Mar 16, 2011 at 6:58 PM, jctoll wrote: > Hi, > > I'm struggling to figure out the way to change the name of a column > from within a loop. The problem is I can't refer to the object by its > actual variable name, since that will change each time through the > loop. My xts object is A. > >> head(A) > A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1 > 2007-01-03 34.99 35.48 34.05 34.30 2574600 34.30 1186780 > 2007-01-04 34.30 34.60 33.46 34.41 2073700 34.41 1190586 > 2007-01-05 34.30 34.40 34.00 34.09 2676600 34.09 1179514 > 2007-01-08 33.98 34.08 33.68 33.97 1557200 33.97 1175362 > 2007-01-09 34.08 34.32 33.63 34.01 1386200 34.01 1176746 > 2007-01-10 34.04 34.04 33.37 33.70 2157400 33.70 1166020 > > It's column names are: >> colnames(A) > [1] "A.Open" "A.High" "A.Low" "A.Close" > "A.Volume" "A.Adjusted" "A.Adjusted.1" > > I want to change the 7th column name: >> colnames(A)[7] > [1] "A.Adjusted.1" > > I need to do that through a reference to i: >> i > [1] "A" > > This works: >> colnames(get(i))[7] > [1] "A.Adjusted.1" > > And this is what I want to change the column name to: >> paste(i, ".MarketCap", sep = "") > [1] "A.MarketCap" > > But how do I make the assignment? This clearly doesn't work: > >> colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") > Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") : > could not find function "get<-" > > Nor does this (it creates a new object "A.Adjusted.1" with a value of > "A.MarketCap") : > > assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = "")) > > How can I change the name of that column within my big loop? Any > ideas? Thanks! > I usually make a copy of the object, change it, then overwrite the original: tmp <- get(i) colnames(tmp)[7] <- "foo" assign(i,tmp) Hope that helps. > Best regards, > > > James > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Ulrich | FOSS Trading: www.fosstrading.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
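A related sketch, in case the loop is walking over several such symbols: keeping the objects in a named list avoids get()/assign() altogether (the symbol vector here is hypothetical):

symbols <- c("A", "B", "C")                    # hypothetical set of tickers
objs <- Map(function(x, s) {
  colnames(x)[7] <- paste(s, ".MarketCap", sep = "")
  x
}, mget(symbols, envir = .GlobalEnv), symbols)
# afterwards work with objs[["A"]], objs[["B"]], ... rather than free-standing objects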
Re: [R] Strange R squared, possible error
On Wed, Mar 16, 2011 at 3:49 PM, derek wrote: > k=lm(y~x) > summary(k) > returns R^2=0.9994 > > lm(y~x) is supposed to find coef. a anb b in y=a*x+b > > l=lm(y~x+0) > summary(l) > returns R^2=0.9998 > lm(y~x+0) is supposed to find coef. a in y=a*x+b while setting b=0 > > The question is why do I get better R^2, when it should be otherwise? > > Im sorry to use the word "MS exel" here, but I verified it in exel and it > gives: > R^2=0.9994 when y=a*x+b is used > R^2=0.99938 when y=a*x+0 is used > The idea is that if you have a positive quantity that can be broken down into two nonnegative quantities: X = X1 + X2 then it makes sense to ask what proportion X1 is of X. For example: 10 = 6 + 4 and 6 is .6 of the total. Now, in the case of a model with an intercept its a mathematical fact that the variance of the response equals the variance of the fitted model plus the variance of the residuals. Thus it makes sense to ask what fraction of the variance of the response is represented by the variance of the fitted model (this fraction is R^2). But if there is no intercept then that mathematical fact breaks down. That is, its no longer true that the variance of the response equals the variance of the fitted model plus the variance of the residuals. Thus how meaningful is it to ask what proportion the variance of the fitted model is of the variance of the response in the first place? In this case we need to rethink the entire approach which is why a different formula is required. Also, maybe the real problem is not this at all. That is perhaps you are not really trying to find the goodness of fit but rather you are trying to compare two particular models: one with intercept and one without. In that case R^2 is not really what you want. Instead use the R anova command. For example, using the built in BOD data frame: > fm <- lm(demand ~ Time, BOD) > fm0 <- lm(demand ~ Time - 1, BOD) > anova(fm, fm0) Analysis of Variance Table Model 1: demand ~ Time Model 2: demand ~ Time - 1 Res.Df RSS Df Sum of Sq F Pr(>F) 1 4 38.069 2 5 135.820 -1 -97.751 10.271 0.03275 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Here we see that the residual sum of squares is much less for the full model than for the reduced model and its significant at the 3.275% level. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] calculating AUCs for each of the 1000 boot strap samples
Taby, First, it is better to reply to the whole list (which I have included on this reply); there is a better chance of someone helping you. Just because I could help with one aspect does not mean I necessarily can (or have the time to) help with more. Further comments are inline below. On 3/16/2011 10:45 PM, taby gathoni wrote: Hi Brian, Thanks for this comment I will action on this. Thanks also for the comment, Andrija also advised the same thing and it worked like magic. My next cause of action was to get the confidence intervals with the AUC values. For the confidence intervals i did them manually. for 99% i cut out first 5 and last 5 after ranking the ACs while for 95% CI i cut out first 25 and last 25 In general, this is right. The middle 95% excludes the 2.5% on the ends, so for 1000 samples that is excluding the 25 most extreme values. and this is my output Upper bound Lower Bound at 99% CI 0.8175 0.50125 at 95% CI 0.7775 0.50375 from my understanding because there are small samples of 20 GOOD and 20 BAD the variations in the upper and lower bound should be minimal in the 1000 samples. I don't know why you would necessarily expect the variance to be minimal. It is what it is. Also, I don't know why you took 20 of each rather than just a random sub-sample. If you get time, Would you be in a position to assist me find out why i have such huge variations? Thank you for taking time to respond. Maybe pull out 10 of your bootstrap samples and look at the ROC curves themselves and their associated AUC. That might give you a sense as to the variability that is possible (which is reflected in the confidence interval). As a final note, you are reinventing the wheel. There are several packages that deal with ROC curves. Two I like in particular are ROCR and pROC. The latter even has built in routines for computing confidence intervals for the AUC using bootstrap replication. Kind regards, Taby --- On Wed, 3/16/11, Brian Diggs wrote: From: Brian Diggs Subject: Re: calculating AUCs for each of the 1000 boot strap samples To: tab...@yahoo.com Cc: "R help" Date: Wednesday, March 16, 2011, 10:42 PM On 3/16/2011 8:04 AM, taby gathoni wrote: data<-data.frame(id=1:(165+42),main_samp$SCORE, x=rep(c("BAD","GOOD"),c(42,165))) f<-function(x) { + str.sample<-list() + for (i in 1:length(levels(x$x))) + { + str.sample[[i]]<-x[x$x==levels(x$x)[i] ,][sample(tapply(x$x,x$x,length)[i],20,rep=T),] + } + strat.sample<-do.call("rbind",str.sample) + return(strat.sample$main_samp.SCORE) + } f(data) [1] 706 633 443 843 756 743 730 843 706 730 606 743 768 768 743 763 608 730 743 743 530 813 813 831 793 900 793 693 900 738 706 831 [33] 818 758 718 831 768 638 770 738 repl<-list() auc<-list() for(i in 1:1000) + { + repl[[i]]<-f(data) + auc[[i]]<-colAUC(repl[[i]],rep(c("BAD","GOOD")),plotROC=FALSE,alg="ROC") + } Error in colAUC(repl[[i]], rep(c("BAD", "GOOD")), plotROC = FALSE, alg = "ROC") : colAUC: length(y) and nrow(X) must be the same Thanks alotTaby I think (though I can't check because the example is not reproducible without main_samp$SCORE), that the problem is that the second argument to colAUC should be rep(c("BAD", "GOOD"), c(20,20)) The error is that repl[[i]] is length 40 while rep(c("BAD", "GOOD")) is length 2. P.S. When giving an example, it is better to not include the prompts and continuation prompts. Copy it from the script rather than the output. Relevant output can then be included as script comments (prefixed with #). That makes cutting-and-pasting to test easier. -- Brian S. 
Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
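For reference, the pROC route mentioned above looks roughly like this (a sketch on simulated stand-ins for the GOOD/BAD labels and scores; see ?roc and ?ci.auc for the exact arguments):

library(pROC)
set.seed(1)
outcome <- rep(c("BAD", "GOOD"), each = 20)            # stand-in for the GOOD/BAD flag
score   <- c(rnorm(20, 600, 80), rnorm(20, 750, 80))   # stand-in for SCORE
r <- roc(outcome, score)
auc(r)                                                 # point estimate of the AUC
ci.auc(r, method = "bootstrap", boot.n = 1000)         # bootstrap 95% confidence interval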
Re: [R] Regex query (Apache logs)
Hello Allan, Thanks the response. Provides me hope. I appreciate [3], might even go with that. And for posterity, here's the code (assuming pastebin never expires) [1] Test string : http://pastebin.com/FyAFzmTv [2] Pattern (modified as per your suggestion) : http://pastebin.com/s7VT0r5K pattern <- readLines(url("http://pastebin.com/raw.php?i=s7VT0r5K";), warn=FALSE) test <- readLines(url("http://pastebin.com/raw.php?i=rbAvR2dK";),warn=FALSE) regexpr(pattern, test, perl=TRUE) Thanks Saptarshi On Thu, Mar 17, 2011 at 12:12 AM, Allan Engelhardt wrote: > Some comments: > > 1. [^\s] matches everything up to a literal 's', unless perl=TRUE. > 2. The (.*) is greedy, so you'll need (.*?)"\s"(.*?)"\s"(.*?)"$ or similar > at the end of the expression > > With those changes (and removing a space inserted by the newsgroup posting) > the expression works for me. > > > (pat <- readLines("/tmp/b.txt")[1]) > [1] > "^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s([^\\s]*)\\s([^\\s]*)\\s\\[([^\\]]+)\\]\\s\"([A-Z]*)\\s([^\\s]*)\\s([^\\s]*)\"\\s([^\\s]+)\\s(\\d+)\\s\"(.*?)\"\\s\"(.*?)\"\\s\"(.*?)\"$" > > regexpr(pat, test, perl=TRUE) > [1] 1 > attr(,"match.length") > [1] 436 > > 3. Consider a different approach, e.g. scan(textConnection(test), > what=character(0)) > > Hope this helps > > Allan > > > > On 16/03/11 22:18, Saptarshi Guha wrote: > >> Hello R users, >> >> I have this regex see [1] for apache log lines. I tried using R to parse >> some data (only because I wanted to stay in R). >> A sample line is [2] >> >> (a) I saved the line in [1] into "~/tmp/a.txt" and [2] into "/tmp/a.txt" >> >> pat<- readLines("~/tmp/a.txt") >> test<- readLines("/tmp/a.txt") >> test >> grep(pat,test) >> >> returns integer(0) >> >> The same query works in python via re.match() (i.e does return groups) >> >> Using readLines, the regex is escaped for me. Does Python and R use >> different regex styles? >> >> Cheers >> Saptarshi >> >> [1] >> >> >> ^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s([^\s]*)\s([^\s]*)\s\[([^\]]+)\]\s"([A-Z]*)\s([^\s]*)\s([^\s]*)"\s([^\s]+)\s(\d+)\s"(.*)"\s"(.*)"\s"(.*)"$ >> >> [2] >> 220.213.119.925 addons.mozilla.org - [10/Jan/2001:01:55:07 -0800] "GET >> >> /blocklist/3/%8ce33983c0-fd0e-11dc-12aa-0800200c9a66%7D/4.0b5/Fennec/20110217140304/Android_arm-eabi-gcc3/chrome:%2F%2Fglobal%2Flocale%2Fintl.properties/beta/Linux% >> 202.6.32.9/default/default/6/6/1/ HTTP/1.1" 200 3243 "-" "Mozilla/5.0 >> (Android; Linux armv7l; rv:2.0b12pre) Gecko/20110217 Firefox/4.0b12pre >> Fennec/4.0b5" "BLOCKLIST_v3=110.163.217.169.1299218425.9706" >> >>[[alternative HTML version deleted]] >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Spatial cluster analysis of continous outcome variable
Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change and I would like to test if there are areas in this 2D-space where there is a different amount of change (no change, increase, decrease). I don't know if it's better to analyse the data just with the coordinates or if it's better to group them in "pixels" (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations. I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don't have many points). Thanks in advance J Toledo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Segmentation fault when using "plot" function
On Thu, 17 Mar 2011, Henrik Bengtsson wrote: On Thu, Mar 17, 2011 at 6:58 AM, cchace wrote: I frequently get a segmentation fault error when using the "plot" command. It happens about half the time. We are running an old version of R (R version 2.8.0 (2008-10-20) on Linux. I did a quick search for this problem and didn't find anything. Apologies if I missed it. Eventually some kind person with lots of extra time on his/her hand will install R v2.8.0 on their machine, try to reproduce the problem, troubleshoot it, fix the bug, and provide you with a customized retroactive patched R v2.8.0 that you could install. While waiting for that to happen, you may want to try out R v2.12.2. If the bug is in R: it looks much more likely to be in the locale handling of the X11 functions of the unstated (but probably old and unpatched) version of 'Linux'. *** Process received signal *** Signal: Segmentation fault (11) Signal code: Address not mapped (1) Failing at address: 0x231e18f22 [ 0] /lib64/libc.so.6 [0x3c84a300a0] [ 1] /lib64/libc.so.6(strlen+0x10) [0x3c84a76090] [ 2] /usr/lib64/libX11.so.6 [0x3c8765d101] [ 3] /usr/lib64/libX11.so.6(_XlcCreateLC+0x46) [0x3c8765e5d6] [ 4] /usr/lib64/libX11.so.6(_XlcUtf8Loader+0x10) [0x3c8767ebe0] [ 5] /usr/lib64/libX11.so.6(_XOpenLC+0x104) [0x3c87665674] [ 6] /usr/lib64/libX11.so.6(_XlcCurrentLC+0x8) [0x3c87665768] [ 7] /usr/lib64/libX11.so.6(XSupportsLocale+0x9) [0x3c87665b59] [ 8] /usr/lib64/R/modules//R_X11.so(X11_Open+0x51) [0x2aaab8a32581] That is all that was useful -- Brian D. Ripley, rip...@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] cv.lm syntax error
I rewrote my previous command, and CVlm works now. It was related to having the same names in the df as in the formula included at CVlm. Now I'd like to know: Q1: if the seed is of any special importance, or I can type just seed=29 like in the example? Q2: I typed: CVlm(df = mydf, form.lm = formula(response~v1+v2+factor), m=3, dots = TRUE, seed=29, plotit=TRUE, printit=TRUE) In case it's important, factor has three levels, this way: table(factor): levels: 2 3 4 freq: 2 21 34 And at the 3rd fold it says: fold 3 Observations in test set: 2 5 6 8 11 13 15 16 17 22 29 33 35 37 41 46 51 52 55 Error en model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor 'factor' has new level(s) 2 What does it mean? Thanks in advance, u...@host.com -- View this message in context: http://r.789695.n4.nabble.com/cv-lm-syntax-error-tp3334889p3385065.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Thank you for your very comprehensible answer. I know a priori that the model y=a*x+0 is right and that I can't get x=constant nor y=constant. I'm comparing the performance of data gathering in my data set to other data sets, in which gathering performance is characterized by R-squared. The data in the data sets are not random but have random error with normal distribution. The error in my data is composed of human and equipment error (normal distribution), appliance error (probably skewed normal distribution) and some other interferences. So I compare model y = a*x + b and y = a*x + 0 to know the magnitude of the interferences which I can't influence - the difference of the R^2's. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385009.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
Yes they are. I had edited the reply, but it didn't help. Correction: 2) I meant zero slope, not zero intercept. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384648.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote: >> You cannot compare lm() with lme() because the likelihoods are not the >> same. Use gls() instead of lm() Hi Thierry, Of course, I stand subject to correction, but unless something dramatic has changed, you can. gls() can be used if you need to accommodate a correlation structure. The method I have outlined, i.e. anova(lme$obj, lm$obj), is detailed in Pinheiro & Bates (2000) beginning page 154. Please refer to this if you doubt me. Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384802.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
That's exactly what I would like to do. Any idea of a good text? I've consulted several texts, but none defined R^2 as R^2 = 1 - Sum(R[i]^2) / Sum((y[i] - y*)^2), still less why different formulas are used for similar models, or why R^2 should be closer to 1 for y=a*x+0 than for the general model y=a*x+b. from manual: r.squared R^2, the ‘fraction of variance explained by the model’, R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), where y* is the mean of y[i] "if there is an intercept" and zero otherwise. I don't need an explanation of what R^2 does nor how to interpret it, because I know what it means and how it is derived. I don't need to be told which model I should apply. So the answers from Thomas weren't helpful. I don't claim it is wrong, otherwise it wouldn't be employed, but I want to see the reason behind using two formulas. Control questions: 1) Does the statement "if there is an intercept" mean an intercept including a zero intercept? 2) If I use model y = a*x+0, which formula for R^2 is used: the one with y* or the one without? -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3384844.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Beginner question: How to replace part of a filename in read.csv?
I would like to use samp as a part of a filename that I can change. My source files are .csv files with date as the file name, and I would like to be able to type in the date (later perhaps automate this using list.files) and then read the csv and write the pdf automatically. I have tried different combinations with "" and () around samp, but I keep getting the error "object 'samp.csv' not found". samp <- "20110317" read.csv(file=samp.csv,...) #next R processes some code that works fine, and then should save the figure: pdf(file=samp.pdf,...) dev.off() How can I get R to replace samp as 20110317 in front of .csv and .pdf? Best, Sjouke -- View this message in context: http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmm WITHOUT random factor (lme4)
On Mar 17, 2011; 04:29pm Thierry Onkelinx wrote: >> You cannot compare lm() with lme() because the likelihoods are not the >> same. Use gls() instead of lm() And perhaps I should have added the following: First para on page 155 of Pinheiro & Bates (2000) states, "The anova method can be used to compare lme and lm objects." Regards, Mark. -- View this message in context: http://r.789695.n4.nabble.com/lmm-WITHOUT-random-factor-lme4-tp3384054p3384823.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
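For completeness, a sketch of the comparison Pinheiro & Bates describe, using the Orthodont data shipped with nlme (both models fitted by maximum likelihood so the log-likelihoods are on the same scale; illustrative only, not a template for every design):

library(nlme)
fm.lme <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont, method = "ML")
fm.lm  <- lm(distance ~ age, data = Orthodont)
anova(fm.lme, fm.lm)   # likelihood ratio test of the random intercept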
[R] Gelman-Rubin convergence diagnostics via coda package
Dear, I'm trying to run diagnostics on MCMC analysis (fitting a log-linear model to rates data). I'm getting an error message when trying Gelman-Rubin shrink factor plot: >gelman.plot(out) Error in chol.default(W) : the leading minor of order 2 is not positive definite I take it that somewhere, somehow a matrix is singular, but how can that be remedied? My code: library(rjags) #JAGS MODEL3 CODE-# data { for (i in 1:N) { y[i] <- z2[i] - z1[i] } } model { for (i in 1 : N) { mean[i] <- (beta[i] * (n[i] - z1[i]) * (n[i] - z2[i]) ) / n[i] y[i] ~ dpois( mean[i] ) log(beta[i]) <- a1 * type[i] + a0 } a1 ~ dnorm(0, 1.0E-4) a0 ~ dnorm(0, 1.0E-4) } #--# N = 12 n = c(63, 33, 18, 16, 36, 3, 33, 101, 64, 30, 10, 36) z1 = c(33, 8, 14, 4, 2, 8, 24, 76, 35, 14, 6, 22) z2 = c(37, 8, 16, 6, 3, 10, 28, 91, 56, 17, 8, 28) type = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1) init0 = list(a0 = 0.0, a1 = 0.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper") init1 = list(a0 = 1.0, a1 = 1.0, .RNG.seed = 1234, .RNG.name = "base::Super-Duper") jags <- jags.model('model3.bug', data = list('N' = N, 'n' = n, 'z1' = z1, 'z2' = z2, 'type' = type), inits = list(init0, init1), n.chains = 2, n.adapt = 100) update(jags, 1000) out <- coda.samples(model = jags, variable.names = c('beta'), n.iter = 25000, thin = 5) gelman.plot(out) -- while(!succeed) { try(); } __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
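The chol(W) call sits inside the multivariate shrink factor computation, so if your version of coda has the multivariate argument, the univariate diagnostics can usually still be obtained as below (a hedged workaround, not necessarily a fix for gelman.plot itself):

library(coda)
# univariate potential scale reduction factors only, skipping the
# multivariate statistic whose within-chain covariance triggers chol()
gelman.diag(out, multivariate = FALSE)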
[R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...
Hi, I have the following type of data: 86 subjects in three independent groups (high power vs low power vs control). Each subject solves 8 reasoning problems of two kinds: conflict problems and noconflict problems. I measure accuracy in solving the reasoning problems. To summarize: binary response, 1 within subject var (TYPE), 1 between subject var (POWER). I wanted to fit the following model: for problem i, person j: logodds ( Y_ij ) = b_0j + b_1j TYPE_ij with b_0j = b_00 + b_01 POWER_j + u_0j and b_1j = b_10 + b_11 POWER_j I think it makes sense, but I'm not sure. Here are the observed cell means: conflict noconflict control 0.68965520.9568966 high 0.69354840.9677419 low 0.88461540.9903846 GLMER gives me: summary(glmer(accuracy~f_power*f_type + (1|subject), family=binomial,data=syllogisms)) Generalized linear mixed model fit by the Laplace approximation Formula: accuracy ~ f_power * f_type + (1 | subject) Data: syllogisms AIC BIC logLik deviance 406 437.7 -196 392 Random effects: Groups NameVariance Std.Dev. subject (Intercept) 4.9968 2.2353 Number of obs: 688, groups: subject, 86 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.507450.50507 2.985 0.00284 ** f_powerhp0.130830.70719 0.185 0.85323 f_powerlow 2.041210.85308 2.393 0.01672 * f_typenoconflict 3.287150.64673 5.083 3.72e-07 *** f_powerhp:f_typenoconflict 0.216800.93165 0.233 0.81599 f_powerlow:f_typenoconflict -0.011991.45807 -0.008 0.99344 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Correlation of Fixed Effects: (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ f_powerhp -0.714 f_powerlow -0.592 0.423 f_typncnflc -0.185 0.132 0.109 f_pwrhp:f_t 0.128 -0.170 -0.076 -0.694 f_pwrlw:f_t 0.082 -0.059 -0.144 -0.444 0.308 glmmPQL gives me: summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, family=binomial, data=syllogisms)) iteration 1 iteration 2 iteration 3 iteration 4 iteration 5 iteration 6 Linear mixed-effects model fit by maximum likelihood Data: syllogisms AIC BIC logLik NA NA NA Random effects: Formula: ~1 | subject (Intercept) Residual StdDev:1.817202 0.8045027 Variance function: Structure: fixed weights Formula: ~invwt Fixed effects: accuracy ~ f_power * f_type Value Std.Error DF t-value p-value (Intercept) 1.1403334 0.4064642 599 2.805496 0.0052 f_powerhp 0.0996481 0.5683296 83 0.175335 0.8612 f_powerlow 1.5358270 0.6486150 83 2.367856 0.0202 f_typenoconflict3.0096016 0.4769761 599 6.309754 0. f_powerhp:f_typenoconflict 0.1856061 0.6790046 599 0.273350 0.7847 f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830 0.9253 Correlation: (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ f_powerhp -0.715 f_powerlow -0.627 0.448 f_typenoconflict-0.194 0.138 0.121 f_powerhp:f_typenoconflict 0.136 -0.182 -0.085 -0.702 f_powerlow:f_typenoconflict 0.089 -0.064 -0.153 -0.462 0.325 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -12.43735991 0.06243699 0.22966010 0.33106978 2.23942234 Number of Observations: 688 Number of Groups: 86 Strange thing is that when you convert the estimates to probabilities, they are quite far off. For control, no conflict (intercept), the estimation from glmer is 1.5 -> 81% and for glmmPQL is 1.14 -> 75%, whereas the observed is: 68%. Am I doing something wrong? Any help is very much appreciated. Sam. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner question: How to replace part of a filename in read.csv?
You need to use paste(), as in paste(samp, ".csv", sep="") Use that as your filename. Sarah On Thu, Mar 17, 2011 at 11:05 AM, pierz wrote: > I would like to use samp as a part of a filename that I can change. My source > files are .csv files with date as the file name, and I would like to be able > to type in the date (later perhaps automate this using list.files) and then > read the csv and write the pdf automatically. I have tried different > combinations with "" and () around samp, but I keep getting the error > "object 'samp.csv' not found". > > samp <- "20110317" > read.csv(file=samp.csv,...) > #next R processes some code that works fine, and then should save the > figure: > pdf(file=samp.pdf,...) > dev.off() > > How can I get R to replace samp as 20110317 in front of .csv and .pdf? > > Best, > Sjouke > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
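Plugging that back into the original snippet gives roughly the following (a sketch; it assumes 20110317.csv exists in the working directory and uses a plain plot() call as a placeholder for the real figure code):

samp <- "20110317"
dat <- read.csv(file = paste(samp, ".csv", sep = ""))   # reads 20110317.csv
pdf(file = paste(samp, ".pdf", sep = ""))               # writes 20110317.pdf
plot(dat)                                               # placeholder for the actual plotting code
dev.off()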
Re: [R] generalized mixed linear models, glmmPQL and GLMER give very different results that both do not fit the data well...
I suggest that you post this on the R-sig-mixed-models list where you are more likely to find those with bothe interest and expertise in these matters. -- Bert On Thu, Mar 17, 2011 at 7:44 AM, Franssens, Samuel wrote: > Hi, > > I have the following type of data: 86 subjects in three independent groups > (high power vs low power vs control). Each subject solves 8 reasoning > problems of two kinds: conflict problems and noconflict problems. I measure > accuracy in solving the reasoning problems. To summarize: binary response, 1 > within subject var (TYPE), 1 between subject var (POWER). > > I wanted to fit the following model: for problem i, person j: > logodds ( Y_ij ) = b_0j + b_1j TYPE_ij > with b_0j = b_00 + b_01 POWER_j + u_0j > and b_1j = b_10 + b_11 POWER_j > > I think it makes sense, but I'm not sure. > Here are the observed cell means: > conflict noconflict > control 0.6896552 0.9568966 > high 0.6935484 0.9677419 > low 0.8846154 0.9903846 > > GLMER gives me: > summary(glmer(accuracy~f_power*f_type + (1|subject), > family=binomial,data=syllogisms)) > Generalized linear mixed model fit by the Laplace approximation > Formula: accuracy ~ f_power * f_type + (1 | subject) > Data: syllogisms > AIC BIC logLik deviance > 406 437.7 -196 392 > Random effects: > Groups Name Variance Std.Dev. > subject (Intercept) 4.9968 2.2353 > Number of obs: 688, groups: subject, 86 > > Fixed effects: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 1.50745 0.50507 2.985 0.00284 ** > f_powerhp 0.13083 0.70719 0.185 0.85323 > f_powerlow 2.04121 0.85308 2.393 0.01672 * > f_typenoconflict 3.28715 0.64673 5.083 3.72e-07 *** > f_powerhp:f_typenoconflict 0.21680 0.93165 0.233 0.81599 > f_powerlow:f_typenoconflict -0.01199 1.45807 -0.008 0.99344 > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > Correlation of Fixed Effects: > (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ > f_powerhp -0.714 > f_powerlow -0.592 0.423 > f_typncnflc -0.185 0.132 0.109 > f_pwrhp:f_t 0.128 -0.170 -0.076 -0.694 > f_pwrlw:f_t 0.082 -0.059 -0.144 -0.444 0.308 > > glmmPQL gives me: > summary(glmmPQL(fixed=accuracy~f_power*f_type, random=~1|subject, > family=binomial, data=syllogisms)) > iteration 1 > iteration 2 > iteration 3 > iteration 4 > iteration 5 > iteration 6 > Linear mixed-effects model fit by maximum likelihood > Data: syllogisms > AIC BIC logLik > NA NA NA > > Random effects: > Formula: ~1 | subject > (Intercept) Residual > StdDev: 1.817202 0.8045027 > > Variance function: > Structure: fixed weights > Formula: ~invwt > Fixed effects: accuracy ~ f_power * f_type > Value Std.Error DF t-value p-value > (Intercept) 1.1403334 0.4064642 599 2.805496 0.0052 > f_powerhp 0.0996481 0.5683296 83 0.175335 0.8612 > f_powerlow 1.5358270 0.6486150 83 2.367856 0.0202 > f_typenoconflict 3.0096016 0.4769761 599 6.309754 0. > f_powerhp:f_typenoconflict 0.1856061 0.6790046 599 0.273350 0.7847 > f_powerlow:f_typenoconflict 0.0968204 1.0318659 599 0.093830 0.9253 > Correlation: > (Intr) f_pwrh f_pwrl f_typn f_pwrh:_ > f_powerhp -0.715 > f_powerlow -0.627 0.448 > f_typenoconflict -0.194 0.138 0.121 > f_powerhp:f_typenoconflict 0.136 -0.182 -0.085 -0.702 > f_powerlow:f_typenoconflict 0.089 -0.064 -0.153 -0.462 0.325 > > Standardized Within-Group Residuals: > Min Q1 Med Q3 Max > -12.43735991 0.06243699 0.22966010 0.33106978 2.23942234 > > Number of Observations: 688 > Number of Groups: 86 > > > Strange thing is that when you convert the estimates to probabilities, they > are quite far off. 
For control, no conflict (intercept), the estimation from > glmer is 1.5 -> 81% and for glmmPQL is 1.14 -> 75%, whereas the observed is: > 68%. > > Am I doing something wrong? > > Any help is very much appreciated. > Sam. > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Bert Gunter Genentech Nonclinical Biostatistics __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Beginner question: How to replace part of a filename in read.csv?
> paste(samp, ".pdf", sep="") [1] "20110317.pdf" > paste(samp, ".csv", sep="") [1] "20110317.csv" On Thursday, March 17, 2011 at 10:05 AM, pierz wrote: I would like to use samp as a part of a filename that I can change. My source > files are .csv files with date as the file name, and I would like to be able > to type in the date (later perhaps automate this using list.files) and then > read the csv and write the pdf automatically. I have tried different > combinations with "" and () around samp, but I keep getting the error > "object 'samp.csv' not found". > > samp <- "20110317" > read.csv(file=samp.csv,...) > #next R processes some code that works fine, and then should save the > figure: > pdf(file=samp.pdf,...) > dev.off() > > How can I get R to replace samp as 20110317 in front of .csv and .pdf? > > Best, > Sjouke > > -- > View this message in context: > http://r.789695.n4.nabble.com/Beginner-question-How-to-replace-part-of-a-filename-in-read-csv-tp3384786p3384786.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting multiple figures on one page
Jim, Thanks for looking into this. The c without paste works. If the rq model overrides the mfrow, I think I will have to piece together individual plots using other software. scarlet -- View this message in context: http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Strange R squared, possible error
It is all a matter of what you are comparing too, or what the null model is. For most cases (standard regression) we compare a model with slope and intercept to an intercept only model (looking at the effect of the slope), the intercept only model fits a horizontal line through the mean of the y's hence the subtraction of the mean. If we don't do that then R-squared can easily become meaningless. Here is an example where we compute the r-squared using the no-intercept formula: x <- rnorm(100, 1000, 20) y <- rnorm(100, 1000, 20) cor(x,y) summary( lm( y ~ rep(1,100) + x + 0 ) ) Notice how big the r-squared value is (and that it is not anywhere near the square of the correlation) for data that is pretty independent. When you force the intercept to 0, then you are using a different null model (mean 0). Part of Thomas's point was that if we still subtract the mean in this case then the calculation of r-squared can give a negative number, which you pointed out is meaningless, the gist is that that is the incorrect formula to use and so R instead uses the formula without subtracting the mean when you don't fit an intercept. The reason the r-squared values are different is because they are using different denominators and are therefore not comparable. The reason that R uses 2 different formulas/denominators is because there is not one single formula/denominator that makes general sense in both cases. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111 > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- > project.org] On Behalf Of derek > Sent: Thursday, March 17, 2011 9:29 AM > To: r-help@r-project.org > Subject: Re: [R] Strange R squared, possible error > > Thats exactly what I would like to do. Any idea on good text? I've > consulted > severel texts, but no one defined R^2 as R^2 = 1 - Sum(R[i]^2) / > Sum((y[i])^2-y*)) still less why to use different formulas for similar > model > or why should be R^2 closer to 1 when y=a*x+0 than in general model > y=a*x+b. > > from manual: > r.squared R^2, the ‘fraction of variance explained by the model’, > R^2 = 1 - Sum(R[i]^2) / Sum((y[i]- y*)^2), > where y* is the mean of y[i] "if there is an intercept" and zero > otherwise. > > I don't need explaining what R^2 does nor how to interpret it, because > I > know what it means and how it is derived. I don't need to be told which > model I should apply. So the answers from Thomas weren't helpful. > > I don't claim it is wrong, otherwise wouldn't be employed, but I want > to see > the reason behind using two formulas. > > Control questions: > 1) Statement "if there is an intercept" means intercept including zero > intercept? > > 2) If I use model y = a*x+0 which formula for R^2 is used: the one with > Y* > or the one without? > > -- > View this message in context: http://r.789695.n4.nabble.com/Strange-R- > squared-possible-error-tp3382818p3384844.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
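To make the two denominators concrete (and to answer control question 2 above): summary() on a no-intercept fit reproduces the formula without y*, i.e. with sum(y^2) in the denominator. A small made-up example:

set.seed(2)
x <- 1:20
y <- 2 * x + rnorm(20)
fit  <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x + 0)   # without intercept
summary(fit)$r.squared                            # R^2 as reported
summary(fit0)$r.squared
1 - sum(resid(fit)^2)  / sum((y - mean(y))^2)     # matches summary(fit)
1 - sum(resid(fit0)^2) / sum(y^2)                 # matches summary(fit0)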
[R] fitting gamm with interaction term
Hi all, I would like to fit a gamm model of the form: Y~X+X*f(z) Where f is the smooth function and With random effects on X and on the intercept. So, I try to write it like this: gam.lme<- gamm(Y~ s(z, by=X) +X, random=list(groups=pdDiag(~1+X)) ) but I get the error message : Error in MEestimate(lmeSt, grps) : Singularity in backsolve at level 0, block 1 When I simply fit a gam model using the formula above, then it works ok. Is it possible to fit such a model with gamm? Thanks a lot! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plotting multiple figures on one page
Scarlet, If the mfrow is being overridden, perhaps the rimage package might be able to piece the individual plots... -- Muhammad Rahiz Researcher & DPhil Candidate (Climate Systems & Policy) School of Geography & the Environment University of Oxford On Thu, 17 Mar 2011, scarlet wrote: Jim, Thanks for looking into this. The c without paste works. If the rq model overrides the mfrow, I think I will have to piece together individual plots using other software. scarlet -- View this message in context: http://r.789695.n4.nabble.com/plotting-multiple-figures-on-one-page-tp3382981p3384787.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] fitting gamm with interaction term
Is X a numeric variable or a factor? If it's numeric try gam.lme<- gamm(Y~ s(z, by=X), random=list(groups=pdDiag(~1+X)) ) ... since otherwise the separate X term is confounded with s(z, by=X). (gam detects such confounding and copes with it, but gamm can't). Simon On 17/03/11 17:47, Irene Mantzouni wrote: Hi all, I would like to fit a gamm model of the form: Y~X+X*f(z) Where f is the smooth function and With random effects on X and on the intercept. So, I try to write it like this: gam.lme<- gamm(Y~ s(z, by=X) +X, random=list(groups=pdDiag(~1+X)) ) but I get the error message : Error in MEestimate(lmeSt, grps) : Singularity in backsolve at level 0, block 1 When I simply fit a gam model using the formula above, then it works ok. Is it possible to fit such a model with gamm? Thanks a lot! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting
I have a matrix, say: 23 1 12 12 0 0 0 1 0 1 0 2 23 2 I want to count the number of distinct rows and the number of distinct elements in the second column, and put these counts in a column. So at the end of the day I should have: c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the elements in the second column exist. Any help is greatly appreciated. -- Thanks, Jim. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Spatial cluster analysis of continous outcome variable
Did you post your data or hypothetical data? Usually that helps make your problem more clear and more interesting ( likely to get a useful response to your post). From: tintin...@hotmail.com To: r-help@r-project.org Date: Thu, 17 Mar 2011 17:38:14 +0100 Subject: [R] Spatial cluster analysis of continous outcome variable Dear R Users, R Core Team, I have a two dimensional space where I measure a numerical value in two situations at different points. I have measured the change and I would like to test if there are areas in this 2D-space where there is a different amount of change (no change, increase, decrease). I don´t know if it´s better to analyse the data just with the coordinates or if its better to group them in "pixels" (and obtain the mean value for each pixel) and then run the cluster analysis. I would like to know if there is a package/function that allows me to do these calculations.I would also like to know if it could be done in a 3D space (I have collapsed the data to 2D because I don´t have many points. Thanks in advance J Toledo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Counting
Dear Jim, 17.03.2011 20:54, Jim Silverton wrote: I have a matrix say: 23 1 12 12 00 0 1 0 1 0 2 23 2 I want to count of number of distinct rows and the number of disinct element in the second column and put these counts in a column. SO at the end of the day I should have: c(1, 1, 1, 2, 2, 1, 1) for the distinct rows... Let's suppose my.data is your data frame, "var" is the 1st column and "var1" is the second. 1) Create a 3rd columns for the first task: my.data$var2<-0 2) Count distinct rows: for (i in 1:nrow(my.data)) { my.data$var2[i]<-nrow(subset(my.data, var==var[i] & var1==var1[i])) } After this, the output of "my.data$var2" is: [1] 1 1 1 2 2 1 1 > ... and c(1, 1, 1, 2, 2, 2, 2) for the counts of how many times the > elements in the second column exists. Here I'm a bit irritated. Shouldn't the count for the first element "1" rather be 3, since the number 3 occurs three times... If this is what You are looking for, then the following should work: 1) Create a 4th column for: my.data$var3<-0 2) Count distinct elements in the second column: for (i in 1:nrow(my.data)) { my.data$var3[i]<-sum(my.data$var1==my.data$var1[i]) } After this, the output of "my.data$var3" is: [1] 3 1 1 3 3 2 2 HTH, Kimmo __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot, transformation, and ylim
Hey everyone, I'm having a little trouble with ggplot. I have two sets of y-values, one whose range is contained in the other. Due to the nature of the y-values, I wish to scale the y axis with a log base two transformation. Furthermore, I wish to plot the two sets of y-values as boxplots in separate graphs with the same ylim (and thus with the same ticks and tick labels). As of now, I have: ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2() ggplot(data, aes(x2, y2))+geom_boxplot()+scale_y_log2() which successfully creates the boxplot and transforms the y-axis. However, I can't figure out how to change the range of the y-axis so that they're the same in both graphs. Inserting "+scale_y_continuous(limits=c(...))" before or after "+scale_y_log2()" doesn't seem to work. I also tried: qplot(x1,y1,geom="boxplot", ylim=c(...))+scale_y_log2() but my efforts were in vain. Any ideas? Thanks. Cheers, Godwin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot, transformation, and ylim
Hey everyone, Nevermind, I figured it out: ggplot(data, aes(x1, y1))+geom_boxplot()+scale_y_log2(lim=c(...)) :) Cheers, Godwin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi, thank you for your elaborate answer. I downloaded Prof. Dayton's pdf and will read it tomorrow. A friend also told me that our professor said you can actually compare AICs for different distributions. Apparently it's not correct strictly speaking, because of the two different likelihoods, but you can get meaningful information out of it. Did I understand that right? By the way, what is the etiquette way of answering your post? Should I mail to you /and/ the list? Regards, Alex (with a new e-mail address, in case this mixes something up) Am 17.03.2011 09:58, schrieb Rubén Roa: Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, Verteilungsannahme) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
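As a toy illustration of comparing AICs across distributional families fitted to the same data (a sketch with simulated data; fitdistr lives in MASS and its fits carry a log-likelihood, so AIC applies directly):

library(MASS)
set.seed(1)
x <- rgamma(200, shape = 2, rate = 0.5)
fit.gamma <- fitdistr(x, "gamma")
fit.lnorm <- fitdistr(x, "lognormal")
AIC(fit.gamma)   # same data, same scale, same likelihood units,
AIC(fit.lnorm)   # so the two AICs are directly comparable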
Re: [R] Counting
On Thu, Mar 17, 2011 at 02:54:49PM -0400, Jim Silverton wrote: > I have a matrix say: > > 23 1 > 12 12 > 00 > 0 1 > 0 1 > 0 2 > 23 2 > > I want to count of number of distinct rows and the number of disinct element > in the second column and put these counts in a column. SO at the end of the > day I should have: > > c(1, 1, 1, 2, 2, 1, 1) for the distinct rows and c(1, 1, 1, 2, 2, 2, 2) for > the counts of how many times the elements in the second column exists. Any > help is greatly appreciated. Hi. I understand the first as follows. For each row compute the number of rows, which are equal to the given one. If this is correct, then the following can be used. a <- cbind( c(23, 12, 0, 0, 0, 0, 23), c(1, 12, 0, 1, 1, 2, 2)) u <- rep(1, times=nrow(a)) ave(u, a[, 1], a[, 2], FUN=sum) [1] 1 1 1 2 2 1 1 I am not sure, whether i understand the second correctly. Can you explain it in more detail? I would expect ave(u, a[, 2], FUN=sum) [1] 3 1 1 3 3 2 2 However, this is different from your expected output. Do you count only consecutive equal elements? Hope this helps. Petr Savicky. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Scope and apply-type functions
So, I've been confused by this for a while. If I want to create functions in an apply, it only uses the desired value for the variable if I create a new local variable: > lapply(1:5,function(h){k=h;function(){k}})[[1]]() [1] 1 > lapply(1:5,function(k){function(){k}})[[1]]() [1] 5 > Normally, a function will use values for a variable if the variable is local to the definition of the function: > a = function(x){function(){x}} > a(5) function(){x} > a(5)() [1] 5 > a(6)() [1] 6 So why doesn't this work for apply? Does apply work more like loops, with a shared variable that changes values so that the first function defined actually uses the last value of the variable? > a = list();for (k in 1:5){a = c(a,function(){k})} > a[[1]]() [1] 5 > Or is there something else entirely? -- View this message in context: http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scope and apply-type functions
Hi, I think it's a side effect of lazy evaluation, where you should probably use the ?force like a jedi, lapply(1:5,function(k){force(k) ; function(){k}})[[2]]() HTH, baptiste On 18 March 2011 07:01, jamie.f.olson wrote: > So, I've been confused by this for a while. If I want to create functions in > an apply, it only uses the desired value for the variable if I create a new > local variable: > >> lapply(1:5,function(h){k=h;function(){k}})[[1]]() > [1] 1 >> lapply(1:5,function(k){function(){k}})[[1]]() > [1] 5 >> > > Normally, a function will use values for a variable if the variable is local > to the definition of the function: > >> a = function(x){function(){x}} >> a(5) > function(){x} > >> a(5)() > [1] 5 >> a(6)() > [1] 6 > > > So why doesn't this work for apply? Does apply work more like loops, with a > shared variable that changes values so that the first function defined > actually uses the last value of the variable? > >> a = list();for (k in 1:5){a = c(a,function(){k})} >> a[[1]]() > [1] 5 >> > > > Or is there something else entirely? > > -- > View this message in context: > http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Numeric vector converted mysteriously to characters in data frame?
Thanks Jim, It turns out that the problem was that all columns had been converted to factors. Once I converted them back to numeric variables the code worked fine. If anybody is wondering how, you can do this with the following: mynumber <- as.numeric(levels(myfactor))[myfactor] There are plenty of other threads that describe this process fully. Martin On Thu, Mar 17, 2011 at 3:11 AM, Jim Holtman wrote: > > take a look at what the output of your cbind is; my guess is that it is a > character matrix. you probably want to do > > data.frame(...,...,...) > > without the cbind. > > Sent from my iPad > > On Mar 16, 2011, at 18:12, Martin Ralphs wrote: > > > Dear R Help, > > > > I would be very grateful if somebody could explain why this is happening. I > > am trying to plot a lattice barchart of a vector of numbers with age > > bandings that I have brought together into a data frame. When I plot the > > variables in their raw vector form, it works. When I try to plot them in > > the df, R labels the numeric axis with default factor values and treats the > > numeric vector as character. Why is this? Apologies if this is blindingly > > obvious, I am new to R. > > > > Here is some sample code that demonstrates the problem: > > > > library(lattice) > > > > # Load in some sample data ... > > ageband <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18) > > agenames <- > > c("00-04","05-09","10-14","15-19","20-24","25-29","30-34","35-39", > > "40-44","45-49","50-54","55-59","60-64","65-69","70-74","75-79","80-84","85 > > plus") > > popcount <- > > c(35274,40958,41574,47973,50384,51248,65748,54854,60948,66473,70854, > > 60475,61854,55848,45857,30584,25475,20574) > > > > > > region <- "North East" > > test1 <- as.data.frame(cbind(region,ageband,agenames,popcount)) > > region <- "North West" > > test2 <- as.data.frame(cbind(region,ageband,agenames,popcount)) > > test <- rbind(test1,test2) > > names(test) <- c("GOR","Band","AgeBand","Persons") > > > > # When I plot my numeric data from the df it is treated as character > > vector... > > > > hg1 <- barchart(AgeBand ~ Persons | GOR, > > data=test, > > origin=0, > > layout=c(2,1), > > reference=TRUE, > > xlab="Count (Persons)", > > ylab="Quinary Age Band", > > scales="free") > > > > hg1 > > > > # But when I plot the source vectors in the same way, it works how I want it > > to... (without the region bit, of course) > > hg2 <- barchart(agenames ~ popcount | region, > > origin=0, > > layout=c(2,1), > > reference=TRUE, > > xlab="Count (Persons)", > > ylab="Quinary Age Band", > > scales="free") > > hg2 > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scope and apply-type functions
Try this: lapply(1:5,function(i){i;function()i})[[2]]() or lapply(1:5,function(i){i;function(j=i) j } )[[2]]() ## both give 2 Now try: lapply(1:5,function(i){function(j=i) j } )[[2]]() ## gives 5 ! The problem is that if you do: lapply(1:5,function(h){function(h)h) what you get is a list of functions of the form function(h)h , where the The "h" in this function has nothing to do with the index "h" in the lapply statement, which no longer exists. So the result is an error message saying it can't find h. The top 2 versions specifically put the value of i at the time the ith function is defined into the function environment, so they exist after the lapply call is finished. The 3rd version obviously uses the value of i at the conclusion of the lapply call when the list of functions is returned from the lapply call and they need to find the value of i for their environments. In general, the scoping with lapply gets complicated and, iirc, the experts have said is nonstandard (Experts: please correct if wrong). So the best advice is to try to avoid constructs like this if possible. -- Bert On Thu, Mar 17, 2011 at 11:01 AM, jamie.f.olson wrote: > So, I've been confused by this for a while. If I want to create functions in > an apply, it only uses the desired value for the variable if I create a new > local variable: > >> lapply(1:5,function(h){k=h;function(){k}})[[1]]() > [1] 1 >> lapply(1:5,function(k){function(){k}})[[1]]() > [1] 5 >> > > Normally, a function will use values for a variable if the variable is local > to the definition of the function: > >> a = function(x){function(){x}} >> a(5) > function(){x} > >> a(5)() > [1] 5 >> a(6)() > [1] 6 > > > So why doesn't this work for apply? Does apply work more like loops, with a > shared variable that changes values so that the first function defined > actually uses the last value of the variable? > >> a = list();for (k in 1:5){a = c(a,function(){k})} >> a[[1]]() > [1] 5 >> > > > Or is there something else entirely? > > -- > View this message in context: > http://r.789695.n4.nabble.com/Scope-and-apply-type-functions-tp3385230p3385230.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
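A version-independent way to see the shared-variable behaviour is the plain for loop from the original post, together with the local() idiom for capturing a per-iteration copy (illustrative sketch):

fs <- list()
for (k in 1:3) fs[[k]] <- function() k
fs[[1]]()   # 3: every closure sees the same k, which is 3 once the loop has finished
fs2 <- list()
for (k in 1:3) fs2[[k]] <- local({kk <- k; function() kk})
fs2[[1]]()  # 1: kk is a private copy taken when the closure was created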
Re: [R] Need to abstract changing name of column within loop
On Thu, Mar 17, 2011 at 10:32 AM, Joshua Ulrich wrote: > On Wed, Mar 16, 2011 at 6:58 PM, jctoll wrote: >> Hi, >> >> I'm struggling to figure out the way to change the name of a column >> from within a loop. The problem is I can't refer to the object by its >> actual variable name, since that will change each time through the >> loop. My xts object is A. >> >>> head(A) >> A.Open A.High A.Low A.Close A.Volume A.Adjusted A.Adjusted.1 >> 2007-01-03 34.99 35.48 34.05 34.30 2574600 34.30 1186780 >> 2007-01-04 34.30 34.60 33.46 34.41 2073700 34.41 1190586 >> 2007-01-05 34.30 34.40 34.00 34.09 2676600 34.09 1179514 >> 2007-01-08 33.98 34.08 33.68 33.97 1557200 33.97 1175362 >> 2007-01-09 34.08 34.32 33.63 34.01 1386200 34.01 1176746 >> 2007-01-10 34.04 34.04 33.37 33.70 2157400 33.70 1166020 >> >> It's column names are: >>> colnames(A) >> [1] "A.Open" "A.High" "A.Low" "A.Close" >> "A.Volume" "A.Adjusted" "A.Adjusted.1" >> >> I want to change the 7th column name: >>> colnames(A)[7] >> [1] "A.Adjusted.1" >> >> I need to do that through a reference to i: >>> i >> [1] "A" >> >> This works: >>> colnames(get(i))[7] >> [1] "A.Adjusted.1" >> >> And this is what I want to change the column name to: >>> paste(i, ".MarketCap", sep = "") >> [1] "A.MarketCap" >> >> But how do I make the assignment? This clearly doesn't work: >> >>> colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") >> Error in colnames(get(i))[7] <- paste(i, ".MarketCap", sep = "") : >> could not find function "get<-" >> >> Nor does this (it creates a new object "A.Adjusted.1" with a value of >> "A.MarketCap") : >> >> assign(colnames(get(i))[7], paste(i, ".MarketCap", sep = "")) >> >> How can I change the name of that column within my big loop? Any >> ideas? Thanks! >> > I usually make a copy of the object, change it, then overwrite the original: > > tmp <- get(i) > colnames(tmp)[7] <- "foo" > assign(i,tmp) > > Hope that helps. > >> Best regards, >> > Joshua Ulrich | FOSS Trading: www.fosstrading.com Thank you, that does help. It sounds like your way is similar to the way David suggested. At least now I can just resign myself to doing it that way rather than torturing my mind trying to figure out a shorter way. Thanks, James __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
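A hedged alternative to the get()/assign() pattern (the objects and symbol names below are hypothetical stand-ins): keeping the objects in a named list avoids looking anything up by name, and lapply() can then rename the column in one pass.
# Hypothetical setup: three small matrices standing in for the real xts objects
A <- B <- C <- matrix(rnorm(70), ncol = 7,
                      dimnames = list(NULL, paste("col", 1:7, sep = "")))
symbols <- c("A", "B", "C")
prices  <- mget(symbols, envir = .GlobalEnv)    # gather the objects into a named list
prices  <- lapply(names(prices), function(i) {
  x <- prices[[i]]
  colnames(x)[7] <- paste(i, ".MarketCap", sep = "")
  x
})
names(prices) <- symbols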
Re: [R] Spatial cluster analysis of continous outcome variable
I attach the data (csv format). There are the 3 coordinates (but as there are not so many points I wanted to do 3 analyses, collapsing one variable in each of them). There are two variables to study: I have posted the data as a ratio between both states and as a percentage between both states. The data are from different samples (and each sample has 3 or 6 measures). Thanks again. > From: marchy...@hotmail.com > To: tintin...@hotmail.com; r-help@r-project.org > Subject: RE: [R] Spatial cluster analysis of continous outcome variable > Date: Thu, 17 Mar 2011 15:20:09 -0400 > > > > > > > > > Did you post your data or hypothetical data? > Usually that helps make your problem more clear and more interesting > (and more likely to get a useful response to your post). > > > > > From: tintin...@hotmail.com > To: r-help@r-project.org > Date: Thu, 17 Mar 2011 17:38:14 +0100 > Subject: [R] Spatial cluster analysis of continous outcome variable > > > > Dear R Users, R Core > Team, > I have a two dimensional space where I measure a numerical value in two > situations at different points. I have measured the change and I would like > to test if there are areas in this 2D space where there is a different amount > of change (no change, increase, decrease). I don't know if it's better to > analyse the data just with the coordinates or if it's better to group them in > "pixels" (and obtain the mean value for each pixel) and then run the cluster > analysis. I would like to know if there is a package/function that allows me > to do these calculations. I would also like to know if it could be done in a > 3D space (I have collapsed the data to 2D because I don't have many points). > Thanks in advance > > > > > > J Toledo > [[alternative HTML version deleted]] > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
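Regarding the "pixel" idea in the original question, a minimal sketch with made-up coordinates and change values (not the posted data): cut() bins the points into a coarse grid and tapply() gives the mean change per cell, which can then be inspected with image() or passed to a clustering routine.
set.seed(1)
x <- runif(60); y <- runif(60)        # made-up coordinates
change <- rnorm(60)                   # made-up change values
xbin <- cut(x, breaks = 5)
ybin <- cut(y, breaks = 5)
pixel.means <- tapply(change, list(xbin, ybin), mean)   # 5 x 5 matrix, NA for empty cells
image(1:5, 1:5, pixel.means, xlab = "x bin", ylab = "y bin")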
Re: [R] Using barplot() with zoo -- names.arg not permitted?
On Thu, Mar 17, 2011 at 10:23:33AM -0400, Gabor Grothendieck wrote: > On Thu, Mar 17, 2011 at 9:38 AM, David Wolfskill wrote: > ... > > But the X-axis labels show up as "large integers" -- the POSIXct values > > are apparently treated as numeric quantities for this purpose. > ... > Please cut this all down to as small a reproducible example as you can > manage and place it all in ***one*** code section that also generates > the data so that we can copy a single section of code from your post > to R. Use dput to convert data to code for this purpose. Thank you for that. :-) The data below is a bit larger than absolutely necessary, but its much smaller than what I would normally be using -- and I was able to paste this into R directly (on a different machine, getting the same results), so I believe it will suffice. As a benefit, after going through this, I did manage to find an invocation that is able to generate an acceptable X-axis -- at least, sometimes. (Changing the size of the X11 window appears to mess things up -- at least.) But this seems rather more complicated than I expected. Anyway, here's a cut/paste of the requested code section: df <- structure(list(time = c(1297278016L, 1297278017L, 1297278018L, 1297278019L, 1297278020L, 1297278021L, 1297278022L, 1297278023L, 1297278024L, 1297278025L, 1297278026L, 1297278027L, 1297278028L, 1297278029L, 1297278030L, 1297278031L, 1297278032L, 1297278033L, 1297278034L, 1297278035L, 1297278036L, 1297278037L, 1297278038L, 1297278039L, 1297278040L, 1297278041L, 1297278042L, 1297278043L, 1297278044L, 1297278045L, 1297278046L, 1297278047L, 1297278048L, 1297278049L, 1297278050L, 1297278051L, 1297278054L, 1297278055L, 1297278056L, 1297278057L, 1297278058L, 1297278059L, 1297278060L, 1297278061L, 1297278062L, 1297278063L, 1297278064L, 1297278065L, 1297278066L, 1297278067L, 1297278068L, 1297278069L, 1297278070L, 1297278071L, 1297278072L, 1297278073L, 1297278074L, 1297278075L, 1297278076L, 1297278077L, 1297278078L, 1297278079L, 1297278080L, 1297278081L, 1297278082L, 1297278083L, 1297278084L, 1297278085L, 1297278086L, 1297278087L, 1297278088L, 1297278089L, 1297278090L, 1297278091L, 1297278092L, 1297278093L, 1297278094L, 1297278095L, 1297278096L, 1297278097L, 1297278098L, 1297278099L, 1297278100L, 1297278101L, 1297278102L), kern.cp_time_idle = c(3193265L, 3193540L, 3193771L, 3194037L, 3194267L, 3194458L, 3194572L, 3194602L, 3194602L, 3194602L, 3194602L, 3194602L, 3194603L, 3194607L, 3194617L, 3194672L, 3194802L, 3194933L, 3195067L, 3195225L, 3195378L, 3195378L, 3195418L, 3195418L, 3195418L, 3195419L, 3195419L, 3195421L, 3195422L, 3195422L, 3195422L, 3195422L, 3195422L, 3195450L, 3195450L, 3195664L, 3196432L, 3196440L, 3196871L, 3196872L, 3196874L, 3196944L, 3196944L, 3197037L, 3197130L, 3197130L, 3197132L, 3197159L, 3197159L, 3197162L, 3197166L, 3197236L, 3197367L, 3197367L, 3197497L, 3197639L, 3197782L, 3197922L, 3197946L, 3197947L, 3197954L, 3197954L, 3197954L, 3198194L, 3198194L, 3198204L, 3198205L, 3198256L, 3198265L, 3198269L, 3198269L, 3198269L, 3198269L, 3198269L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198270L, 3198323L, 3198323L, 3198367L, 3198631L), kern.cp_time_intr = c(5142L, 5142L, 5142L, 5143L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5145L, 5146L, 5147L, 5147L, 5147L, 5148L, 5148L, 5149L, 5150L, 5152L, 5152L, 5152L, 5153L, 5153L, 5155L, 5156L, 5156L, 5156L, 5156L, 5156L, 
5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5156L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L, 5157L), kern.cp_time_nice = c(36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L), kern.cp_time_sys = c(132583L, 132591L, 132608L, 132618L, 132646L, 132667L, 132694L, 132737L, 132784L, 132825L, 132833L, 132881L, 132933L, 132977L, 133017L, 133059L, 133093L, 133121L, 133147L, 133172L, 133210L, 133220L, 133259L, 133302L, 133342L, 133388L, 133431L, 133494L, 133547L, 133557L, 133592L, 133631L, 133673L, 133713L, 133765L, 133785L, 133798L, 133808L, 133816L, 133857L, 133892L, 133951L, 133960L, 134004L, 134043L, 134109L, 134170L, 134206L, 134268L, 134329L, 134396L, 134443L, 134486L, 134501L, 134558L, 134611L, 134654L, 134704L, 134743L, 134789L, 134824L, 134871L, 134881L, 1
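Independently of the (truncated) dput output above, a small self-contained sketch of one way to get readable time labels: barplot() treats the index as plain numbers, so formatted labels can be supplied yourself through names.arg (the times and values below are hypothetical).
times  <- as.POSIXct("2011-02-09 10:20:16", tz = "GMT") + 0:9   # hypothetical sample times
values <- abs(rnorm(10))                                         # hypothetical bar heights
barplot(values, names.arg = format(times, "%H:%M:%S"),
        las = 2, cex.names = 0.7, ylab = "value")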
[R] A question about list
Hi dear all, It may be a simple question, i have a list output with different number of elements as following; [[1]] [1] 0.86801402 -0.82974691 0.3974 -0.98566707 -4.96576856 -1.32056754 [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125 1.62843297 -0.20473912 [13] 0.30659907 2.66908117 2.53791837 0.53788013 -0.57463077 0.27708874 [19] -2.94233200 1.54565643 -6.83694100 7.21556266 -3.14823536 -1.34590796 [25] 0.78660855 5.53692735 1.22511890 7.65249980 -0.43008997 -0.10536125 [[2]] [1] -2.80332826 0.54414548 4.38232256 -1.38407653 -1.59241491 -1.35509664 [7] 1.04806755 -0.27685465 -1.36671548 -3.16649719 2.17194692 -3.49404253 [13] 4.69102017 2.78297615 0.34565006 1.05954751 1.78836097 -0.80393182 [19] 3.74315304 1.17427902 1.62354686 0.53186688 -6.56519965 -3.39045485 [25] 0.01043676 -0.18857654 -0.57070351 -0.06135564 6.92331269 -1.46544614 [31] -1.65309767 [[3]] [1] 4.1923546 0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074 [7] -0.9333245 0.2632038 -0.6909956 -3.1008763 -2.9557687 1.5382464 [13] 1.2713290 6.6527302 1.0433603 -0.9916190 -2.7724673 -1.6554250 [19] 1.8023591 -1.5101793 1.2604704 -0.2853326 -2.4312827 -0.4731487 [25] 3.5061061 1.7392190 1.5493419 -0.7203778 -0.6995221 2.7686406 [31] 6.1813364 -1.8665294 I want to compute the percentile of all of 94 elements, i tried quantile(data) but i got an error like Error in sort.list(x, partial = unique(c(lo, hi))) : 'x' must be atomic for 'sort.list' Have you called 'sort' on a list? Can some one help me abouth this ? Thanks for any idea... -- View this message in context: http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Autocorrelation in non-linear regression model
Hey all! I am working on my master's thesis and I am getting desperate with my model. It looks as follows: Y(t) = β1*X1(t) + β2*X2(t) + δ*(β1*((1+c)/(δ+c))+β2)*IE(t) - β2*α*((1+c)/(δ+c))*(δ+g)* IE(t-1) note: c and g are constants. The problem I encounter is that there is strong linear correlation (autocorrelation) between IE(t) and IE(t-1). How can I solve this problem? It is of utter importance to end up with significant coefficients δ and α, which are then used in a subsequent model. However, I get either no significant values for δ and α, or unrealistic values for one of the two. Is there a way to combine both, using some non-linear time-lagged model, a time-series approach, or a plugged-in autoregression? A follow-up question would be how to place penalties on this model. I would like to restrict the values of δ and α to between 0 and 0.5 and add penalties when they come closer to the boundaries. I really need some help, because I have been stuck on this for the last two weeks and don't know how to go about it. Thanks for the support. Cheers, Bob -- View this message in context: http://r.789695.n4.nabble.com/Autocorrelation-in-non-linear-regression-model-tp3385647p3385647.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
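A hedged sketch of one way to impose the 0-0.5 bounds (the data, starting values, and the abbreviated model below are all hypothetical, not the poster's): nls() with algorithm = "port" accepts box constraints directly, which is usually cleaner than hand-rolled penalty terms. For the autocorrelation itself, gnls() in the nlme package accepts a correlation = corAR1() argument.
set.seed(1)
d <- data.frame(Y = rnorm(50), X1 = rnorm(50), IE = rnorm(50), IE1 = rnorm(50))  # made-up data
fit <- nls(Y ~ b1 * X1 + delta * IE - alpha * IE1,   # abbreviated stand-in for the full model
           data      = d,
           start     = list(b1 = 0.1, delta = 0.25, alpha = 0.25),
           algorithm = "port",
           lower     = c(-Inf, 0, 0),     # bounds in the order of 'start'
           upper     = c( Inf, 0.5, 0.5))
summary(fit)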
[R] Histograms with strings,
Hello, Thanks in advance for any help, I have read a CSV file in which there is a column for an IP addr as in: tmpInFile$V2 [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" [5] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" [9] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" If I want to find the IP addr that has the highest occurrence (129.46.71.19, in this case), is there a simple way to do this? Thanks Shank [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with plotting a line that is multicoloured based on levels of a factor
Hi All, I'm trying to plot data that is a time series of flows that are associated with a specific level, and I would like each level to represent a colour in a line plot. Here is some data that approximates what I'm using: date=c(1:300) flow=sin(2*pi/53*c(1:300)) levels=c(rep(c("high","med","low"),100)) data=cbind.data.frame(date, flow, levels) the "levels" column represents the levels of flow. What I've done so far is to plot this data using coloured points corresponding with each flow level: colour=ifelse(data$levels=="high","red", ifelse(data$levels=="med","green", ifelse(data$levels=="low","blue",""))) plot(date, flow, col=colour) What I would like to do instead is to plot the line of this data, not the points. i.e., plot(date, flow, type="l") But I would like the colour of the line to change with each level, i.e., plot(date, flow, type="l", col=colour) But this doesn't work because the line is continuous and the colours are discrete. I looked into using clipplot, but I'm not sure how I would specify limits that would give different sections of the line correct colours. Does anyone know of a way to draw a line with different colours? One way I thought of was to plot each level of flow separately and then build the plot up, i.e., plot(data$date[data$levels=="high"], data$flow[data$levels=="high"], col="red", type="l") lines(data$date[data$levels=="med"], data$flow[data$levels=="med"], col="green", type="l") lines(data$date[data$levels=="low"], data$flow[data$levels=="low"], col="blue", type="l") But the line fills in data gaps, so this doesn't work. Any help would be much appreciated! Thank you. -Pam Allen pal...@hatfieldgroup.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
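A minimal sketch (using the data frame constructed above) of one way to keep a single continuous line while changing its colour with the level: draw each segment with segments(), coloured by the level at its starting point.
col.map <- c(high = "red", med = "green", low = "blue")
seg.col <- col.map[as.character(data$levels)]
n <- nrow(data)
plot(data$date, data$flow, type = "n", xlab = "date", ylab = "flow")
segments(data$date[-n], data$flow[-n],
         data$date[-1], data$flow[-1],
         col = seg.col[-n])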
[R] Help with Time Series Plot
Dear List, This is an embarrassing question, but I can't seem to make this work. How do I change the font size of the xlab and of the numbers shown on the x-axis in the time series plot below? The arguments cex.lab and cex.axis do not seem to be passed through to the plot function. plot(ts(rnorm(100), start=2004, freq=12), ylab="RQI", xlab="My X lab", col="black", cex.lab=0.1, cex.axis=0.7) Thanks, Axel. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
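A hedged workaround (using the same example call, with sizes chosen arbitrarily): plot.ts draws its own axes and annotation, so setting the sizes globally with par() before plotting is one approach that often works when the cex.* arguments are not picked up.
op <- par(cex.lab = 0.8, cex.axis = 0.7)    # set sizes before the plot is drawn
plot(ts(rnorm(100), start = 2004, freq = 12),
     ylab = "RQI", xlab = "My X lab", col = "black")
par(op)                                     # restore the previous settings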
Re: [R] Strange R squared, possible error
Thank you for your reply. It's a pity that two quantities defined by different formulas share the same name. If the variables had been named differently, I wouldn't have had a problem at all, and it looks like it's done on purpose. Because I am testing the quality of the data (the performance of data collection), not a model that I know a priori, R^2 gives me simple information about that quality even for a nonlinear model. Still, when testing different models R^2 shows nothing important. I didn't state that a negative R^2 is meaningless; it means that the model or the data are wrong. -- View this message in context: http://r.789695.n4.nabble.com/Strange-R-squared-possible-error-tp3382818p3385837.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
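A small worked illustration of that last point (with made-up numbers): this R^2 is 1 - SSres/SStot, so any fit whose residual sum of squares exceeds the total sum of squares, i.e. a model worse than a constant mean, pushes it below zero.
set.seed(1)
y    <- rnorm(20)
yhat <- y + 3                                   # a deliberately poor "model"
1 - sum((y - yhat)^2) / sum((y - mean(y))^2)    # clearly negative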
Re: [R] A question about list
?unlist quantile(unlist(data)) > Hi dear all, > > It may be a simple question, i have a list output with different number of > elements as following; > > [[1]] > [1] 0.86801402 -0.82974691 0.3974 -0.98566707 -4.96576856 > -1.32056754 > [7] -5.54093319 -0.07600462 -1.34457280 -1.04080125 1.62843297 > -0.20473912 > [13] 0.30659907 2.66908117 2.53791837 0.53788013 -0.57463077 > 0.27708874 > [19] -2.94233200 1.54565643 -6.83694100 7.21556266 -3.14823536 > -1.34590796 > [25] 0.78660855 5.53692735 1.22511890 7.65249980 -0.43008997 > -0.10536125 > > [[2]] > [1] -2.80332826 0.54414548 4.38232256 -1.38407653 -1.59241491 > -1.35509664 > [7] 1.04806755 -0.27685465 -1.36671548 -3.16649719 2.17194692 > -3.49404253 > [13] 4.69102017 2.78297615 0.34565006 1.05954751 1.78836097 > -0.80393182 > [19] 3.74315304 1.17427902 1.62354686 0.53186688 -6.56519965 > -3.39045485 > [25] 0.01043676 -0.18857654 -0.57070351 -0.06135564 6.92331269 > -1.46544614 > [31] -1.65309767 > > [[3]] > [1] 4.1923546 0.6319591 -0.8568113 -3.3115788 -2.4166481 -1.1543074 > [7] -0.9333245 0.2632038 -0.6909956 -3.1008763 -2.9557687 1.5382464 > [13] 1.2713290 6.6527302 1.0433603 -0.9916190 -2.7724673 -1.6554250 > [19] 1.8023591 -1.5101793 1.2604704 -0.2853326 -2.4312827 -0.4731487 > [25] 3.5061061 1.7392190 1.5493419 -0.7203778 -0.6995221 2.7686406 > [31] 6.1813364 -1.8665294 > > I want to compute the percentile of all of 94 elements, i tried > quantile(data) but i got an error like > Error in sort.list(x, partial = unique(c(lo, hi))) : > 'x' must be atomic for 'sort.list' > Have you called 'sort' on a list? > Can some one help me abouth this ? > Thanks for any idea... > > -- > View this message in context: > http://r.789695.n4.nabble.com/A-question-about-list-tp3385711p3385711.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
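As an aside not in the original reply: if percentiles per list component are wanted instead of one overall set, sapply() applies quantile() to each element and returns the results as a matrix.
sapply(data, quantile, probs = c(0.25, 0.5, 0.75))   # one column per list element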
Re: [R] Histograms with strings,
try this: > x [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" [7] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > table(x) x 129.46.71.19 74.125.224.38 10 2 > which.max(table(x)) 129.46.71.19 1 On Thu, Mar 17, 2011 at 6:06 PM, Khanvilkar, Shashank wrote: > Hello, > Thanks in advance for any help, > > I have read a CSV file in which there is a column for an IP addr as in: > > tmpInFile$V2 > [1] "74.125.224.38" "74.125.224.38" "129.46.71.19" "129.46.71.19" > [5] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > [9] "129.46.71.19" "129.46.71.19" "129.46.71.19" "129.46.71.19" > > If I want to find the IP addr that has the highest occurrence (129.46.71.19, > in this case), is there a simple way to do this? > > Thanks > Shank > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
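One small follow-up (same x as above): which.max() returns the position within the table, so wrapping it in names() yields the address itself.
names(which.max(table(x)))   # "129.46.71.19"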