I appreciate all the feedback on this. I ended up using this line to solve my problem, just because I stumbled upon it first...
> alldata <- alldata[alldata$REC.TYPE == "SAO " | alldata$REC.TYPE == "FM-15",,drop=FALSE]

But I think Jim's solution would work equally well. I was a bit confused by the relative complexity of the data frames solution, as it seems like more steps than necessary.

Thanks again for the input!

-Matt

--- On Sun, 3/3/13, arun <smartpink...@yahoo.com> wrote:

> From: arun <smartpink...@yahoo.com>
> Subject: Re: [R] Help searching a matrix for only certain records
> To: "Matt Borkowski" <mathias1...@yahoo.com>
> Cc: "R help" <r-help@r-project.org>, "jim holtman" <jholt...@gmail.com>
> Date: Sunday, March 3, 2013, 1:29 PM
>
> Hi,
> You could also use ?data.table()
>
> n<- 300000
> set.seed(51)
> mat1<- as.matrix(data.frame(REC.TYPE= sample(c("SAO","FAO","FL-1","FL-2","FL-15"),n,replace=TRUE),Col2=rnorm(n),Col3=runif(n),stringsAsFactors=FALSE))
> dat1<- as.data.frame(mat1,stringsAsFactors=FALSE)
> table(mat1[,1])
> #
> #  FAO  FL-1 FL-15  FL-2   SAO
> #60046 60272 59669 59878 60135
> system.time(x1 <- subset(mat1, grepl("(SAO|FL-15)", mat1[, "REC.TYPE"])))
> #   user  system elapsed
> #  0.076   0.004   0.082
> system.time(x2 <- subset(mat1, mat1[, "REC.TYPE"] %in% c("SAO", "FL-15")))
> #   user  system elapsed
> #  0.028   0.000   0.030
>
> system.time(x3 <- mat1[match(mat1[, "REC.TYPE"]
>                              , c("SAO", "FL-15")
>                              , nomatch = 0) != 0
>                        ,, drop = FALSE]
>             )
> #   user  system elapsed
> #  0.028   0.000   0.028
> table(x3[,1])
> #
> #FL-15   SAO
> #59669 60135
>
> library(data.table)
> dat2<- data.table(dat1)
> system.time(x4<- dat2[match(REC.TYPE,c("SAO", "FL-15"),nomatch=0)!=0,,drop=FALSE])
> #   user  system elapsed
> #  0.024   0.000   0.025
> table(x4$REC.TYPE)
> #
> #FL-15   SAO
> #59669 60135
> A.K.
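Both arun's and Jim's timings show `%in%` and the `match(..., nomatch = 0) != 0` idiom running at about the same speed. That is expected: the R help page for `match` defines `%in%` as `match(x, table, nomatch = 0) > 0`, so the two perform the same computation. The argument order matters, though. This short sketch uses a made-up six-element vector standing in for the REC.TYPE column (the values are illustrative, not from the thread's data) to show that `match(rec_type, wanted)` gives one answer per row, while the reversed call `match(wanted, rec_type)` returns only the first position of each wanted code.

```r
# Made-up stand-in for the REC.TYPE column (note the trailing space in "SAO ")
rec_type <- c("SAO ", "Other", "FL-15", "SAO ", "Other", "FL-15")
wanted   <- c("SAO ", "FL-15")

# Row-wise tests: one logical value per element of rec_type
keep_in    <- rec_type %in% wanted
keep_match <- match(rec_type, wanted, nomatch = 0) != 0
print(identical(keep_in, keep_match))  # TRUE: same computation
print(which(keep_in))                  # 1 3 4 6

# Arguments reversed: one position per *wanted* value, first occurrence only,
# so using this as a row index would keep just two rows
print(match(wanted, rec_type))         # 1 3
```

In practice `%in%` is usually the more readable choice; the explicit `match()` form is mainly useful when you also need to know which of the wanted codes each row matched.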
>
> ----- Original Message -----
> From: jim holtman <jholt...@gmail.com>
> To: Matt Borkowski <mathias1...@yahoo.com>
> Cc: "r-help@r-project.org" <r-help@r-project.org>
> Sent: Sunday, March 3, 2013 11:52 AM
> Subject: Re: [R] Help searching a matrix for only certain records
>
> If you are using matrices, then here are several ways of doing it for size 300,000. You can determine if the difference of 0.1 seconds is important in terms of the performance you are after. It is taking you more time to type in the statements than it is taking them to execute:
>
> > n <- 300000
> > testdata <- matrix(
> +     sample(c("SAO ", "FL-15", "Other"), n, TRUE, prob = c(1,2,1000))
> +     , nrow = n
> +     , dimnames = list(NULL, "REC.TYPE")
> +     )
> > table(testdata[, "REC.TYPE"])
>
>  FL-15  Other   SAO
>    562 299151    287
> > system.time(x1 <- subset(testdata, grepl("(SAO |FL-15)", testdata[, "REC.TYPE"])))
>    user  system elapsed
>    0.17    0.00    0.17
> > system.time(x2 <- subset(testdata, testdata[, "REC.TYPE"] %in% c("SAO ", "FL-15")))
>    user  system elapsed
>    0.05    0.00    0.05
> > system.time(x3 <- testdata[match(testdata[, "REC.TYPE"]
> +                                 , c("SAO ", "FL-15")
> +                                 , nomatch = 0) != 0
> +                            ,, drop = FALSE]
> +             )
>    user  system elapsed
>    0.03    0.00    0.03
> > identical(x1, x2)
> [1] TRUE
> > identical(x2, x3)
> [1] TRUE
>
>
> On Sun, Mar 3, 2013 at 11:22 AM, Jim Holtman <jholt...@gmail.com> wrote:
> > there are way "more efficient" ways of doing many of the operations, but you probably won't see any differences unless you have very large objects (several hundred thousand entries), or have to do it a lot of times. My background is in computer performance and for the most part I have found that the easiest/most straightforward ways are fine most of the time.
> >
> > a more efficient way might be:
> >
> > testdata <- testdata[match(testdata$REC.TYPE, c('SAO ', 'FL-15'), nomatch = 0) != 0, ]
> >
> > you can always use 'system.time' to determine how long actions take.
> >
> > for multiple comparisons use %in%
> >
> > Sent from my iPad
> >
> > On Mar 3, 2013, at 9:22, Matt Borkowski <mathias1...@yahoo.com> wrote:
> >
> >> Thank you for your response Jim! I will give this one a try! But a couple of follow-up questions...
> >>
> >> In my search for a solution, I had seen something stating match() is much more efficient than subset() and will cut down significantly on computing time. Is there any truth to that?
> >>
> >> Also, I found the following solution, which works for matching a single condition, but I couldn't quite figure out how to modify it to search for both my acceptable conditions...
> >>
> >>> testdata <- testdata[testdata$REC.TYPE == "SAO",,drop=FALSE]
> >>
> >> -Matt
> >>
> >> --- On Sun, 3/3/13, jim holtman <jholt...@gmail.com> wrote:
> >>
> >> From: jim holtman <jholt...@gmail.com>
> >> Subject: Re: [R] Help searching a matrix for only certain records
> >> To: "Matt Borkowski" <mathias1...@yahoo.com>
> >> Cc: r-help@r-project.org
> >> Date: Sunday, March 3, 2013, 8:00 AM
> >>
> >> Try this:
> >>
> >> dataset <- subset(dataset, grepl("(SAO |FL-15)", REC.TYPE))
> >>
> >>
> >> On Sun, Mar 3, 2013 at 1:11 AM, Matt Borkowski <mathias1...@yahoo.com> wrote:
> >>> Let me start by saying I am rather new to R and generally consider myself to be a novice programmer...so don't assume I know what I'm doing :)
> >>>
> >>> I have a large matrix, approximately 300,000 x 14. It's essentially a 20-year dataset of 15-minute data. However, I only need the rows where the column I've named REC.TYPE contains the string "SAO " or "FL-15".
> >>>
> >>> My horribly inefficient solution was to search the matrix row by row, test the REC.TYPE column and essentially delete the row if it did not match my criteria. Essentially...
> >>>
> >>>> j <- 1
> >>>> for (i in 1:nrow(dataset)) {
> >>>>   if (dataset$REC.TYPE[j] != "SAO " && dataset$REC.TYPE[j] != "FL-15") {
> >>>>     dataset <- dataset[-j,]
> >>>>   }
> >>>>   else {
> >>>>     j <- j+1
> >>>>   }
> >>>> }
> >>>
> >>> After watching my code get through only about 10% of the matrix in an hour and slowing with every row...I figure there must be a more efficient way of pulling out only the records I need...especially when I need to repeat this for another 8 datasets.
> >>>
> >>> Can anyone point me in the right direction?
> >>>
> >>> Thanks!
> >>>
> >>> Matt
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >> --
> >> Jim Holtman
> >> Data Munger Guru
> >>
> >> What is the problem that you are trying to solve?
> >> Tell me what you want to do, not how you want to do it.
> >>
> >
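Matt's original loop removes one row at a time, and each `dataset[-j,]` builds a new copy of the remaining data, which largely explains why it slowed down with every row. Every reply in the thread replaces the loop with a single vectorized logical index. Since the same filter had to be repeated on eight more datasets, one option is to wrap it in a small function and apply it with `lapply()`. This is a sketch under stated assumptions: `keep_rec_types()` and `make_demo()` are hypothetical names, the demo data are made up, and the codes are assumed to possibly carry padding (as in "SAO "), so `trimws()` (base R 3.2.0 or later) is applied before comparing. Pass whichever codes actually occur in your data; Matt's final line, for example, matches "FM-15".

```r
# Hypothetical helper: keep only the rows whose REC.TYPE is one of `types`.
# Works on a character matrix with a "REC.TYPE" column or on a data frame.
keep_rec_types <- function(x, types = c("SAO", "FL-15")) {
  rec <- trimws(x[, "REC.TYPE"])      # strip padding such as the space in "SAO "
  x[rec %in% types, , drop = FALSE]   # one vectorized subset, no row-by-row loop
}

# Hypothetical generator of made-up demo data standing in for the real datasets
make_demo <- function(n) {
  data.frame(REC.TYPE = sample(c("SAO ", "FL-15", "Other"), n, replace = TRUE),
             value = rnorm(n),
             stringsAsFactors = FALSE)
}

set.seed(1)
datasets <- lapply(c(100, 200, 300), make_demo)

# Apply the same filter to every dataset in the list
filtered <- lapply(datasets, keep_rec_types)
print(sapply(filtered, nrow))

# The helper also accepts a character matrix, as in Jim's example
mat <- matrix(c("SAO ", "Other", "FL-15"), ncol = 1,
              dimnames = list(NULL, "REC.TYPE"))
print(keep_rec_types(mat))
```

Keeping the datasets in a list means the nine filters become one `lapply()` call rather than nine copies of the same statement.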