Well, following up on this observation, which can be put under the heading of "Sometimes vectorization can be much slower than explicit loops" , I offer the following:
firsti <-function(x,k) { i <- 1 while(x[i]<=k){i <- i+1} i } > system.time(for(i in 1:100)which(x>.99)[1]) user system elapsed 19.1 2.4 22.2 > system.time(for(i in 1:100)which.max(x>.99)) user system elapsed 30.45 6.75 37.46 > system.time(for(i in 1:100)firsti(x,.99)) user system elapsed 0.03 0.00 0.03 ## About a 500 - 1000 fold speedup ! > firsti(x,.99) [1] 122 It doesn't seem to scale too badly, either (whatever THAT means!): (Of course, the which() versions are essentially identical in timing, and so are omitted) > system.time(for(i in 1:100)firsti(x,.9999)) user system elapsed 2.70 0.00 2.72 > firsti(x,.9999) [1] 18200 Of course, at some point, the explicit looping is awful -- with k = .999999, the index was about 360000, and the timing test took 54 seconds. So I guess the point is -- as always -- that the optimal approach depends on the nature of the data. Prudence and robustness clearly demands the vectorized which() approaches if you have no information. But if you do know something about the data, then you can often write much faster tailored solutions. Which is hardly revelatory, of course. Cheers to all, Bert On Wed, Sep 19, 2012 at 8:55 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote: > Le mercredi 19 septembre 2012 à 15:23 +0000, William Dunlap a écrit : >> The original method is faster than which.max for longish numeric vectors >> (in R-2.15.1), but you should check time and memory usage on your >> own machine: >> >> > x <- runif(18e6) >> > system.time(for(i in 1:100)which(x>0.99)[1]) >> user system elapsed >> 11.64 1.05 12.70 >> > system.time(for(i in 1:100)which.max(x>0.99)) >> user system elapsed >> 16.38 2.94 19.35 > If you the probability that such an element appears at the beginning of > the vector, a custom hack might well be more efficient. The problem with > ">", which() and which.max() is that they will go over all the elements > of the vector even if it's not needed at all. So you can start with a > small subset of the vector, and increase its size in a few steps until > you find the value you're looking for. > > Proof of concept (the values of n obviously need to be adapted): > x <-runif(1e7) > > find <- function(x, lim) { > len <- length(x) > > for(n in 2^(14:0)) { > val <- which(x[seq.int(1L, len/n)] > lim) > > if(length(val) > 0) return(val[1]) > } > > return(NULL) > } > >> system.time(for(i in 1:100)which(x>0.999)[1]) > utilisateur système écoulé > 9.740 5.795 15.890 >> system.time(for(i in 1:100)which.max(x>0.999)) > utilisateur système écoulé > 14.288 9.510 24.562 >> system.time(for(i in 1:100)find(x, .999)) > utilisateur système écoulé > 0.017 0.002 0.019 >> find(x, .999) > [1] 1376 > > (Looks almost like cheating... ;-) > > > > > >> Bill Dunlap >> Spotfire, TIBCO Software >> wdunlap tibco.com >> >> >> > -----Original Message----- >> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >> > On Behalf >> > Of Jeff Newmiller >> > Sent: Wednesday, September 19, 2012 8:06 AM >> > To: Mike Spam; r-help@r-project.org >> > Subject: Re: [R] effective way to return only the first argument of >> > "which()" >> > >> > ?which.max >> > --------------------------------------------------------------------------- >> > Jeff Newmiller The ..... ..... Go Live... >> > DCN:<jdnew...@dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... >> > Live: OO#.. Dead: OO#.. Playing >> > Research Engineer (Solar/Batteries O.O#. #.O#. with >> > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k >> > --------------------------------------------------------------------------- >> > Sent from my phone. Please excuse my brevity. >> > >> > Mike Spam <ichmags...@googlemail.com> wrote: >> > >> > >Hi, >> > > >> > >I was looking for a function like "which()" but only returns the first >> > >argument. >> > >Compare: >> > > >> > >x <- c(1,2,3,4,5,6) >> > >y <- 4 >> > >which(x>y) >> > > >> > >returns: >> > >5,6 >> > > >> > >which(x>y)[1] >> > >returns: >> > >5 >> > > >> > >which(x>y)[1] is exactly what i need. I did use this but the dataset >> > >is too big (~18 mio. Points). >> > >That's why i need a more effective way to get the first element of a >> > >vector which is bigger/smaller than a specific number. >> > > >> > >I found "match()" but this function only works for equal numbers. >> > > >> > > >> > > >> > >Thanks, >> > >Nico >> > > >> > >______________________________________________ >> > >R-help@r-project.org mailing list >> > >https://stat.ethz.ch/mailman/listinfo/r-help >> > >PLEASE do read the posting guide >> > >http://www.R-project.org/posting-guide.html >> > >and provide commented, minimal, self-contained, reproducible code. >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.