On Tue, 2010-07-13 at 01:42 -0400, Hadley Wickham wrote:
> strings <- replicate(1e5, paste(sample(letters, 100, rep = T), collapse = ""))
> system.time(strings[-1] == strings[-1e5])
> #  user  system elapsed
> # 0.016   0.000   0.017
>
> So it takes ~1/100 of a second to do ~100,000 string comparisons.  You
> need to provide a reproducible example that illustrates why you think
> string comparisons are slow.

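One thing worth keeping in mind about that benchmark: a byte-wise comparison can stop at the first byte where two strings differ, and random 100-letter strings usually differ within the first character or two. Below is a rough standalone C sketch of that idea, under stated assumptions: it is plain C, not R's internals (I haven't checked how '==' is implemented), and the names common_prefix and count_equal_adjacent are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the common prefix of a and b, i.e. how
   many matching bytes a byte-wise comparison sees before it reaches a
   difference (or the end of a). */
size_t common_prefix(const char *a, const char *b)
{
    size_t n = 0;
    assert(a != NULL && b != NULL);
    /* Stops at a's terminator or at the first mismatch, so it never
       reads past the terminator of either string. */
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Hypothetical helper: compare every element with its neighbour, which
   is the pairing that strings[-1] == strings[-1e5] produces, and return
   the number of equal pairs. */
size_t count_equal_adjacent(const char *const *s, size_t n)
{
    size_t equal = 0;
    size_t i;
    for (i = 0; i + 1 < n; i++)
        if (strcmp(s[i], s[i + 1]) == 0)
            equal++;
    return equal;
}
```

If the strings in the real application share long common prefixes, or are mostly equal, each comparison has to read much further than it does for random data, so a benchmark on random strings is closer to a best case than a typical one.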
Here's a vectorized alternative to '==' for strings, with minimal
argument checking or result conversion. I haven't looked at the
corresponding R source code; it may be similar:

library(inline)
code <- "
  SEXP ans;
  int i, len, *cans;
  if(!isString(s1) || !isString(s2))
    error(\"invalid arguments\");
  /* compare only up to the length of the shorter vector (no recycling) */
  len = length(s1)>length(s2)?length(s2):length(s1);
  PROTECT(ans = allocVector(INTSXP, len));
  cans = INTEGER(ans);
  /* store strcmp's raw result: 0 means the two strings are equal */
  for(i = 0; i < len; i++)
    cans[i] = strcmp(CHAR(STRING_ELT(s1,i)),
                     CHAR(STRING_ELT(s2,i)));
  UNPROTECT(1);
  return ans;
"
sig <- signature(s1="character", s2="character")
strcmp <- cfunction(sig, code)

> system.time(strings[-1] == strings[-1e5])
   user  system elapsed
  0.036   0.000   0.035
> system.time(strcmp(strings[-1], strings[-1e5]))
   user  system elapsed
  0.032   0.000   0.034

That's pretty fast, and about the same as '==', though I seem to be
working with a slower system than Hadley. It's hard to see how this
could be improved, except maybe by caching results of string
comparisons.

-Matt

>
> Hadley
>
>
> On Tue, Jul 13, 2010 at 6:52 AM, Ralf B <ralf.bie...@gmail.com> wrote:
> > I am asking this question because String comparison in R seems to be
> > awfully slow (based on profiling results) and I wonder if perhaps '=='
> > alone is not the best one can do. I did not ask for anything
> > particular and I don't think I need to provide a self-contained source
> > example for the question. So, to re-phrase my question, are there more
> > (runtime) effective ways to find out if two strings (about 100-150
> > characters long) are equal?
> >
> > Ralf
> >
> >
> > On Sun, Jul 11, 2010 at 2:37 PM, Sharpie <ch...@sharpsteen.net> wrote:
> >>
> >>
> >> Ralf B wrote:
> >>>
> >>> What is the fastest way to compare two strings in R?
> >>>
> >>> Ralf
> >>>
> >>
> >> Which way is not fast enough?
> >>
> >> In other words, are you asking this question because profiling showed one of
> >> R's string comparison operations is causing a massive bottleneck in your
> >> code?
> >> If so, which one and how are you using it?
> >>
> >> -Charlie
> >>
> >> -----
> >> Charlie Sharpsteen
> >> Undergraduate-- Environmental Resources Engineering
> >> Humboldt State University
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Fast-string-comparison-tp2285156p2285409.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>

-- 
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com