John Fox wrote:
> By coincidence, I have a version of strsplit() that I've used to
> illustrate recursion:
>
> Strsplit <- function(x, split){
>     if (length(x) > 1) {
>         return(lapply(x, Strsplit, split))  # vectorization
>     }
>     result <- character(0)
>     if (nchar(x) == 0) return(result)
>     posn <- regexpr(split, x)
>     if (posn <= 0) return(x)
>     c(result, substring(x, 1, posn - 1),
>         Recall(substring(x, posn+1, nchar(x)), split))  # recursion
> }
Well, it is both inefficient and wrong.

It is inefficient because of the non-tail recursion and the recursive concatenation. That is justified for the purpose of showing recursion, but for practical purposes you'd rather use gregexpr.

It is wrong because of how you pick the remaining part of the string to be split. The code always resumes one character past the start of the match (posn+1), so it works only under the assumption that the pattern matches exactly one character:

    Strsplit("hello-dolly,--sweet", "--")  # the pattern is *two* hyphens
    # [1] "hello-dolly," "-sweet"

    Strsplit("hello dolly", "")  # the pattern is the empty string
    # [1] "" "" "" "" "" "" "" "" "" "" ""

Here's a quick rewrite. I haven't tested it on extreme cases, so it may not be perfect, and it also has a hidden source of inefficiency:

    strsplit = function(strings, split) {
        positions = gregexpr(split, strings)
        lapply(1:length(strings), function(i)
            substring(strings[[i]],
                      c(1, positions[[i]] + attr(positions[[i]], "match.length")),
                      c(positions[[i]] - 1, nchar(strings[[i]]))))
    }

    n = 1000; m = 100
    strings = replicate(n, paste(sample(c(letters, " "), 100, replace=TRUE), collapse=""))
    system.time(replicate(m, strsplit(strings, " ")))
    system.time(replicate(m, Strsplit(strings, " ")))

vQ
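The recursive teaching version can be fixed by changing how it advances past a match. A minimal sketch, assuming the aim is to keep the recursion, is to skip the whole match using the "match.length" attribute that regexpr() returns, instead of always skipping one character.

A few notes on this sketch:

* Strsplit2 is a hypothetical name.
* The empty-match branch is an assumption about what splitting on "" should mean. It produces one character per piece, which is what base::strsplit does.
* It keeps the non-tail recursion and repeated concatenation, so it is no faster than the original.

```r
# Hypothetical name: Strsplit2 (not part of base R).
# Recursive splitter that skips the *whole* match, using the
# "match.length" attribute returned by regexpr(), instead of assuming
# the separator is exactly one character long.
Strsplit2 <- function(x, split) {
  if (length(x) > 1) {
    return(lapply(x, Strsplit2, split))      # vectorize over the input
  }
  if (nchar(x) == 0) return(character(0))    # nothing left to split
  posn <- regexpr(split, x)
  if (posn <= 0) return(x)                   # no match: the rest is one piece
  len <- attr(posn, "match.length")
  if (len == 0) {
    # Empty match (e.g. split = ""): peel off one character so that
    # every recursive call works on a strictly shorter string.
    return(c(substring(x, 1, 1), Recall(substring(x, 2), split)))
  }
  c(substring(x, 1, posn - 1),               # piece before the match
    Recall(substring(x, posn + len), split)) # recurse past the match
}

Strsplit2("hello-dolly,--sweet", "--")
Strsplit2("hello dolly", "")
```

R does not automatically eliminate tail calls. Rewriting this in accumulator style would therefore not, by itself, remove the per-piece call overhead.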
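The quick gregexpr rewrite also mishandles strings that contain no match at all. In that case gregexpr() reports a position of -1 with a match.length of -1, so the substring() call returns a spurious leading "" before the whole string.

Below is a minimal sketch that special-cases that situation. It assumes a single pattern and no NA inputs. strsplit_gregexpr is a hypothetical name, chosen so that it does not mask base::strsplit. It also uses seq_along() rather than 1:length(), so an empty input vector gives an empty list.

```r
# Hypothetical name: strsplit_gregexpr (not part of base R).
# Assumes a single pattern `split` and no NA strings.
# All split points for a string come from one gregexpr() call,
# so there is no recursion and no repeated concatenation.
strsplit_gregexpr <- function(strings, split) {
  positions <- gregexpr(split, strings)
  lapply(seq_along(strings), function(i) {
    s   <- strings[[i]]
    pos <- positions[[i]]
    if (pos[1] == -1) return(s)          # no match: keep the whole string
    len <- attr(pos, "match.length")
    # Piece k starts right after match k-1 and ends just before match k.
    substring(s, c(1, pos + len), c(pos - 1, nchar(s)))
  })
}

strsplit_gregexpr(c("a b c", "abc", "hello-dolly,--sweet"), " ")
```

Like the quick rewrite, this keeps a trailing empty piece when a string ends with the separator. For example, "a,b," gives "a" "b" "", whereas base::strsplit drops the trailing "".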