Dear Wacek,

"Wrong" is a bit strong, I think -- limited to single-character patterns is more accurate. Moreover, it isn't hard to make the function work with multiple-character matches as well:

Strsplit <- function(x, split){
    if (length(x) > 1) {
        return(lapply(x, Strsplit, split))  # vectorization
    }
    result <- character(0)
    if (nchar(x) == 0) return(result)
    posn <- regexpr(split, x)
    if (posn <= 0) return(x)
    c(result, substring(x, 1, posn - 1),
      Recall(substring(x, posn + attr(posn, "match.length"), nchar(x)), split))  # recursion
}

On the other hand, your function is much more efficient.

Regards,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: Wacek Kusnierczyk [mailto:[EMAIL PROTECTED]
> Sent: December-04-08 5:05 AM
> To: John Fox
> Cc: R help
> Subject: Re: [R] Strplit code
>
> John Fox wrote:
> > By coincidence, I have a version of strsplit() that I've used to
> > illustrate recursion:
> >
> > Strsplit <- function(x, split){
> >     if (length(x) > 1) {
> >         return(lapply(x, Strsplit, split))  # vectorization
> >     }
> >     result <- character(0)
> >     if (nchar(x) == 0) return(result)
> >     posn <- regexpr(split, x)
> >     if (posn <= 0) return(x)
> >     c(result, substring(x, 1, posn - 1),
> >       Recall(substring(x, posn+1, nchar(x)), split))  # recursion
> > }
> >
>
> well, it is both inefficient and wrong.
>
> inefficient because of the non-tail recursion and recursive
> concatenation, which is justified for the sake of showing
> recursion, but for practical purposes you'd rather use gregexpr.
>
> wrong because of how you pick the remaining part of the string to be
> split -- it works just under the assumption the pattern is a single
> character:
>
>     Strsplit("hello-dolly,--sweet", "--")
>     # the pattern is *two* hyphens
>     # [1] "hello-dolly" "-sweet"
>
>     Strsplit("hello dolly", "")
>     # the pattern is the empty string
>     # [1] "" "" "" "" "" "" "" "" "" "" ""
>
>
> here's a quick rewrite -- i haven't tested it on extreme cases, it may
> not be perfect, and there's a hidden source of inefficiency here as well:
>
>     strsplit =
>     function(strings, split) {
>         positions = gregexpr(split, strings)
>         lapply(1:length(strings), function(i)
>             substring(strings[[i]],
>                       c(1, positions[[i]] + attr(positions[[i]], "match.length")),
>                       c(positions[[i]]-1, nchar(strings[[i]]))))
>     }
>
>
>     n = 1000; m = 100
>     strings = replicate(n, paste(sample(c(letters, " "), 100, replace=TRUE),
>                                  collapse=""))
>     system.time(replicate(m, strsplit(strings, " ")))
>     system.time(replicate(m, Strsplit(strings, " ")))
>
>
> vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
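
A quick check of the revised Strsplit() from the top of this message may be useful. This is only a sketch: the function body is copied unchanged from the reply, the two-hyphen string is the one from the quoted message, and the "::" strings and the variable names are made up for illustration. The empty pattern is deliberately left out, because regexpr() reports a zero-length match at position 1 for it, so the recursion in the revised function would never advance.

```r
# Revised Strsplit(), copied from the reply: the remainder of the string
# now starts after the whole match rather than one character after it.
Strsplit <- function(x, split){
    if (length(x) > 1) {
        return(lapply(x, Strsplit, split))  # vectorization
    }
    result <- character(0)
    if (nchar(x) == 0) return(result)
    posn <- regexpr(split, x)
    if (posn <= 0) return(x)
    c(result, substring(x, 1, posn - 1),
      Recall(substring(x, posn + attr(posn, "match.length"), nchar(x)), split))  # recursion
}

# The two-hyphen example from the quoted message.
two_hyphens <- Strsplit("hello-dolly,--sweet", "--")
print(two_hyphens)
# [1] "hello-dolly," "sweet"

# A made-up example with a two-character separator occurring twice.
colons <- Strsplit("a::b::c", "::")
print(colons)
# [1] "a" "b" "c"

# Vector input returns a list with one character vector per element.
both <- Strsplit(c("a::b", "c::d::e"), "::")
str(both)
```

Each recursive call still copies the remainder of the string, so the efficiency point in the quoted message applies to this version as well.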