Hi all

I have the following regular expression problem: I want to find the
elements of a vector that end in a repeated character, but where the
repetition doesn't make up the whole word. That is, for the
vector vec:

vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

I would like to get
"baaa"
"bbaa"
"baamm"

From tools where negative lookbehind can involve variable lengths, one
would think this would work:

grep("(?<!(?:\\1|^))(.)\\1{1,}$", vec, perl=T)

But R doesn't like that pattern. I also know I can get the result in two steps, like this:

whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl=TRUE) # 1 6
rep.at.end <- grep("(.)\\1{1,}$", vec, perl=TRUE) # 1 2 3 5 6
setdiff(rep.at.end, whole.word.rep) # 2 3 5

But is there a one-line grep thingy to do this?
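
One candidate, as a rough sketch only: swap the lookbehind for a negative
lookahead anchored at the start of the string, so the "whole word is one
repeated character" case is ruled out before the trailing run is matched.
The name one.line is made up for illustration, and this has only been checked
against vec above, so treat it as an assumption rather than a general answer:

```r
vec <- c("aaaa", "baaa", "bbaa", "bbba", "baamm", "aa")

# ^(?!(.)\\1*$)  negative lookahead at the start: reject strings made up of
#                a single character repeated all the way to the end
# .*(.)\\2+$     require a run of two or more identical characters at the end
#                (group 2, because the lookahead's parentheses are group 1)
one.line <- grep("^(?!(.)\\1*$).*(.)\\2+$", vec, perl = TRUE)

# Cross-check against the two-step setdiff() approach
whole.word.rep <- grep("^(.)\\1{1,}$", vec, perl = TRUE)
rep.at.end <- grep("(.)\\1{1,}$", vec, perl = TRUE)
stopifnot(identical(one.line, setdiff(rep.at.end, whole.word.rep)))

vec[one.line] # "baaa" "bbaa" "baamm"
```

Adding value = TRUE to the grep() call should return the matching strings
directly instead of their indices.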

Thx for any pointers,
STG

