Hi, I want to parse a string and extract the number of occurrences where exactly two consonants clump together. Consider for example the word "hallo". Here I want the algorithm to return 1. For "chess" I want it to return 2. For the word "screw" the result should be negative (no match), as it is a clump of three consonants, not two. Also, for the word "abstraction" I do not want the algorithm to detect two two-consonant clusters. In this case the result should be negative as well, since there are four consonants in a row.
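To pin down the counts described above, the rule can be sketched without a regex by measuring runs of consonants with rle(). This is only an illustration of the expected output, not a regex solution: count_pairs is a hypothetical helper name, and it assumes the same consonant set as the character class used in the attempt (so "y" and "w" count as consonants).

```r
# Hypothetical helper: count runs of exactly two consonants in one word.
# Assumption: consonants are the letters bcdfghjklmnpqrstvwxyz, case-insensitive.
count_pairs <- function(word) {
  chars <- strsplit(tolower(word), "")[[1]]
  consonants <- strsplit("bcdfghjklmnpqrstvwxyz", "")[[1]]
  is_consonant <- chars %in% consonants
  # rle() collapses the TRUE/FALSE vector into runs with their lengths
  runs <- rle(is_consonant)
  # keep only consonant runs (values == TRUE) whose length is exactly 2
  sum(runs$values & runs$lengths == 2)
}

count_pairs("hallo")  # 1
count_pairs("hall")   # 1
count_pairs("chess")  # 2
count_pairs("screw")  # 0
```

Note that under this reading "abstraction" gives 1 rather than 0: the four-consonant run "bstr" is skipped, but "ct" is a separate run of exactly two consonants.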
This is what I have tried so far:

str <- "hallo"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE, extended = TRUE)[[1]]

[1] 3
attr(,"match.length")
[1] 3

The result is correct. Now I change the word to "hall":

str <- "hall"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE, extended = TRUE)[[1]]

[1] -1
attr(,"match.length")
[1] -1

Here my expression fails. How can I write a correct regex to do this? I always encounter problems at the beginning or end of a string.

Also:

str <- "abstraction"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE, extended = TRUE)[[1]]

[1] 4 7
attr(,"match.length")
[1] 3 3

This also fails.

Thanks in advance,
Mark

-------------------------------
Mark Heckmann
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com