[R] Using regular expressions to detect clusters of consonants in a string

Mark Heckmann Tue, 30 Jun 2009 09:08:14 -0700

Hi,

I want to parse a string and count the occurrences where exactly two
consonants clump together. Consider for example the word "hallo". Here I
want the algorithm to return 1. For "chess" I want it to return 2. For the
word "screw" the result should be negative, as it is a clump of three
consonants, not two. Also, for the word "abstraction" I do not want the
algorithm to report two two-consonant clusters. In this case the result
should be negative as well, as there are four consonants in a row.


str <- "hallo"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]

[1] 3
attr(,"match.length")
[1] 3

The result is correct. Now I change the word to "hall":

str <- "hall"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]

[1] -1
attr(,"match.length")
[1] -1

Here my expression fails. How can I write a correct regex to do this? I
always encounter problems at the beginning or end of a string.

Also:

str <- "abstraction"
gregexpr("[bcdfghjklmnpqrstvwxyz]{2}[aeiou]{1}", str, ignore.case = TRUE)[[1]]

[1] 4 7
attr(,"match.length")
[1] 3 3

This also fails.
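
One direction that might work, offered only as a sketch: rather than
requiring a vowel after the two consonants, require that no consonant
comes directly before or after them. Negative lookbehind and lookahead can
express this, and they also succeed at the start and end of the string.
They need perl = TRUE. The names cons, pattern and count_pairs below are
made up for illustration, and a count of 0 stands in for the "negative"
result.

```r
# Consonant class taken from the pattern above (y is treated as a consonant).
cons <- "[bcdfghjklmnpqrstvwxyz]"

# Exactly two consonants, neither preceded nor followed by another consonant.
# (?<!...) and (?!...) are lookarounds, which require perl = TRUE.
pattern <- paste0("(?<!", cons, ")", cons, "{2}(?!", cons, ")")

# Hypothetical helper: number of two-consonant clusters in a single word.
count_pairs <- function(word) {
  m <- gregexpr(pattern, word, ignore.case = TRUE, perl = TRUE)[[1]]
  sum(m > 0)  # gregexpr returns -1 when nothing matches, so this gives 0
}

sapply(c("hallo", "hall", "chess", "screw", "abstraction"), count_pairs)
```

Under this reading "hallo" and "hall" give 1, "chess" gives 2 and "screw"
gives 0. Note that "abstraction" gives 1, not 0: the run "bstr" is skipped,
but "ct" is a genuine two-consonant cluster. If a word containing any
longer run should be rejected outright, an additional check would be
needed.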

Thanks in advance,
Mark

-------------------------------
Mark Heckmann
www.markheckmann.de
R-Blog: http://ryouready.wordpress.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
