On Tue, 2010-06-01 at 16:43 -0400, Erik Iverson wrote:
>
> McGehee, Robert wrote:
> > R-help,
> > Sorry if this is more of a regex question than an R question. However,
> > help would be appreciated on my use of the regexpr function.
> >
> > In the first example below, I ask for all characters (a-z) in 'abc123';
> > regexpr returns a 3-character match beginning at the first character.
> >
> >> regexpr("[[:alpha:]]*", "abc123")
> > [1] 1
> > attr(,"match.length")
> > [1] 3
> >
> > However, when the text is flipped and I ask for a match of all
> > characters in '123abc', regexpr returns a zero-character match beginning
> > at the first character. Can someone explain what a zero-length match
> > means (i.e. why not return -1), and why the result isn't 4,
> > match.length=3?
>
> It means it matches 0 characters, which is fine since you use *, which
> means match 0 or more occurrences of the regex. It sounds like you want
> + instead of *. Also see gregexpr.
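
To make the difference between * and + concrete, here is a minimal
sketch using the two strings from the original post. The variable names
(x, star, plus) are only for illustration, and regmatches() assumes a
version of R recent enough to provide it.

```r
# The two test strings from the question above.
x <- c("abc123", "123abc")

# '*' allows zero occurrences, so an empty match at position 1 is valid.
star <- regexpr("[[:alpha:]]*", x)

# '+' requires at least one letter, so the match starts at the first letter.
plus <- regexpr("[[:alpha:]]+", x)

as.integer(star)             # start positions with '*': 1 1
attr(star, "match.length")   # match lengths with '*':   3 0
as.integer(plus)             # start positions with '+': 1 4
attr(plus, "match.length")   # match lengths with '+':   3 3

# With no letters at all, '+' reports "no match" as -1,
# while '*' still succeeds with an empty match at position 1.
none_plus <- regexpr("[[:alpha:]]+", "123")
none_star <- regexpr("[[:alpha:]]*", "123")

# Extract the matched text itself.
letters_found <- regmatches(x, plus)
```

So -1 is reserved for "the pattern cannot match anywhere", which never
happens with a pattern that is allowed to match the empty string.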

Also, regular expressions try to match as early as possible. That's why
the match is at position one of length zero, and not at position four of
length three.

Matt Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina

> >
> >> regexpr("[[:alpha:]]*", "123abc")
> > [1] 1
> > attr(,"match.length")
> > [1] 0
> >
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.