Re: [R] help with regexp

Jannis Thu, 06 Oct 2011 11:03:38 -0700

Thanks to all who replied! With all these possible solutions it will be hard to 
find the best one :-).


--- Gabor Grothendieck <ggrothendi...@gmail.com> wrote on Wed, 5 Oct 2011:

> From: Gabor Grothendieck <ggrothendi...@gmail.com>
> Subject: Re: [R] help with regexp
> To: "Jannis" <bt_jan...@yahoo.de>
> CC: r-h...@stat.math.ethz.ch
> Date: Wednesday, 5 October 2011, 15:13
> On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jan...@yahoo.de> wrote:
> > Dear list members,
> >
> >
> > I am stuck with using regular expressions.
> >
> >
> > Imagine I have a vector of character strings like:
> >
> > test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
> >
> > How could I use regular expressions to extract only the
> > 'def'/'abc' parts of these strings?
> >
> >
> > My own attempt yielded no results:
> >
> > testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE)
> >
> > I seem to be missing an important concept here.
> > Until now I have always used nested sub() calls like:
> >
> > testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test))
> >
> >
> > but this tends to become cumbersome, and I was
> > wondering whether there is a more elegant way to do this.
> >
> 
> Here are a couple of solutions:
> 
> # remove everything up to the last _ as well as everything from the
> # first . onwards
> gsub(".*_|[.].*", "", test)
> 
> # extract everything that is not a _ provided it is
> # immediately followed by .
> library(gsubfn)
> strapply(test, "([^_]+)[.]", simplify = TRUE)
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

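For anyone finding this thread later, here is a minimal sketch of why the
grep() attempt quoted above could not work, plus base-R alternatives. It
rests on two assumptions: the vector `test` is the one from the original
question, and regmatches() is available in your version of R. The variable
names (m, via_lookaround, via_capture, via_gsub) are made up for this
illustration. grep() with value = TRUE returns the whole matching elements
rather than the matched substring, and the quoted call also never passes
`test` as the vector to search.

```r
# Example data taken from the original question
test <- c("filename_1_def.pdf", "filename_2_abc.pdf")

# 1. Lookaround approach: regexpr() locates the match in each string and
#    regmatches() extracts it. The dot before "pdf" is written as [.] so
#    that it matches a literal dot only.
m <- regexpr("(?<=filename_[[:digit:]]_).{1,3}(?=[.]pdf)", test, perl = TRUE)
via_lookaround <- regmatches(test, m)

# 2. A single sub() with a capture group, replacing the nested sub() calls:
#    keep only what sits between the numbered prefix and the extension.
via_capture <- sub("^filename_[[:digit:]]_(.*)[.]pdf$", "\\1", test)

# 3. Gabor's gsub() solution from the quoted reply, for comparison
via_gsub <- gsub(".*_|[.].*", "", test)

print(via_lookaround)
print(via_capture)
print(via_gsub)
```

For the two example strings all three approaches should give the same
result. The lookaround version silently drops elements that do not match,
while the sub() version returns non-matching strings unchanged, so which
one suits best may depend on how regular the file names are.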