Thanks to all who replied! With all these possible solutions it will be hard to find the best one :-).
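
For the archives, here is a minimal sketch of why the grep() attempt quoted below could not return the substrings, plus two base-R alternatives. As quoted, the call omits the input vector, and even with it grep(value = TRUE) returns whole matching elements rather than the matched part. The names whole, m, parts and parts2 are only illustrative, and the sketch assumes an R version that ships regmatches().

```r
test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')

# grep() needs the vector as its second argument; with value = TRUE it
# returns the complete elements that match, not the matched substring.
whole <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=[.]pdf)', test,
              perl = TRUE, value = TRUE)

# regexpr() locates the first match in each element and regmatches()
# extracts it, so the lookbehind/lookahead pattern works as intended.
# Note that regmatches() silently drops elements without a match.
m <- regexpr('(?<=filename_[[:digit:]]_).{1,3}(?=[.]pdf$)', test, perl = TRUE)
parts <- regmatches(test, m)

# Alternative: a single sub() with a capture group replaces the nested calls.
parts2 <- sub('^filename_[[:digit:]]_(.*)[.]pdf$', '\\1', test)

print(parts)
print(parts2)
```

The dot before pdf is written as [.] so that it matches a literal period only; an unescaped . would match any character.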
--- Gabor Grothendieck <ggrothendi...@gmail.com> wrote on Wed, 5 Oct 2011:

> From: Gabor Grothendieck <ggrothendi...@gmail.com>
> Subject: Re: [R] help with regexp
> To: "Jannis" <bt_jan...@yahoo.de>
> CC: r-h...@stat.math.ethz.ch
> Date: Wednesday, 5 October 2011, 15:13
>
> On Wed, Oct 5, 2011 at 7:56 AM, Jannis <bt_jan...@yahoo.de> wrote:
> > Dear list members,
> >
> > I am stuck with using regular expressions.
> >
> > Imagine I have a vector of character strings like:
> >
> > test <- c('filename_1_def.pdf', 'filename_2_abc.pdf')
> >
> > How could I use regular expressions to extract only the
> > 'def'/'abc' parts of these strings?
> >
> > An attempt on my side yielded no results:
> >
> > testresults <- grep('(?<=filename_[[:digit:]]_).{1,3}(?=.pdf)', perl = TRUE, value = TRUE)
> >
> > Somehow I seem to be missing some important concept here.
> > Until now I have always used nested sub expressions like:
> >
> > testresults <- sub('.pdf$', '', sub('^filename_[[:digit:]]_', '' , test))
> >
> > but this tends to become cumbersome and I was wondering
> > whether there is a more elegant way to do this?
> >
>
> Here are a couple of solutions:
>
> # remove everything up to the last _ as well as everything from . onwards
> gsub(".*_|[.].*", "", test)
>
> # extract everything that is not a _ provided it is immediately followed by .
> library(gsubfn)
> strapply(test, "([^_]+)[.]", simplify = TRUE)
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.