Hi

r-help-boun...@r-project.org wrote on 14.11.2011 14:54:05:
> Thank you Sarah,
>
> Your reply was very helpful. I have the added difficulty that I am not only
> dealing with single A-Z characters, but quite often have the following
> situation:
>
> form<-c('~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action+mean+CTA*help')
>
> and again, I need to remove the '+CTA*help' part of the character string.
> However, in another instance I may have
>
> form<-c('~Sentence*LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action+mean+CTA*help')
>
> In this case I would need to remove 'Sentence*LEGAL+' from form.
>
> Can this be accomplished in the same manner?

Hm. I am not at all an expert in regular expressions, but recently I learned some ways (thanks, Uwe):

sub("^(~)\\+(.+)\\+$", "\\1\\2", gsub("[[:alnum:]]+\\*[[:alnum:]]+", "", form))
[1] "~Intro+Intro/Intro1++benefit+benefit/benefit1+product+action+mean"

This removes every xxxxxx*yyyyy value from your form (the output shown is for the second definition above), together with the leading and trailing +.

I wonder whether any automatic process can remove only one of several xxxxxx*yyyyy substrings.

Regards
Petr

PS: It is still not perfect, as one extra + is left in the middle.

>
> Many thanks, once again, for your help
>
> Mike Griffiths
>
> On Mon, Nov 14, 2011 at 12:09 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>
> > Hi,
> >
> > On Mon, Nov 14, 2011 at 4:20 AM, Michael Griffiths
> > <griffi...@upstreamsystems.com> wrote:
> > > Good morning R list,
> > >
> > > My apologies if this has already been answered elsewhere, but I have not
> > > found the answer that I am looking for.
> > >
> > > I have a character string, i.e.
> > >
> > > form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
> > >
> > > Now, my aim is to find the position of all those instances of '*' and to
> > > remove said '*'.
> > > However, I would also like to remove the variable name preceding
> > > the '*', the math operator preceding that, and also
> > > the variable name after the '*'. So, here I would like to remove '+L*M'
> >
> > You just want to get rid of them? gsub() it is.
> >
> > I've changed your formula a little bit to better demonstrate what's going
> > on:
> >
> > > form<-c('~ A + B * C + C / D + E + E / F * G + H + I + J + K + L * M')
> > > gsub(" \\+ [A-Z] \\* [A-Z]", "", form)
> > [1] "~ A + C / D + E + E / F * G + H + I + J + K"
> >
> > That regular expression will take out a
> > space
> > +
> > space
> > any capital letter
> > space
> > *
> > space
> > any capital letter.
> >
> > It will take out all occurrences of that sequence, but won't take out
> > occurrences of * not in that sequence.
> >
> > If you don't want the spaces, you don't need them. Just take them out
> > of the regular expression as well.
> >
> > Not that strsplit() was remotely the right tool here, but you can
> > split into characters without a separator:
> >
> > > form <- 'abcd'
> > > strsplit(form, '')
> > [[1]]
> > [1] "a" "b" "c" "d"
> >
> > Sarah
> >
> > > So far, I have come up with the following code:
> > >
> > > parts<-strsplit(form,' ')
> > > index<-which(unlist(parts)=="*")
> > > for (i in 1:length(index)){
> > >   parts[[1]][index[i]]<-list(NULL)
> > >   parts[[1]][index[i]+1]<-list(NULL)
> > >   parts[[1]][index[i]-1]<-list(NULL)
> > >   parts[[1]][index[i]-2]<-list(NULL)
> > > }
> > > new.form<-unlist(parts)
> > >
> > > form<-new.form[0]
> > > for (i in 1: length(new.form)){
> > >   form<-paste(form,new.form[i], sep="")
> > > }
> > >
> > > However, as you can see, I have had to use strsplit in what I consider a
> > > rather clumsy manner, as the character string (form) has to be in a
> > > certain format. All variables and maths operators require a space between
> > > them in order for strsplit to work in the manner I require.
> > >
> > > I would very much like to accomplish what the above code already does,
> > > but without the initial character string needing the
> > > aforementioned spaces.
> > >
> > > If the list can offer help, I would be most appreciative.
> > >
> > > Yours
> > >
> > > Mike Griffiths
> >
> > --
> > Sarah Goslee
> > http://www.functionaldiversity.org
>
> --
> Michael Griffiths, Ph.D
> Statistician
> Upstream Systems
> www.upstreamsystems.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
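
On the two points left open in Petr's reply (the doubled + left in the middle, and removing only one of several xxxxxx*yyyyy substrings), one possible way around both is to split the string into terms instead of editing it with one regular expression. The following is only a minimal sketch: drop_terms() is a hypothetical helper that does not appear anywhere in this thread, and it assumes that terms are separated by '+' only and that whitespace carries no meaning.

```r
# Hypothetical helper, not from the thread: drop whole terms from a
# one-sided formula string. Assumes terms are separated by '+' only.
drop_terms <- function(form, drop = NULL) {
  # Remove all whitespace and the leading '~'
  rhs <- sub("^~", "", gsub("[[:space:]]", "", form))
  # Split on the literal '+' (fixed = TRUE, so no regex escaping is needed)
  terms <- strsplit(rhs, "+", fixed = TRUE)[[1]]
  if (is.null(drop)) {
    # Default: drop every term that contains a literal '*'
    keep <- !grepl("*", terms, fixed = TRUE)
  } else {
    # Otherwise: drop only the terms named in 'drop'
    keep <- !(terms %in% drop)
  }
  # Rejoin the surviving terms, so no stray '+' can be left behind
  paste0("~", paste(terms[keep], collapse = "+"))
}

form1 <- "~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action+mean+CTA*help"
form2 <- "~Sentence*LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action+mean+CTA*help"

# Drop all xxxxxx*yyyyy terms
drop_terms(form2)
# [1] "~Intro+Intro/Intro1+benefit+benefit/benefit1+product+action+mean"

# Drop only one named term and keep the other xxxxxx*yyyyy term
drop_terms(form1, drop = "CTA*help")
# [1] "~Sentence+LEGAL+Intro+Intro/Intro1+Intro*LEGAL+benefit+benefit/benefit1+product+action+mean"
```

Because the operator is removed together with its term, no leading, trailing, or doubled + remains. Note that this differs from the gsub() approach for a term such as E/F*G in Sarah's example: the default here would drop that whole term, so it may not suit every formula.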