Thanks, Marc and Haris! I didn't know the values of the numbers beforehand, so the scan method won't work, but "[^+-\\d.]+" will do!
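
For reference, here is a minimal sketch of that split-based approach on the example string from my original post. It is only an illustration under a few assumptions: the names `pieces` and `nums` are made up here, `[0-9]` stands in for `\\d` so that the default regex engine can be used without `perl=TRUE`, and the hyphen is placed last in the character class so it cannot be read as a range.

```r
# Example string from the original question
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Split on runs of characters that are not digits, dots, or signs.
# The hyphen is last in the class so it is taken literally.
pieces <- strsplit(outtree.new, "[^0-9.+-]+")[[1]]

# The string starts with delimiters, so the first piece is "".
# Drop empty pieces before converting.
nums <- as.numeric(pieces[nzchar(pieces)])

nums
# [1]  1204.25  1204.25  7581.11  8785.36  8353.85 17139.21
```

Like the original suggestion, this assumes that dots and +/- signs appear only inside numbers.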
And Haris, I didn't intend to keep the information of which number is B, which is C, etc. when asking the question, as I had a tedious way to do it (using strsplit and unlist over and over again after I get the numbers). But if you have an easier way to do it, I'd like to know!

Hua

--- On Thu, 6/12/08, Charilaos Skiadas <[EMAIL PROTECTED]> wrote:

> From: Charilaos Skiadas <[EMAIL PROTECTED]>
> Subject: Re: [R] numbers as part of long character
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED], r-help@r-project.org
> Date: Thursday, June 12, 2008, 6:03 PM
>
> On Jun 12, 2008, at 5:06 PM, Marc Schwartz wrote:
>
> > on 06/12/2008 03:46 PM Hua Li wrote:
> >> Hi,
> >> I'm looking for some way to pick up the numbers which are
> >> contained and buried in a long character. For example,
> >>
> >> outtree.new="(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"
> >> num.char = unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(unlist(strsplit(outtree.new,")",fixed=TRUE)),"(",fixed=TRUE)),":",fixed=TRUE)),",",fixed=TRUE)),";",fixed=TRUE))
> >> num.vec=as.numeric(num.char[1:(length(num.char)-1)])
> >> num.char
> >> # "B" "1204.25" "E" "1204.25" "7581.11" "F" "8785.36" "8353.85" "C" "17139.21" ""
> >> num.vec
> >> # NA 1204.25 NA 1204.25 7581.11 NA 8785.36 8353.85 NA 17139.21
> >>
> >> would help me get the numbers such as 1204.25, 7581.11, etc, but
> >> with a warning message which reads:
> >> "Warning message:
> >> NAs introduced by coercion "
> >> Is there a way to get around this? Thanks!
> >> Hua
> >
> > Your code above is overly and needlessly complicated, which makes
> > it difficult to debug.
> >
> > I would take an approach whereby you use gsub() to strip non-numeric
> > characters from the input character vector and then use scan()
> > to read the remaining numbers:
> >
> > > Vec <- scan(textConnection(gsub("[^0-9\\.]+", " ", outtree.new)))
> > Read 6 items
> >
> > > Vec
> > [1] 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21
> >
> > > str(Vec)
> >  num [1:6] 1204 1204 7581 8785 8354 ...
> >
> > The result of using gsub() above is:
> >
> > > gsub("[^0-9\\.]+", " ", outtree.new)
> > [1] " 1204.25 1204.25 7581.11 8785.36 8353.85 17139.21 "
> >
> > That gives you a character vector which can then be passed to scan()
> > as a textConnection().
>
> Another approach would be to split on sequences of non-integers:
>
> as.numeric( strsplit(outtree.new, "[^\\d.]+", perl=TRUE)[[1]] )
>
> Use "[^+-\\d.]+" if your numbers might be signed. This does assume
> that dots, +/- occur only as decimal points.
>
> Hua, did you want to keep the information of which number is B, which
> is C etc?
>
> > See ?gsub, ?regex, ?textConnection and ?scan for more information.
> >
> > HTH,
> >
> > Marc Schwartz
>
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
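
On the question of keeping track of which number belongs to B, C, and so on, one possible shortcut is sketched below. It is not a method proposed by anyone in the thread, only an illustration: it assumes the labels consist purely of letters and are immediately followed by a colon and a number, and the names `pairs` and `tip.len` are hypothetical. It is not a general parser for tree strings, and numbers that follow a closing parenthesis (7581.11 and 8353.85 here) carry no label and are therefore skipped.

```r
# Example string from the original question
outtree.new <- "(((B:1204.25,E:1204.25):7581.11,F:8785.36):8353.85,C:17139.21);"

# Pull out every "label:number" pair (letters, a colon, then digits/dots)
pairs <- regmatches(outtree.new,
                    gregexpr("[A-Za-z]+:[0-9.]+", outtree.new))[[1]]

# Split each pair into its numeric part and its label
tip.len <- as.numeric(sub("^.*:", "", pairs))  # text after the colon
names(tip.len) <- sub(":.*$", "", pairs)       # text before the colon

tip.len
#        B        E        F        C
#  1204.25  1204.25  8785.36 17139.21
```

Individual values can then be looked up by name, for example `tip.len["C"]`.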