Thank you very much for your help. It saved me a lot of time and it worked perfectly. I have a quick follow-up as I'm not sure I understand yet why the code works and where it comes from.
For example, in:

Tstg <- sub(".*T(\\d)N.", "\\1", tmp)

How exactly does the substitution operation work? At a high level, I get that we are taking the values in the vector tmp and replacing each one with the integer immediately after the "T". But at a lower level, how do ".*T(\\d)N." and "\\1" actually get us there? I'll undoubtedly face similar but different situations many times in the future, and I want to make sure that I know how to solve them.

Thanks again--I really appreciate your kindness.

Ben Ganzfried

On Thu, Jun 2, 2011 at 3:33 PM, David Winsemius <dwinsem...@comcast.net> wrote:

> On Jun 2, 2011, at 2:54 PM, Ben Ganzfried wrote:
>
>> Hi,
>>
>> First of all, I would like to introduce myself as I will probably have many
>> questions over the next few weeks and want to thank you guys in advance for
>> your help. I'm a cancer researcher and I need to learn R to complete a few
>> projects. I have an introductory background in Python.
>>
>> My questions at the moment are based on the following sample input file:
>> *Sample_Input_File*
>> characteristics_ch1.3
>> Stage: T1N0
>> Stage: T2N1
>> Stage: T0N0
>> Stage: T1N0
>> Stage: T0N3
>
> I haven't quite figured out what your structure really is, and for that you
> should learn to post the output of dput() on the R object... but see if
> this helps:
>
> > stg <- c('Stage: T1N0', 'Stage: T2N1', 'Stage: T0N0', 'Stage: T1N0',
>   'Stage: T0N3')
> > Tstg <- sub(".*T(\\d)N.", "\\1", stg)
> > Tstg
> #[1] "1" "2" "0" "1" "0"
> > Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)
> > Nstg
> #[1] "0" "1" "0" "0" "3"
>
>> "characteristics_ch1.3" is a column header in the input Excel file.
>>
>> The "T"s represent stage and the "N"s represent degree of disease spreading.
>>
>> I want to create output that looks like this:
>> *Sample_Output_File*
>> T N
>> 1 0
>> 2 1
>> 0 0
>> 1 0
>> 0 3
>>
>> As it currently stands, my code is the following:
>
> # rm(list=ls())
> ####----
> AND PLEASE DON'T POST THAT CODE WITHOUT A COMMENT.
>
> I noticed it this time, but it is very aggravating to accidentally wipe out
> hours of work while trying to offer help.
>
>> source("../../functions.R")
>>
>> uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv",
>>                       as.is=TRUE, row.names=1)
>>
>> ##initial creation of curated dataframe
>> curated <- initialCuratedDF(rownames(uncurated),
>>                             template.filename="Sample_Template_File.csv")
>>
>> ##--------------------
>> ##start the mappings
>> ##--------------------
>>
>> ##title -> alt_sample_name
>> curated$alt_sample_name <- uncurated$title
>>
>> #T
>> tmp <- uncurated$characteristics_ch1.3
>> tmp <- *??????*
>> curated$T <- tmp
>
> So here Tstg is tmp
>
>> #N
>> tmp <- uncurated$characteristics_ch1.3
>> tmp <- *??????*
>> curated$N <- tmp
>
> And Nstg is tmp
>
>> write.table(curated, row.names=FALSE,
>>             file="../curated/Sample_Output_File_curated_pdata.txt", sep="\t")
>>
>> My question is the following:
>>
>> What code gets me the desired output (replacing the *??????*'s above)? I
>> want to: a) find the integer value one element to the right of "T"; and b)
>> find the integer value one element to the right of "N". I've read the
>> regular expression tutorial for R, but could only figure out how to grab an
>> integer value if it is the only integer value in the row (i.e. more than one
>> integer value makes this basic regular expression unsuccessful).
>
> Just surround it with a pattern and use the (), "\\n" mechanism
>
>> Thank you very much for any help you can provide.
>>
>> Sincerely,
>>
>> Ben Ganzfried
>
> David Winsemius, MD
> West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
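
On the follow-up question of why the pattern works: one way to see it is to read the regular expression piece by piece. The sketch below is only an illustration and not part of the original reply. It reuses the stg vector quoted in the thread; the names both, stages and unmatched are made up here for the example.

```r
# Sample values copied from the thread.
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0", "Stage: T1N0", "Stage: T0N3")

# Reading ".*T(\\d)N." from left to right:
#   .*     any run of characters (here it swallows "Stage: ")
#   T      a literal capital T
#   (\\d)  exactly one digit, remembered as capture group 1
#          (the backslash is doubled because this is an R string)
#   N      a literal capital N
#   .      any single character (here the digit after N)
# The pattern covers the whole string, so sub() replaces the whole
# string with "\\1", which stands for whatever group 1 captured.
Tstg <- sub(".*T(\\d)N.", "\\1", stg)

# Same idea, but the parentheses now sit around the digit after N.
Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)

# Two groups can be captured at once and referred to as "\\1" and "\\2".
both <- sub(".*T(\\d)N(\\d)", "\\1 \\2", stg)

# If the pattern does not match, sub() returns the input unchanged.
unmatched <- sub(".*T(\\d)N.", "\\1", "Stage: unknown")

# Assemble the two columns as integers (hypothetical data frame name).
stages <- data.frame(T = as.integer(Tstg), N = as.integer(Nstg))
print(stages)
```

The key point is that sub() replaces only the part of the string the pattern matches, so the leading ".*" and the trailing pieces are there to make the match cover everything that should be thrown away, and the parentheses mark the one piece that should be kept.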