Re: [R] Regex Question: return digits after particular letters

Ben Ganzfried Thu, 02 Jun 2011 13:22:55 -0700

Thank you very much for your help.  It saved me a lot of time and it worked
perfectly.  I have a quick follow-up as I'm not sure I understand yet why
the code works and where it comes from.


For example, in: Tstg <- sub(".*T(\\d)N.", "\\1", tmp)

How exactly does the substitution operation work?

At a high level, I get that we are taking the values in the vector tmp and
replacing each one with the digit immediately after the "T".  But at a lower
level, how do ".*T(\\d)N." and "\\1" actually get us there?  I'll
undoubtedly face similar but different situations many times in the future,
and I want to make sure that I know how to solve them.
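A minimal sketch of what happens at a lower level, using one of the sample strings from this thread. regexec() and regmatches() are base R functions used here only to make the match and the capture group visible; they are not part of the suggested solution. The string "Stage: T1N0 (clinical)" is a hypothetical input added for illustration.

```r
# One of the sample strings from this thread
x <- "Stage: T1N0"

# Pieces of the pattern:
#   .*     any characters, as many as possible (here "Stage: ")
#   T      a literal capital T
#   (\\d)  one digit, remembered as capture group 1; the R string "\\d"
#          is the regex \d because the backslash itself must be escaped
#   N.     a literal N followed by any single character (here "0")
pattern <- ".*T(\\d)N."

# regexec() records where the whole pattern and each ( ) group matched;
# regmatches() turns those positions into text.
parts <- regmatches(x, regexec(pattern, x))[[1]]
print(parts[1])  # the whole match: "Stage: T1N0"
print(parts[2])  # capture group 1: "1"

# sub() replaces the whole matched text with the replacement string,
# and "\\1" in the replacement stands for whatever group 1 matched.
print(sub(pattern, "\\1", x))  # "1"

# Anything the pattern does not match is kept, which is why the pattern
# has to span the whole string to leave only the digit behind.
print(sub("T(\\d)", "\\1", x))                        # "Stage: 1N0"
print(sub(pattern, "\\1", "Stage: T1N0 (clinical)"))  # "1 (clinical)"
```

The N version works the same way: in ".*T\\dN(\\d)" the digit after T is matched but not captured, and the parentheses move to the digit after N, so "\\1" now refers to that digit.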

Thanks again--I really appreciate your kindness.

Ben Ganzfried

On Thu, Jun 2, 2011 at 3:33 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Jun 2, 2011, at 2:54 PM, Ben Ganzfried wrote:
>
>> Hi,
>>
>> First of all, I would like to introduce myself as I will probably have many
>> questions over the next few weeks and want to thank you guys in advance for
>> your help.  I'm a cancer researcher and I need to learn R to complete a few
>> projects.  I have an introductory background in Python.
>>
>> My questions at the moment are based on the following sample input file:
>> *Sample_Input_File*
>> characteristics_ch1.3
>> Stage: T1N0
>> Stage: T2N1
>> Stage: T0N0
>> Stage: T1N0
>> Stage: T0N3
>>
>>
> I haven't quite figured out what your structure really is, and for that you
> should learn to post the output of dput()  on the R object... but see if
> this helps:
>
> > stg <- c('Stage: T1N0',  'Stage: T2N1', 'Stage: T0N0', 'Stage: T1N0',
> +          'Stage: T0N3')
> > Tstg <- sub(".*T(\\d)N.", "\\1", stg)
> > Tstg
> #[1] "1" "2" "0" "1" "0"
> > Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)
> > Nstg
> #[1] "0" "1" "0" "0" "3"
>
>
>> "characteristics_ch1.3" is a column header in the input Excel file.
>>
>> "T's" represent stage and "N's" represent degree of disease spreading.
>>
>> I want to create output that looks like this:
>> *Sample_Output_File*
>> T     N
>> 1     0
>> 2     1
>> 0     0
>> 1     0
>> 0     3
>>
>> As it currently stands, my code is the following:
>>
>>
>
>
>  # rm(list=ls())
>>
> ####----
> AND PLEASE DON'T POST THAT CODE WITHOUT A COMMENT.
>
> I noticed it this time, but it is very aggravating to accidentally wipe out
> hours of work while trying to offer help.
>
>> source("../../functions.R")
>>
>> uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv",
>>                       as.is=TRUE, row.names=1)
>>
>> ##initial creation of curated dataframe
>> curated <- initialCuratedDF(rownames(uncurated),
>>                             template.filename="Sample_Template_File.csv")
>>
>> ##--------------------
>> ##start the mappings
>> ##--------------------
>>
>>
>> ##title -> alt_sample_name
>> curated$alt_sample_name <- uncurated$title
>>
>> #T
>> tmp <- uncurated$characteristics_ch1.3
>> tmp <- *??????*
>> curated$T <- tmp
>>
>
> So here, tmp is Tstg: tmp <- sub(".*T(\\d)N.", "\\1", tmp)
>
>>
>> #N
>> tmp <- uncurated$characteristics_ch1.3
>> tmp <- *??????*
>> curated$N <- tmp
>>
> And here, tmp is Nstg: tmp <- sub(".*T\\dN(\\d)", "\\1", tmp)
>
>> write.table(curated, row.names=FALSE,
>>             file="../curated/Sample_Output_File_curated_pdata.txt", sep="\t")
>>
>> My question is the following:
>>
>> What code gets me the desired output (replacing the *??????*'s above)?  I
>> want to: a) find the integer value one element to the right of "T"; and b)
>> find the integer value one element to the right of "N".  I've read the
>> regular expression tutorial for R, but could only figure out how to grab an
>> integer value if it is the only integer value in the row (i.e., more than one
>> integer value makes this basic regular expression unsuccessful).
>>
>
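> Just surround it with a pattern and use the ( ) and "\\n" mechanism:
> parentheses in the pattern capture part of the match, and "\\1", "\\2", ...
> in the replacement refer back to what each group captured.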
>
>>
>> Thank you very much for any help you can provide.
>>
>> Sincerely,
>>
>> Ben Ganzfried
>>
>>
>
>
> David Winsemius, MD
> West Hartford, CT
>
>
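Putting the pieces together, a sketch of how the two sub() calls could fill the T and N columns of the desired output. It assumes the five sample strings stand in for uncurated$characteristics_ch1.3; the names m and both are illustrative only, and the single-pattern regexec() variant is an alternative not suggested in the thread. quote = FALSE is used so the header prints as plain T and N, unlike the default quoting in the original write.table() call.

```r
# The five sample entries from this thread, standing in for
# uncurated$characteristics_ch1.3
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0", "Stage: T1N0",
         "Stage: T0N3")

# The two substitutions from the reply above, converted to integers
curated <- data.frame(
  T = as.integer(sub(".*T(\\d)N.", "\\1", stg)),
  N = as.integer(sub(".*T\\dN(\\d)", "\\1", stg))
)

# Alternative: one pattern with two capture groups; each element of the
# regmatches() result is c(whole match, group 1, group 2)
m <- do.call(rbind, regmatches(stg, regexec("T(\\d)N(\\d)", stg)))
both <- data.frame(T = as.integer(m[, 2]), N = as.integer(m[, 3]))

# Tab-separated, unquoted, like Sample_Output_File; file = "" prints to
# the console, so pass a path instead to write a file
write.table(curated, file = "", sep = "\t", row.names = FALSE, quote = FALSE)
```

If an entry lacks a T#N# code, sub() returns it unchanged and as.integer() turns it into NA with a coercion warning. The regexec() variant instead assumes every entry matches: a non-matching entry gives an empty result that rbind() silently drops, so the rows would no longer line up.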

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
