Re: [R] Regex Question: return digits after particular letters

David Winsemius Thu, 02 Jun 2011 12:34:14 -0700


On Jun 2, 2011, at 2:54 PM, Ben Ganzfried wrote:

Hi,
First of all, I would like to introduce myself, as I will probably have many questions over the next few weeks, and I want to thank you in advance for your help. I'm a cancer researcher and I need to learn R to complete a few projects. I have an introductory background in Python.
My questions at the moment are based on the following sample input file:
*Sample_Input_File*
characteristics_ch1.3
Stage: T1N0
Stage: T2N1
Stage: T0N0
Stage: T1N0
Stage: T0N3

I haven't quite figured out what your structure really is, and for that you should learn to post the output of dput() on the R object... but see if this helps:
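As a sketch of what is being asked for here: dput() prints an R object as the R code that recreates it, so readers can paste it straight into their own session. The small data frame below is a hypothetical stand-in for the poster's imported column; only the column name comes from the post.

```r
# Hypothetical stand-in for the imported spreadsheet column
uncurated <- data.frame(
  characteristics_ch1.3 = c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0"),
  stringsAsFactors = FALSE
)

# dput() writes R code that rebuilds the object; this is what to paste into a post
dput(uncurated)

# Capture that code as text and rebuild the object from it to show the round trip
txt <- capture.output(dput(uncurated))
rebuilt <- eval(parse(text = txt))
identical(rebuilt, uncurated)
```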

> stg <- c('Stage: T1N0', 'Stage: T2N1', 'Stage: T0N0', 'Stage: T1N0', 'Stage: T0N3')

> Tstg <- sub(".*T(\\d)N.", "\\1", stg)
> Tstg
#[1] "1" "2" "0" "1" "0"
> Nstg <- sub(".*T\\dN(\\d)", "\\1", stg)
> Nstg
#[1] "0" "1" "0" "0" "3"
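The two sub() calls scan each string twice. A possible one-pass alternative (a swapped-in technique, not what was used above) is regexec() with two capture groups, whose matches regmatches() returns as a list. The sketch reuses the same stage strings and builds the T/N table the poster asks for.

```r
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0", "Stage: T1N0", "Stage: T0N3")

# regexec() records the whole match plus each parenthesised group
m <- regmatches(stg, regexec("T(\\d)N(\\d)", stg))

# Each list element is c(full_match, T_digit, N_digit)
stages <- data.frame(
  T = as.integer(sapply(m, `[`, 2)),
  N = as.integer(sapply(m, `[`, 3))
)
stages
```

A string that does not match yields character(0), and indexing that gives NA, so unmatched rows show up as NA rather than as leftover text.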

"characteristics_ch1.3" is a column header in the input Excel file.

The "T" values represent stage and the "N" values represent the degree of disease spread.

I want to create output that looks like this:
*Sample_Output_File*
T     N
1     0
2     1
0     0
1     0
0     3

As it currently stands, my code is the following:

# rm(list=ls())

####----
AND PLEASE DON'T POST THAT CODE WITHOUT A COMMENT.

I noticed it this time, but it is very aggravating to accidentally wipe out hours of work while trying to offer help.

source("../../functions.R")
uncurated <- read.csv("../uncurated/Sample_Input_File_full_pdata.csv", as.is = TRUE, row.names = 1)
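To illustrate the two read.csv() arguments without the poster's file, the sketch below reads a small inline CSV through the text argument; the geo_accession column, the sample IDs, and the title values are made up for the example. as.is = TRUE keeps strings as character rather than converting them to factors, and row.names = 1 turns the first column into the row names.

```r
# Hypothetical inline CSV standing in for Sample_Input_File_full_pdata.csv
csv <- 'geo_accession,title,characteristics_ch1.3
GSM1,tumor A,Stage: T1N0
GSM2,tumor B,Stage: T2N1'

uncurated <- read.csv(text = csv, as.is = TRUE, row.names = 1)

rownames(uncurated)                     # first column became the row names
class(uncurated$characteristics_ch1.3)  # still character, not factor
```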

##initial creation of curated dataframe
curated <- initialCuratedDF(rownames(uncurated), template.filename = "Sample_Template_File.csv")
##--------------------
##start the mappings
##--------------------


##title -> alt_sample_name
curated$alt_sample_name <- uncurated$title

#T
tmp <- uncurated$characteristics_ch1.3
tmp <- *??????*
curated$T <- tmp


So here, Tstg is tmp: the sub() call that produced Tstg is what goes in place of the *??????*.
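Filling the first *??????* along those lines might look like the sketch below. The data frame stands in for the poster's uncurated object (its values are the sample strings from the post), and because functions.R is not shown, a plain data.frame with a hypothetical id column replaces whatever initialCuratedDF() returns.

```r
# Stand-in for the poster's `uncurated` data frame (values from the sample input)
uncurated <- data.frame(
  characteristics_ch1.3 = c("Stage: T1N0", "Stage: T2N1", "Stage: T0N0",
                            "Stage: T1N0", "Stage: T0N3"),
  stringsAsFactors = FALSE
)
# Hypothetical stand-in for the frame returned by initialCuratedDF()
curated <- data.frame(id = rownames(uncurated), stringsAsFactors = FALSE)

#T
tmp <- uncurated$characteristics_ch1.3
tmp <- sub(".*T(\\d)N.", "\\1", tmp)  # keep only the digit captured after "T"
curated$T <- tmp
curated$T
```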


#N
tmp <- uncurated$characteristics_ch1.3
tmp <- *??????*
curated$N <- tmp

And here, Nstg is tmp: the second sub() call goes in place of the *??????*.
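One caveat worth handling for the N column: sub() returns its input unchanged when the pattern does not match, so a cell such as a hypothetical "Stage: unknown" would pass straight through as text. The sketch below (an addition, not part of the original reply; the trailing .* on the pattern is also an assumption, so that anything after the N digit is dropped) extracts the N digit, sets non-matching cells to NA, and converts to integer.

```r
# Hypothetical column values, including one entry with no T/N stage
tmp <- c("Stage: T1N0", "Stage: T2N1", "Stage: unknown", "Stage: T0N3")

#N
pattern <- ".*T\\dN(\\d).*"
n <- sub(pattern, "\\1", tmp)   # digit after "N" where the pattern matches
n[!grepl(pattern, tmp)] <- NA   # sub() leaves non-matching strings untouched
n <- as.integer(n)
n
```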

write.table(curated, row.names = FALSE, file = "../curated/Sample_Output_File_curated_pdata.txt", sep = "\t")
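Since the output path is specific to the poster's project, the sketch below writes the same kind of two-column table to a temporary file instead and reads it back. It shows that sep = "\t" with row.names = FALSE produces a tab-separated file with a header row and no row-name column. quote = FALSE is an addition here: write.table() quotes the header names by default, and turning that off gives a plain "T N" header like the sample output.

```r
curated <- data.frame(T = c(1L, 2L, 0L, 1L, 0L), N = c(0L, 1L, 0L, 0L, 3L))

out <- tempfile(fileext = ".txt")  # stand-in for the ../curated/ output path
write.table(curated, row.names = FALSE, file = out, sep = "\t", quote = FALSE)

lines <- readLines(out)  # raw file contents, one string per line
back <- read.delim(out)  # read.delim() assumes a header and tab separators
```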

My question is the following:
What code gets me the desired output (replacing the *??????*'s above)? I want to: a) find the integer value one element to the right of "T"; and b) find the integer value one element to the right of "N". I've read the regular expression tutorial for R, but could only figure out how to grab an integer value if it is the only integer value in the row (i.e., more than one integer value makes this basic regular expression unsuccessful).
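To make the problem concrete: a pattern that matches "any digit" on its own cannot tell the T digit from the N digit, because both are integers in the same string. Anchoring the digit to the letter in front of it fixes that. The lookbehind version below is a swapped-in alternative to the capture-group approach used in this thread, and it needs perl = TRUE.

```r
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N3")

# A bare digit pattern only ever finds the first integer in each string
first_digit <- regmatches(stg, regexpr("\\d", stg))

# Anchored to the preceding letter: (?<=T) and (?<=N) are PCRE lookbehinds
t_digit <- regmatches(stg, regexpr("(?<=T)\\d", stg, perl = TRUE))
n_digit <- regmatches(stg, regexpr("(?<=N)\\d", stg, perl = TRUE))
```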


Just surround the digit with a pattern, capture it with parentheses (), and refer back to it in the replacement with the "\\n" mechanism (e.g. "\\1").
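Spelling out that mechanism: each pair of parentheses in the pattern is a capture group, numbered from left to right, and "\\1", "\\2" and so on in the replacement insert whatever those groups matched. The sketch below captures both digits in a single pattern and rewrites each string as "T,N"; the comma format is only an illustration.

```r
stg <- c("Stage: T1N0", "Stage: T2N1", "Stage: T0N3")

# Group 1 = digit after T, group 2 = digit after N
both <- sub(".*T(\\d)N(\\d).*", "\\1,\\2", stg)

# Split the "T,N" strings into a two-column integer matrix
tn <- do.call(rbind, lapply(strsplit(both, ","), as.integer))
colnames(tn) <- c("T", "N")
tn
```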


Thank you very much for any help you can provide.

Sincerely,

Ben Ganzfried




David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
