That actually reminds me of a problem I had to tackle a while ago.
Say I have the following:
txt <- c(
  paste("Variation_0001 // chr1:1083805-1283805 // Array CGH //",
        "15286789 // Iafrate et al. (2004) // CopyNumber /// Variation_5452 //",
        "chr1:1142956-1147823 // Computational mapping of resequencing",
        "traces // 16902084 // Mills et al. (2006) // CopyNumber"),
  paste("Variation_4192 // chr1:2062347-2242269 // Array CGH // 17160897 //",
        "Wong et al. (2007) // CopyNumber /// Variation_4193 //",
        "chr1:2145626-2314237 // Array CGH // 17160897 // Wong et al. (2007) //",
        "CopyNumber /// Variation_8246 // chr1:2224111-3755284 // Affymetrix",
        "500K and 100K SNP Mapping Arrays // 17638019 // Zogopoulos et al.",
        "(2007) // CopyNumber"),
  paste("Variation_8246 // chr1:2224111-3755284 //",
        "Affymetrix 500K and 100K SNP Mapping Arrays // 17638019 // Zogopoulos",
        "et al. (2007) // CopyNumber"))
For each record, I'm interested in keeping the following:
results <- c("Variation_0001;Variation_5452",
"Variation_4192;Variation_4193;Variation_8246", "Variation_8246")
My solution was:
theNames <- function(tmp)
  sapply(strsplit(tmp, " /+ "),
         function(y)
           paste(y[grep("Variation_", y)],
                 collapse=";"))
What I really wanted, though, was a regular expression that matches
everything except "Variation_\\d+". For example, something like this pseudocode:
gsub( NOT "Variation_\\d+", ";", txt, perl=TRUE)
Suggestions?
b
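One possible way to get this result, sketched under assumptions: rather than negating the pattern, the matches themselves can be pulled out with gregexpr() and regmatches(), or a lazy prefix can be dropped in front of each ID with gsub(). The vector txt_demo below is a hypothetical shortened sample in the same layout as the records above, and the second variant assumes the records contain no ";" of their own.

```r
# Hypothetical shortened sample in the same " // " and " /// " layout
txt_demo <- c(
  "Variation_0001 // chr1:1083805-1283805 // Array CGH /// Variation_5452 // chr1:1142956-1147823 // CopyNumber",
  "Variation_8246 // chr1:2224111-3755284 // CopyNumber")

# Variant 1: extract every match, giving one character vector per record,
# then join the IDs of each record with ";"
ids <- regmatches(txt_demo, gregexpr("Variation_\\d+", txt_demo, perl = TRUE))
results_extract <- sapply(ids, paste, collapse = ";")

# Variant 2: closer to the "everything but" idea. The lazy prefix .*? swallows
# the text in front of each ID and is replaced by the ID plus ";". The text
# after the last ID is then trimmed off together with the trailing ";".
tmp <- gsub(".*?(Variation_\\d+)", "\\1;", txt_demo, perl = TRUE)
results_gsub <- sub(";[^;]*$", "", tmp)
```

Both variants should yield the same semicolon-separated IDs per record; the extraction form is probably the easier one to read and does not depend on the separator layout.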
On Feb 2, 2008, at 1:03 PM, Peter Dalgaard wrote:
Benilton Carvalho wrote:
help("strsplit")
b
Yes, but...
The postprocessing gets a bit awkward. It might be easier to use
sub() to get rid of the first/last bit of the string i.e.
C2 <- sub("^.*:", "", Col)
C1 <- sub(":.*$", "", Col)
An orthogonal idea is
con <- textConnection(as.character(Col))  # pass the column itself, not the string "Col"
read.table(con, sep=":")
close(con)
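The suggestions above can be put side by side in a small sketch. The values in Col are copied from the question; the names df_sub, df_split and df_read are made up for illustration, and the text= argument of read.table() is assumed to be available (it is in current R versions) as a shortcut for the textConnection() route.

```r
# Sample values copied from the question
Col <- c("chr1:71310034", "chr15:37759058", "chr22:18262638",
         "chrUn:31337214", "chr10_random:4369261", "chrUn:3545097")

# sub(): drop everything from the colon onwards, or everything up to it
df_sub <- data.frame(Col_a = sub(":.*$", "", Col),
                     Col_b = as.numeric(sub("^.*:", "", Col)),
                     stringsAsFactors = FALSE)

# strsplit(): returns a list, so the pieces must be bound into a matrix first
parts <- do.call(rbind, strsplit(Col, ":", fixed = TRUE))
df_split <- data.frame(Col_a = parts[, 1],
                       Col_b = as.numeric(parts[, 2]),
                       stringsAsFactors = FALSE)

# read.table(): treat the strings as colon-separated input
df_read <- read.table(text = Col, sep = ":",
                      col.names = c("Col_a", "Col_b"),
                      stringsAsFactors = FALSE)
```

If Col lives in a data frame as a factor, wrapping it in as.character() first is probably needed for all three variants.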
On Feb 2, 2008, at 12:43 PM, joseph wrote:
Hello
I have a data frame and one of its columns is as follows:
Col
chr1:71310034
chr15:37759058
chr22:18262638
chrUn:31337214
chr10_random:4369261
chrUn:3545097
I would like to remove the colon (:) and replace this column
with two new columns containing the terms on each side of the
colon. The new columns should look as follows:
Col_a Col_b
chr1 71310034
chr14 23354088
chr15 37759058
chr22 18262638
chrUn 31337214
chr10_random 4369261
chrUn 3545097
Any help will be much appreciated
Joseph
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907