Thank you Jim and Bert for your suggestions.
Following is the final version used:

### Original tiny test data from Aldi Kraja, 9.11.2015.
### Purpose: split A into elements 1 and 2; the 3rd element of A is not needed.
### Assign elements one and two to vectors C and D of the same data.frame.
### Do work similar to what the SAS SCAN function would have done:
###   C=SCAN(x,1,":") ; D=SCAN(x,2,":") ;
### Jim Holtman suggested:
### temp <- strsplit(x$A, ":")
### x$C <- sapply(temp, '[[', 1)
### x$D <- sapply(temp, '[[', 2)
### Bert Gunter suggested:
### do.call(rbind,strsplit(x[[1]],":"))[,-3]
### Start of script: a full R solution:
x <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, as.is = TRUE)
x$A <- as.character(x$A)
temp <- strsplit(x$A,":")
x$C <- sapply(temp,'[[',1)
x$D <- sapply(temp,'[[',2)
x$C <- as.numeric(x$C)
x$D <- as.numeric(x$D)
### Final results:
x
### end of the script
#               A          B  C         D
#1     1:29439275 0.46773514  1  29439275
#2     5:85928892 0.81283052  5  85928892
#3   10:128341232 0.09332543 10 128341232
#4 1:106024283:ID 0.36307805  1 106024283
#5     3:62707519 0.42657952  3  62707519
#6     2:80464120 0.89125094  2  80464120

With best wishes,
Aldi

On 9/10/2015 1:35 PM, Bert Gunter wrote:
> ...
> Alternatively, you can avoid the looping (i.e. sapply) altogether by:
>
> do.call(rbind,strsplit(x[[1]],":"))[,-3]
>
>      [,1] [,2]
> [1,] "1"  "29439275"
> [2,] "5"  "85928892"
> [3,] "10" "128341232"
> [4,] "1"  "106024283"
> [5,] "3"  "62707519"
> [6,] "2"  "80464120"
>
> These can then be added to the existing frame, converted to numeric, etc.
>
> Cheers,
> Bert
> Bert Gunter
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> -- Clifford Stoll
>
> On Thu, Sep 10, 2015 at 11:05 AM, jim holtman <jholt...@gmail.com> wrote:
>> try this:
>>
>> > x <- read.table(text = "A B
>> + 1:29439275 0.46773514
>> + 5:85928892 0.81283052
>> + 10:128341232 0.09332543
>> + 1:106024283:ID 0.36307805
>> + 3:62707519 0.42657952
>> + 2:80464120 0.89125094", header = TRUE, as.is = TRUE)
>> > temp <- strsplit(x$A, ":")
>> > x$C <- sapply(temp, '[[', 1)
>> > x$D <- sapply(temp, '[[', 2)
>> >
>> > x
>>                A          B  C         D
>> 1     1:29439275 0.46773514  1  29439275
>> 2     5:85928892 0.81283052  5  85928892
>> 3   10:128341232 0.09332543 10 128341232
>> 4 1:106024283:ID 0.36307805  1 106024283
>> 5     3:62707519 0.42657952  3  62707519
>> 6     2:80464120 0.89125094  2  80464120
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Thu, Sep 10, 2015 at 1:46 PM, aldi <a...@wustl.edu> wrote:
>>
>>> Hi,
>>> I have a data.frame x1, in which a variable A needs to be split into
>>> element 1 and element 2, where the separator is ":". Sometimes there
>>> could be three elements in A, but I do not need the third element.
>>>
>>> Since R does not have a SCAN function as in SAS, C=scan(A,1,":");
>>> D=scan(A,2,":");
>>> I am using a combination of strsplit and sapply. If I do not use the
>>> index [i] then R captures the full vector. Instead I need to capture,
>>> row by row, the first and the second element and from them create two
>>> new variables C and D.
>>> Right now, as is, in the loop over i, C is captured correctly, but D is
>>> missing because the variable AA does not have it. Any suggestions?
>>> Thank you in advance, Aldi
>>>
>>> A B
>>> 1:29439275 0.46773514
>>> 5:85928892 0.81283052
>>> 10:128341232 0.09332543
>>> 1:106024283:ID 0.36307805
>>> 3:62707519 0.42657952
>>> 2:80464120 0.89125094
>>>
>>> x1<-read.table(file='./test.txt',head=T,sep='\t')
>>> x1$A <- as.character(x1$A)
>>>
>>> for(i in 1:length(x1$A)){
>>>
>>> x1$AA[i] <- as.numeric(unlist(strsplit(x1$A[i],':')))
>>>
>>> x1$C[i] <- sapply(x1$AA[i],function(x)x[1])
>>> x1$D[i] <- sapply(x1$AA[i],function(x)x[2])
>>> }
>>>
>>> x1
>>>
>>> > x1
>>>                A          B AA  C  D
>>> 1     1:29439275 0.46773514  1  1 NA
>>> 2     5:85928892 0.81283052  5  5 NA
>>> 3   10:128341232 0.09332543 10 10 NA
>>> 4 1:106024283:ID 0.36307805  1  1 NA
>>> 5     3:62707519 0.42657952  3  3 NA
>>> 6     2:80464120 0.89125094  2  2 NA
>>>
>>> --
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
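
A note on why the loop in the original question left D as NA, with a small sketch. Assigning the length-2 or length-3 result of strsplit into the single element x1$AA[i] keeps only the first value (R warns that the number of items to replace is not a multiple of replacement length), so the later x[2] is always NA. The sketch below wraps the strsplit approach from the thread in a SCAN-like helper. The name scan_field is hypothetical and is not a base R function; it is only one possible way to write this. Unlike sapply(temp, '[[', 2), it should return NA rather than stop with a subscript error when a string has fewer than n fields.

```r
# Same test data as in the thread.
x <- read.table(text = "A B
1:29439275 0.46773514
5:85928892 0.81283052
10:128341232 0.09332543
1:106024283:ID 0.36307805
3:62707519 0.42657952
2:80464120 0.89125094", header = TRUE, as.is = TRUE)

# scan_field() is a hypothetical helper (not part of base R).
# It returns the n-th sep-delimited field of each string in s,
# or NA when a string has fewer than n fields.
scan_field <- function(s, n, sep = ":") {
  parts <- strsplit(as.character(s), sep, fixed = TRUE)
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

# Roughly what C=SCAN(A,1,":") and D=SCAN(A,2,":") would do in SAS.
x$C <- as.numeric(scan_field(x$A, 1))
x$D <- as.numeric(scan_field(x$A, 2))
x
```

Using vapply with character(1) instead of sapply keeps the return type fixed, so a malformed row shows up as NA in the new column rather than changing the shape of the result.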