Hi all,

I'm trying to do some data manipulation using R, but I'm a bit stuck. I have
to warn you, I'm a real R noob.

For example, I have this file:

V1           V2      V3        V4     V5                 V6
1:156706559  rs8658  dbSNP_52  C/G/A  C=2996/G=7762/A=0  31.8803/20.2782/27.849
1:69116      none    none      A/G    A=1/G=611          0.0/0.2747/0.1634
1:69134      none    none      G/A    G=8/A=724          1.9108/0.4785/1.0929
1:69270      none    none      G/A    G=1896/A=888       10.2394/42.6562/31.8966

The format that I want this data in is:

V1  V2         V3      V4        V5  V6  V7    V8   V9
1   156706559  rs8658  dbSNP_52  C   A   2996  0    27.849
1   156706559  rs8658  dbSNP_52  G   A   7762  0    27.849
1   69116      none    none      A   G   1     611  0.1634
1   69134      none    none      G   A   8     724  1.0929
1   69270      none    none      G   A   1896  888  31.8966


So the first step is to split column V1 on ":". That part was pretty easy.
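
For reference, that first step can be sketched roughly as follows. This is only
an illustration on toy data: the data frame name dat and the new column names
chr and pos are made up for the example, and it assumes V1 is stored as
character and always contains exactly one ":".

```r
# Toy data frame standing in for the real file (the name `dat` is hypothetical)
dat <- data.frame(
  V1 = c("1:156706559", "1:69116"),
  V2 = c("rs8658", "none"),
  stringsAsFactors = FALSE
)

# Split "chromosome:position" on the colon; every element yields two parts,
# so rbind() gives a two-column character matrix
parts <- do.call(rbind, strsplit(dat$V1, ":", fixed = TRUE))

# Put the two pieces in front of the remaining columns
res <- data.frame(
  chr = parts[, 1],
  pos = as.numeric(parts[, 2]),
  dat[, -1, drop = FALSE],
  stringsAsFactors = FALSE
)
```

The do.call(rbind, ...) step relies on every row splitting into the same number
of pieces, which is why it suits V1 but not the variable-length V4 column.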

Next, split column V4 on "/". This was a bit trickier, since some rows
contain more alleles than others, but I managed it with the code below.
It's probably a really lousy way to do it, but it worked. (Don't pay too much
attention to the column numbers; my original file has more columns.)

        splittingAllele <- function(y) {

            ##### Splitting column 4 into variant and normal allele
            r <- strsplit(y$V4, "/")

            # Collect the last allele of every row
            d <- NULL
            d <- as.list(d)

            for (x in seq_along(r)) {
                d <- rbind(d, r[[x]][length(r[[x]])])
            }

            d <- as.character(unlist(d))
            d <- as.data.frame(d)

            # Store the last allele in a new column 28
            y[,28] <- d
            y[,28] <- as.character(y[,28])

            # Remove the last two characters (the "/" and the final allele)
            # from column 4, then move the new column 28 next to it
            f <- as.data.frame(substr(y[,4], 1, nchar(y[,4])-2))
            test3 <- y[,c(1:3)]
            test3[,4] <- f
            test3[,5:28] <- y[,c(28,5:27)]

            # Repeat the other columns once per remaining allele
            r <- strsplit(as.character(test3[,4]), "/")
            p1 <- cbind(unlist(r), rep(as.character(test3[,1]), sapply(r, length)))
            p2 <- cbind(unlist(r), rep(as.character(test3[,2]), sapply(r, length)))
            p3 <- cbind(unlist(r), rep(as.character(test3[,3]), sapply(r, length)))
            p5 <- cbind(unlist(r), rep(as.character(test3[,5]), sapply(r, length)))
            p8 <- cbind(unlist(r), rep(as.character(test3[,8]), sapply(r, length)))
            p9 <- cbind(unlist(r), rep(as.character(test3[,9]), sapply(r, length)))

            test4 <- cbind(p1[,2], p2[,2], p3[,2], p3[,1], p5[,2], p8[,2], p9[,2])
            test4 <- as.data.frame(test4)

            # Drop duplicated rows
            test5 <- test4[!duplicated(test4),]

            return(test5)
        }
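
A shorter way to express the same step might look like the sketch below. It is
not the code I ran, just an illustration on toy data. It assumes the last allele
in V4 is the one that gets its own column (as in the desired output above), and
the names dat, long, allele and last are made up for the example.

```r
# Toy input using V4 values from the example file
dat <- data.frame(
  V1 = c("1:156706559", "1:69116"),
  V4 = c("C/G/A", "A/G"),
  stringsAsFactors = FALSE
)

alleles <- strsplit(dat$V4, "/", fixed = TRUE)

# Last allele of each row, and the alleles that come before it
last_allele <- vapply(alleles, function(a) a[length(a)], character(1))
others      <- lapply(alleles, function(a) a[-length(a)])

# Row index repeated once per remaining allele
idx <- rep(seq_len(nrow(dat)), lengths(others))

# One output row per (input row, remaining allele) pair
long <- data.frame(
  V1     = dat$V1[idx],
  allele = unlist(others),
  last   = last_allele[idx],
  stringsAsFactors = FALSE
)
```

Indexing with idx repeats every other column in one go, so there is no need for
a separate cbind() per column. Note that a row with only a single allele in V4
would produce no output rows in this sketch.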

Now I want to split column V5 in the same way, but this is where I'm stuck. I
think I can use almost exactly the same code as before, but I can't figure it
out. Any help, please?

Thank you in advance!


