On May 19, 2010, at 5:01 PM, Wu Gong wrote:


It took me a day to make sense of Jim's code :(

Hope my comments will help.

## Transform data to matrix
x <- as.matrix(x)

## Apply function to each row
## Create a function to rearrange bases
result <- apply(x, 1, function(eachrow){

## Split each gene to bases
## Exclude the first column, which is the id
        bases <- strsplit(eachrow[-1], '')
        
## Transform list to matrix
## Because the result of function strsplit is a list
        bases <- do.call(rbind,bases)
        
## Recombine bases by connecting all bases in each column
        recombine <- apply(bases, 2, paste, collapse="")
        
## Add id
## Transpose recombine
        cbind(eachrow[1], t(recombine))
})

## Transpose the result matrix  
result <- t(result)
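
The "list to matrix" step in the comments above can be tried on its own. This is only a small sketch: the vector name `genes` is made up for illustration, and its values are copied from the first sample.

```r
# Hypothetical input: the four sequences of sample GM920222
genes <- c(A1 = "GATTGCC", A2 = "GATTGCC", A3 = "GATAGAC", A4 = "GATAGAC")

# strsplit() returns a list holding one character vector per input string
bases_list <- strsplit(genes, "")
is.list(bases_list)

# do.call(rbind, ...) passes the list elements to rbind() as separate
# arguments, so each string becomes one row of single characters
bases <- do.call(rbind, bases_list)
dim(bases)   # 4 strings by 7 positions
```

Because the list keeps the names A1 to A4, they end up as the row names of the matrix.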

It will come more quickly as you learn more. I also looked at Jim's solution by pulling it apart, although I did not spend a whole day on it, maybe ten minutes. I thought a three-line version was more informative, because it did not make everything scroll off the console:

> x <- read.table(textConnection("SampleID A1 A2 A3 A4
+  GM920222        GATTGCC GATTGCC GATAGAC GATAGAC
+  GM930040        GTCATCA GAGTGCA ACTATAA GATTGCC
+ GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC"), header=TRUE, as.is=TRUE)
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(rbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- t(apply(z, 2, paste, collapse=''))
+      # add the ID
+      cbind(.row[1], z.col)
+  }))
     [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

# I usually see if I can get the inner-most function to work:

> z <- do.call(rbind, strsplit(x[1,], ''))
Warning message:
In function (..., deparse.level = 1)  :
number of columns of result is not a multiple of vector length (arg 2)
> z
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
SampleID "G"  "M"  "9"  "2"  "0"  "2"  "2"  "2"
A1       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A2       "G"  "A"  "T"  "T"  "G"  "C"  "C"  "G"
A3       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"
A4       "G"  "A"  "T"  "A"  "G"  "A"  "C"  "G"

# So I guess I didn't get an exact replica, since Jim had excluded the first element in the row
> z <- do.call(rbind, strsplit(x[1,-1], ''))  # there ... cleaner
> z
   [,1] [,2] [,3] [,4] [,5] [,6] [,7]
A1 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A2 "G"  "A"  "T"  "T"  "G"  "C"  "C"
A3 "G"  "A"  "T"  "A"  "G"  "A"  "C"
A4 "G"  "A"  "T"  "A"  "G"  "A"  "C"

That helped me understand what was going on in the middle of the function. Next I wondered whether the transpose could be avoided, so I tried cbind instead of rbind:

> z <- do.call(cbind, strsplit(x[1,-1], ''))
> z
     A1  A2  A3  A4
[1,] "G" "G" "G" "G"
[2,] "A" "A" "A" "A"
[3,] "T" "T" "T" "T"
[4,] "T" "T" "A" "A"
[5,] "G" "G" "G" "G"
[6,] "C" "C" "A" "A"
[7,] "C" "C" "C" "C"
> z.col <- apply(z, 2, paste, collapse='')
> z.col
       A1        A2        A3        A4
"GATTGCC" "GATTGCC" "GATAGAC" "GATAGAC"

## Nope, that does not work: it just rebuilds the original strings.
## So try apply over the rows instead ...
> z.col <- apply(z, 1, paste, collapse='')
> z.col
[1] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
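
The difference between the two MARGIN values is easier to see on a tiny case. The following is a toy sketch only; the matrix `m` and its contents are invented for illustration.

```r
# Toy matrix: the strings "AB" and "CD" stored as columns, as cbind() would give
m <- cbind(s1 = c("A", "B"), s2 = c("C", "D"))

# MARGIN = 2 pastes down each column, which only rebuilds the input strings
by_col <- apply(m, 2, paste, collapse = "")

# MARGIN = 1 pastes across each row, combining the strings position by position
by_row <- apply(m, 1, paste, collapse = "")
```

With the strings laid out as columns, each row holds one position, so pasting across rows is the regrouping wanted here.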

## OK that worked. Now see if it works inside the whole sequence:

> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- apply(z, 1, paste, collapse='')
+      # add the ID
+      cbind(.row[1], z.col)
+  }))
     [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
[1,] "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222" "GM920222"
[2,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
[3,] "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040" "GM930040"
     [,8]   [,9]   [,10]  [,11]  [,12]  [,13]  [,14]
[1,] "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

Well, not exactly.
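
The repeated IDs appear to come from recycling: cbind() stretches the single ID down the whole length-7 vector, and apply() then flattens that 7 x 2 matrix column by column into 14 values per row. A small sketch of just that step, with the values copied from the first sample (`tall` and `wide` are names made up here):

```r
id    <- "GM920222"
z.col <- c("GGGG", "AAAA", "TTTT", "TTAA", "GGGG", "CCAA", "CCCC")

# A length-1 ID is recycled against the length-7 vector: a 7 x 2 matrix
tall <- cbind(id, z.col)
dim(tall)

# t() turns z.col into a 1 x 7 matrix, so the ID is bound on only once: 1 x 8
wide <- cbind(id, t(z.col))
dim(wide)
```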
> x <- as.matrix(x)
> t(apply(x, 1, function(.row){
+      # separate characters
+      z <- do.call(cbind, strsplit(.row[-1], ''))
+      # combine each column
+      z.col <- apply(z, 1, paste, collapse='')
+      # add the ID, this time to the transposed columns
+      cbind(.row[1], t(z.col))
+  }))
     [,1]       [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
[1,] "GM920222" "GGGG" "AAAA" "TTTT" "TTAA" "GGGG" "CCAA" "CCCC"
[2,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"
[3,] "GM930040" "GGAG" "TACA" "CGTT" "ATAT" "TGTG" "CCAC" "AAAC"

So I got to the same place but didn't really achieve any savings.
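
One possible saving, offered only as a sketch and not as Jim's method: since apply() flattens whatever the function returns, the function could return a plain vector built with c(), which would drop both inner transposes. The helper name `regroup` is made up, and read.table(text = ...) assumes a version of R that has the `text` argument.

```r
# Same three-row data as in the session above
x <- read.table(text = "SampleID A1 A2 A3 A4
GM920222 GATTGCC GATTGCC GATAGAC GATAGAC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC
GM930040 GTCATCA GAGTGCA ACTATAA GATTGCC", header = TRUE, as.is = TRUE)
x <- as.matrix(x)

# regroup() is a hypothetical helper name, not part of Jim's solution
regroup <- function(.row) {
    # one column per string, so each row of z is one position
    z <- do.call(cbind, strsplit(.row[-1], ""))
    # plain vector: the ID followed by the regrouped bases
    c(.row[1], apply(z, 1, paste, collapse = ""))
}

# The outer t() is still needed: apply() returns one column per input row
result <- unname(t(apply(x, 1, regroup)))
result
```

The outer transpose remains, so this is a tidier function body more than a real saving.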


-----
A R learner.


David "also still learning" Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
