On Mar 7, 2013, at 01:18, Yao He wrote:

> Dear all:
>
> I have a big data file of 60000 columns and 60000 rows like that:
>
> AA AC AA AA .......AT
> CC CC CT CT.......TC
> ..........................
> .........................
>
> I want to transpose it, and the output is a new file like that:
>
> AA CC ............
> AC CC............
> AA CT.............
> AA CT.........
> ....................
> ....................
> AT TC.............
>
> The key point is that I can't read it into R by read.table() because
> the data is too large, so I tried this:
>
> c <- file("silygenotype.txt", "r")
> geno_t <- list()
> repeat {
>   line <- readLines(c, n = 1)
>   if (length(line) == 0) break  # end of file
>   line <- unlist(strsplit(line, "\t"))
>   geno_t <- cbind(geno_t, line)
> }
> write.table(geno_t, "xxx.txt")
>
> It works, but it is too slow. How do I optimize it???
As others have pointed out, that's a lot of data! You seem to have the
right idea, though: if you read the columns line by line, there is
nothing to transpose. A couple of points:

- The cbind() is a potential performance hit, since it copies the
  entire list every time around the loop. Preallocate instead:
  geno_t <- vector("list", 60000) and then geno_t[[i]] <- <etc>
- You might use scan() instead of readLines() plus strsplit().
- Perhaps consider the data type, as you seem to be reading strings
  with 16 possible values. (I suspect that R already optimizes string
  storage to make this point moot, though.)

A sketch combining the first two points follows after the signature.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
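Untested, but here is a minimal sketch of the preallocation and scan()
suggestions. The file names and the 60000 row count are taken from the
original post; I have also renamed c to con so it does not mask base::c.

con <- file("silygenotype.txt", "r")
geno_t <- vector("list", 60000)      # preallocate: no copying per iteration
i <- 1
repeat {
    ## scan() reads and splits one tab-separated line in a single call
    line <- scan(con, what = character(), sep = "\t",
                 nlines = 1, quiet = TRUE)
    if (length(line) == 0) break     # end of file
    geno_t[[i]] <- line
    i <- i + 1
}
close(con)
geno_t <- geno_t[seq_len(i - 1)]     # drop unused slots, if any

## Each list element is one input row, i.e. one column of the result,
## so binding them as columns gives the transpose in a single step.
## (A 60000 x 60000 character matrix still needs a great deal of RAM,
## so this only addresses the looping cost, not the overall size.)
write.table(do.call(cbind, geno_t), "xxx.txt")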