On Mar 7, 2013, at 01:18, Yao He wrote:

> Dear all:
>
> I have a big data file of 60000 columns and 60000 rows like that:
>
> AA AC AA AA .......AT
> CC CC CT CT.......TC
> ..........................
> .........................
>
> I want to transpose it, and the output is a new file like that:
>
> AA CC ............
> AC CC............
> AA CT.............
> AA CT.........
> ....................
> ....................
> AT TC.............
>
> The key point is that I can't read it into R by read.table() because
> the data is too large, so I tried this:
>
> c <- file("silygenotype.txt", "r")
> geno_t <- list()
> repeat {
>   line <- readLines(c, n = 1)
>   if (length(line) == 0) break  # end of file
>   line <- unlist(strsplit(line, "\t"))
>   geno_t <- cbind(geno_t, line)
> }
> write.table(geno_t, "xxx.txt")
>
> It works, but it is too slow. How do I optimize it???
As others have pointed out, that's a lot of data! You seem to have the
right idea, though: if you read the columns line by line, there is
nothing to transpose. A couple of points:

- The cbind() is a potential performance hit, since it copies the
  entire list every time around the loop. Preallocate instead:
  geno_t <- vector("list", 60000) and then geno_t[[i]] <- <etc>
- You might use scan() instead of readLines() plus strsplit().
- Perhaps consider the data type, as you seem to be reading strings
  with 16 possible values. (I suspect that R already optimizes string
  storage to make this point moot, though.)

A sketch combining the first two points follows after the signature.

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
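Untested, but here is a minimal sketch of the preallocation and scan()
suggestions. The file names and the 60000 row count are taken from the
original post; I have also renamed c to con so it does not mask base::c.

con <- file("silygenotype.txt", "r")
geno_t <- vector("list", 60000)      # preallocate: no copying per iteration
i <- 1
repeat {
    ## scan() reads and splits one tab-separated line in a single call
    line <- scan(con, what = character(), sep = "\t",
                 nlines = 1, quiet = TRUE)
    if (length(line) == 0) break     # end of file
    geno_t[[i]] <- line
    i <- i + 1
}
close(con)
geno_t <- geno_t[seq_len(i - 1)]     # drop unused slots, if any

## Each list element is one input row, i.e. one column of the result,
## so binding them as columns gives the transpose in a single step.
## (A 60000 x 60000 character matrix still needs a great deal of RAM,
## so this only addresses the looping cost, not the overall size.)
write.table(do.call(cbind, geno_t), "xxx.txt")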