You could use the fact that scan reads the data rowwise, and the fact
that arrays are stored columnwise:
# generate a small example dataset
exampl <- array(letters[1:25], dim=c(5,5))
write.table(exampl, file="example.dat", row.names=FALSE, col.names=FALSE,
sep="\t", quote=FALSE)
# and read...
d <- scan("example.dat", what=character())
d <- array(d, dim=c(5,5))
t(exampl) == d
Although this is probably faster, it doesn't solve the problem of the
data being too large to hold in memory. You could use the n argument of
scan to read the file in chunks/blocks and feed those to, for example,
an ff array (which you ideally have preallocated).
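A rough sketch of the chunked reading, in case it helps. It uses an
ordinary preallocated character matrix as a stand-in for the ff array,
and the names infile, nr, nc and res are made up for the illustration.
It assumes the number of rows and columns is known beforehand:

```r
# Small stand-in for the real file (5 x 5, tab-separated)
infile <- tempfile(fileext = ".dat")
exampl <- array(letters[1:25], dim = c(5, 5))
write.table(exampl, file = infile, row.names = FALSE, col.names = FALSE,
            sep = "\t", quote = FALSE)

nr <- 5  # rows in the input file (assumed known)
nc <- 5  # columns in the input file (assumed known)

# Preallocate the transposed result: nc rows, nr columns
res <- matrix(NA_character_, nrow = nc, ncol = nr)

con <- file(infile, open = "r")
for (i in seq_len(nr)) {
  # n = nc reads the next nc values, i.e. one input row; because the
  # connection stays open, each call continues where the last one stopped
  res[, i] <- scan(con, what = character(), n = nc, quiet = TRUE)
}
close(con)

# Input row i is now column i of res
all(res == t(exampl))
```

For the real data you would replace res with something disk-backed,
since a 60000 x 60000 character matrix will not fit comfortably in
memory.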
HTH,
Jan
peter dalgaard <pda...@gmail.com> wrote:
On Mar 7, 2013, at 01:18, Yao He wrote:
Dear all:
I have a big data file of 60000 columns and 60000 rows like this:
AA AC AA AA .......AT
CC CC CT CT.......TC
..........................
.........................
I want to transpose it so that the output is a new file like this:
AA CC ............
AC CC............
AA CT.............
AA CT.........
....................
....................
AT TC.............
The key point is that I can't read it into R with read.table() because
the data is too large, so I tried this:
c <- file("silygenotype.txt", "r")
geno_t <- list()
repeat {
  line <- readLines(c, n=1)
  if (length(line) == 0) break  # end of file
  line <- unlist(strsplit(line, "\t"))
  geno_t <- cbind(geno_t, line)
}
write.table(geno_t, "xxx.txt")
It works, but it is too slow. How can I optimize it?
As others have pointed out, that's a lot of data!
You seem to have the right idea: If you read the columns line by
line there is nothing to transpose. A couple of points, though:
- The cbind() is a potential performance hit since it copies the
list every time around. Instead, preallocate with
geno_t <- vector("list", 60000) and then assign
geno_t[[i]] <- <etc> inside the loop.
- You might use scan() instead of readLines() followed by strsplit().
- Perhaps consider the data type, as you seem to be reading strings
with 16 possible values. (I suspect that R already optimizes string
storage to make this point moot, though.)
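The first two points could be sketched roughly as follows. This is only
an illustration on a tiny made-up file; the names infile, outfile,
n_rows and geno_mat are hypothetical, and it assumes the number of
lines is known in advance:

```r
# Small stand-in for the genotype file (3 rows, 4 tab-separated columns)
infile <- tempfile(fileext = ".txt")
writeLines(c("AA\tAC\tAA\tAT", "CC\tCC\tCT\tTC", "GG\tGT\tTT\tAG"), infile)

n_rows <- 3  # assumed known in advance (60000 in the original question)

# Preallocate the list once instead of growing it with cbind()
geno_t <- vector("list", n_rows)

con <- file(infile, open = "r")
for (i in seq_len(n_rows)) {
  # scan() reads and splits one line in a single call
  geno_t[[i]] <- scan(con, what = character(), nlines = 1, quiet = TRUE)
}
close(con)

# Each list element is one input row; binding them as columns transposes
geno_mat <- do.call(cbind, geno_t)

outfile <- tempfile(fileext = ".txt")
write.table(geno_mat, outfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

This still keeps everything in memory, so it addresses the speed of the
loop rather than the size of the data.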
--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk Priv: pda...@gmail.com
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.