On Tue, Feb 21, 2012 at 4:04 PM, Matthew Keller <mckellerc...@gmail.com> wrote:
> X <- read.big.matrix("file.loc.X", sep=" ", type="double")
> hap.indices <- bigsplit(X, 1:2)  # this runs for too long to be useful on
> # these matrices
> # I was then going to use a foreach loop to sum across the splits
> # identified by bigsplit

How about just using foreach earlier in the process? E.g., split file.loc.X into (80) sub-files and then run read.big.matrix/bigsplit/sum inside %dopar%.

If splitting X beforehand is a problem, you could also use ?scan to read in different chunks of the file, something like (untested, obviously):

# for X a matrix 800x4
lineind <- seq(0, 700, 100)  # lines to skip before each 100-line chunk
ReducedX <- foreach(i = 1:8) %dopar% {
  x <- scan('file.loc.X',
            list(double(0), double(0), double(0), double(0)),
            skip = lineind[i], nlines = 100)
  ... do your thing on x (aggregate/tapply etc.)
}

Hope this helped,
Elai.

> SO - does anyone have ideas on how to deal with this problem - i.e.,
> how to use a tapply()-like function on an enormous matrix? This isn't
> necessarily a bigtabulate question (although if I screwed up using
> bigsplit, let me know). If another package (e.g., an SQL package) can
> do something like this efficiently, I'd like to hear about it and your
> experiences using it.
>
> Thank you in advance,
>
> Matt
>
> --
> Matthew C Keller
> Asst. Professor of Psychology
> University of Colorado at Boulder
> www.matthewckeller.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
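Elai's chunked-scan idea above can be fleshed out into a self-contained sketch. Everything specific here is an assumption for illustration: a small temporary file stands in for file.loc.X, the column names (g1, g2, v1, v2) and the two grouping columns are hypothetical, and `aggregate()` plays the role of the tapply()-style per-group sum. The key point is the map-reduce shape: each %dopar% iteration scans its own chunk and computes partial group sums, and one final `aggregate()` combines them.

```r
## Hedged sketch only -- file, column names, and chunk sizes are made up.
library(foreach)
library(doParallel)
registerDoParallel(cores = 2)  # backend so %dopar% actually runs in parallel

## Stand-in for file.loc.X: an 800 x 4 whitespace-separated file
f <- tempfile()
write.table(matrix(c(sample(1:2, 800, TRUE), sample(1:4, 800, TRUE),
                     rnorm(800), rnorm(800)), ncol = 4),
            f, row.names = FALSE, col.names = FALSE)

chunk  <- 100
starts <- seq(0, 700, by = chunk)  # lines to skip before each chunk

## Map step: each worker scans one chunk and sums within (g1, g2) groups
partial <- foreach(s = starts, .combine = rbind) %dopar% {
  x <- scan(f, what = list(double(0), double(0), double(0), double(0)),
            skip = s, nlines = chunk, quiet = TRUE)
  x <- setNames(as.data.frame(x), c("g1", "g2", "v1", "v2"))
  aggregate(cbind(v1, v2) ~ g1 + g2, data = x, FUN = sum)
}

## Reduce step: combine the per-chunk partial sums into final group sums
final <- aggregate(cbind(v1, v2) ~ g1 + g2, data = partial, FUN = sum)
```

Because sums are associative, summing per-chunk sums gives the same result as one pass over the whole file, which is what makes this decomposition safe for this particular aggregation.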