Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-boun...@r-project.org 
> [mailto:r-help-boun...@r-project.org] On Behalf Of Gabor Grothendieck
> Sent: Monday, October 18, 2010 7:03 AM
> To: Bond, Stephen
> Cc: r-help@r-project.org
> Subject: Re: [R] aggregate with cumsum
> 
> On Mon, Oct 18, 2010 at 9:55 AM, Bond, Stephen 
> <stephen.b...@cibc.com> wrote:
> > Gabor,
> >
> > You are suggesting some very advanced usage that I do not
> > understand, but it seems this is not what I meant when I said loop.
> > I have a df with 47k rows and each of these is fed to a
> > 'predict' which will output about 62 rows, so the number of
> > groups is very large and I implied that I would go through
> > the 47k x 62 rows with
> >
> > For (jj in (set of 47k values)) # tmp.df=big.df[big.df$group==jj,] to subset
> >                                # and then sum
> >
> > Which is very slow. I discovered that even creating the
> > dataset is super slow as I use write.table
> >
> > The clogging comes from
> >
> > write.table(tmp,"predcom.csv",row.names=FALSE,col.names=FALSE,append=TRUE,sep=',')
> >
> > Can anybody suggest a faster way of appending to a text file??

Writing the output to a file instead of inserting
it into an R object almost never gives you more speed.  Writing
to a text file and later reading from it with read.table or
the like can also lose precision, since write.table writes numbers
with at most 15 significant digits.  Use one of the
R functions Gabor and others have suggested.
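
As a rough illustration of the in-memory route, here is a minimal sketch of
the subset-and-sum loop from the quoted message next to two vectorized
equivalents, tapply() and rowsum().  The data frame is a small made-up
stand-in for big.df; the column name 'value' and the group count are
assumptions, not taken from the original data.

```r
# Hypothetical stand-in for big.df: 200 groups of 62 rows each.
set.seed(1)
n.groups <- 200
big.df <- data.frame(group = rep(seq_len(n.groups), each = 62),
                     value = runif(n.groups * 62))

# The slow pattern: subset the whole data frame once per group, then sum.
loop.sums <- numeric(n.groups)
for (jj in seq_len(n.groups)) {
  tmp.df <- big.df[big.df$group == jj, ]
  loop.sums[jj] <- sum(tmp.df$value)
}

# Vectorized alternatives: a single call each, no explicit loop.
fast.sums <- tapply(big.df$value, big.df$group, sum)  # named 1-d array
rs <- rowsum(big.df$value, big.df$group)              # one-column matrix

# Both should agree with the loop up to floating-point tolerance.
all.equal(loop.sums, as.vector(fast.sums))
all.equal(loop.sums, as.vector(rs))
```

The loop rescans all rows for every group, so its cost grows with the number
of groups times the number of rows; the vectorized calls make one pass.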

If you really want to append many times to one file, things will
go much faster if you open the file once before all the writing
and close it when you are done, instead of opening and
closing it implicitly for each write.  E.g., on my Windows XP
laptop opening the file once gives a c. 320:1 speedup:

  > tfile1 <- tempfile()
  > system.time(for(i in 1:1e4)cat(i, file=tfile1, append=TRUE))
     user  system elapsed 
     1.84    4.30   79.86 

  > tfile2 <- tempfile()
  > ofile <- file(tfile2, open="a") # open in append mode
  > system.time(for(i in 1:1e4)cat(i, file=ofile))
     user  system elapsed 
     0.18    0.07    0.25 
  > close(ofile)

and there is no difference in what the output files contain.

  > identical(readLines(tfile1), readLines(tfile2))
  [1] TRUE
  Warning messages:
  1: In readLines(tfile1) :
    incomplete final line found on 'C:\DOCUME~1\wdunlap\LOCALS~1\Temp\Rtmpdy7MQ0\file41bb5af1'
  2: In readLines(tfile2) :
    incomplete final line found on 'C:\DOCUME~1\wdunlap\LOCALS~1\Temp\Rtmpdy7MQ0\file1eb26e9'
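
The warnings above appear because cat(i) writes no newline, so each file ends
up as one long unterminated line.  The following is a minimal sketch of the
same open-once pattern wrapped in a function that writes one value per line
and closes the connection even if an error occurs.  The helper name
append_lines is made up for illustration.

```r
# Hypothetical helper: append each value on its own line via one connection.
append_lines <- function(values, path) {
  con <- file(path, open = "a")   # open once, in append mode
  on.exit(close(con))             # close on normal exit or on error
  for (v in values) {
    cat(v, "\n", sep = "", file = con)
  }
  invisible(path)
}

tfile <- tempfile()
append_lines(1:5, tfile)
append_lines(6:10, tfile)         # a later call appends to the same file
readLines(tfile)                  # ten lines, each properly terminated
```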

write.table() has a lot of additional overhead beyond
opening and closing files.  If you only need to write plain
values, cat() is the faster choice.
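
If the data-frame formatting of write.table() is still wanted, it can be
given an already-open connection, which avoids the reopen on every call.
This is only a sketch: the 'chunks' list is a made-up stand-in for the
per-row 'predict' output, and the column names are assumptions.

```r
# Hypothetical chunks standing in for the output of each 'predict' call.
chunks <- lapply(1:3, function(g) {
  data.frame(group = g, value = g + c(0.25, 0.5))
})

out <- tempfile(fileext = ".csv")
con <- file(out, open = "w")      # open once for all chunks
for (tmp in chunks) {
  # An open connection keeps its position, so append=TRUE is not needed.
  write.table(tmp, file = con, row.names = FALSE, col.names = FALSE,
              sep = ",")
}
close(con)

back <- read.csv(out, header = FALSE, col.names = c("group", "value"))

# Alternative that skips the repeated writes: bind in memory, write once.
all.df <- do.call(rbind, chunks)
```

Binding the chunks in memory and writing once at the end is usually the
simpler option when the combined result fits in RAM.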

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> >
> > All comments are appreciated.
> 
> If the problem is to sum each row of a matrix then rowSums can do that
> without a loop.
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
