Is the slowdown happening while mclapply runs or while you're doing the rbind? If the latter, I wonder if the code below is more efficient than using rbind inside a loop:
my_df <- do.call(rbind, my_list_from_mclapply)

On Wed, Jun 29, 2011 at 3:34 PM, Vincent Aubanel <[email protected]> wrote:
> Hi all,
>
> I'm using mclapply() of the multicore package for processing chunks of data
> in parallel --and it works great.
>
> But when I want to collect all processed elements of the returned list into
> one big data frame it takes ages.
>
> The elements are all data frames having identical column names, and I'm using
> a simple rbind() inside a loop to do that. But I guess it makes some
> expensive checking computations at each iteration as it gets slower and
> slower as it goes. Writing out to disk individual files, concatenating with
> the system and reading back from disk the resulting file is actually faster...
>
> Is there a magic argument to rbind() that I'm missing, or is there any other
> solution to collect the results of parallel processing efficiently?
>
> Thanks,
> Vincent

_______________________________________________
R-SIG-Mac mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
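P.S. A rough sketch of why the one-shot do.call() approach tends to win (the list construction below is a made-up stand-in for mclapply() output, just for illustration): rbind() called once on the whole list allocates the result a single time, whereas growing a data frame with rbind() inside a loop copies everything accumulated so far on every iteration, so the total work grows quadratically with the number of chunks.

```r
# Hypothetical stand-in for the output of mclapply(): a list of
# identically-structured data frames.
my_list_from_mclapply <- lapply(1:500, function(i) {
  data.frame(id = i, value = rnorm(10))
})

# One-shot combine: rbind() sees all the pieces at once and
# allocates the full result in a single pass.
system.time(
  fast_df <- do.call(rbind, my_list_from_mclapply)
)

# Incremental combine: each iteration re-copies the entire
# accumulated data frame before appending the next chunk.
system.time({
  slow_df <- my_list_from_mclapply[[1]]
  for (x in my_list_from_mclapply[-1]) {
    slow_df <- rbind(slow_df, x)
  }
})

# Same contents either way (row-name attributes aside); only
# the amount of copying differs.
```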
