Is it possible to scan a large file into a character vector in parallel, in 1M-record chunks, using scan() with the "doMC" package? Furthermore, can I specify the task for each child?
i.e., I'm working on a Linux box with 8 cores and would like to scan in 8M records at a time (all 8 cores scanning 1M records each) from a file with 40M records total:

    library(foreach)
    library(iterators)   # icount()
    library(doMC)
    registerDoMC(8)      # use all 8 cores

    file <- file("data.txt", "r")
    child <- foreach(i = icount(40)) %dopar% {
      scan(file, what = "character", sep = "\n",
           skip = (i - 1) * 1e6, nlines = 1e6)
    }

Thus, each child would have a different skip argument: child[[1]]: skip = 0; child[[2]]: skip = 1e6; child[[3]]: skip = 2e6; ...; child[[40]]: skip = 39e6. I would then end up with a list of 40 vectors, with child[[1]] containing records 1 to 1000000, child[[2]] containing records 1000001 to 2000000, ..., and child[[40]] containing records 39000001 to 40000000.

Also, would one file connection suffice, or does there need to be a file connection that is opened and closed for each child?
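For concreteness, here is a sketch of the one-connection-per-child variant I have in mind (untested; it assumes the same data.txt file and 1M-record chunking as above):

    library(foreach)
    library(iterators)
    library(doMC)
    registerDoMC(8)

    chunk <- 1e6  # records per child
    child <- foreach(i = icount(40)) %dopar% {
      con <- file("data.txt", "r")           # private connection per child
      x <- scan(con, what = "character", sep = "\n",
                skip = (i - 1) * chunk,      # skip the preceding chunks
                nlines = chunk, quiet = TRUE)
      close(con)
      x
    }

One caveat I'm aware of with either approach: skip still has to read past every preceding line (there is no random access by line number), so later children re-read most of the file; splitting the file beforehand (e.g. with the Unix split command) would avoid that repeated work.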