Has there been any systematic evaluation of which core R functions are safe for use with multicore? Most recently, I have tried calling read.table() via mclapply() to more quickly read in hundreds of raw data files (I have a 24-core system with 72 GB of RAM running Ubuntu, a perfect platform for multicore). I saw a 40% failure rate, which doesn't occur when I invoke read.table() serially from a single process. Another example was using pvec() to invoke sapply(strsplit(),...) on a huge character vector (to pull out fields from within a field). It looked like a perfect application for pvec(), but it fails where serial execution succeeds. Sketches of both cases follow.
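Roughly what I'm doing for the read.table() case -- the directory, file pattern, and read.table() arguments below are made up for illustration; my real data and column layouts differ:

library(multicore)

## serial read works every time:
files <- list.files("rawdata", pattern = "\\.dat$", full.names = TRUE)
serial <- lapply(files, read.table, header = TRUE)

## parallel read fails on roughly 40% of the files:
par.res <- mclapply(files, function(f)
    tryCatch(read.table(f, header = TRUE),
             error = function(e) conditionMessage(e)),
    mc.cores = 24)
sum(sapply(par.res, is.character))   # count of failed reads; nonzero only in parallel

And the pvec() case, where bigvec stands in for my huge character vector and the delimiter and field position are again illustrative (pvec() chunks the vector across cores, so FUN just needs to return a result the same length as its input):

## extract the second "|"-delimited field from each element
getfield <- function(v) sapply(strsplit(v, "|", fixed = TRUE), `[`, 2)
f2.serial <- getfield(bigvec)                       # works
f2.par    <- pvec(bigvec, getfield, mc.cores = 24)  # fails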
I thought I'd ask before taking on the task of digging into the underlying code to see what might be causing these failures in a multicore (i.e., forked-process) context. As an alternative, I could define multiple cluster nodes locally, but that shifts the tradeoff in whether parallel execution is advantageous: the startup and communication overhead is significantly higher, and since each node is a separate R process holding its own copy of the data, even 72 GB imposes real limits on how many cores can be used.
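For reference, the local-cluster alternative I have in mind would look something like this (a sketch using snow; the worker count is illustrative, chosen smaller than the core count to stay within RAM):

library(snow)
cl <- makeCluster(8, type = "SOCK")   # each SOCK worker is a full R process
res <- parLapply(cl, files, function(f) read.table(f, header = TRUE))
stopCluster(cl)

Bill Hopkins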