On Oct 4, 2011, at 4:43 AM, Thomas Friedrichsmeier wrote:

> Dear R developers,
> 
> with the inclusion of the package "parallel" in the upcoming release of R, 
> users and package developers are likely to make increasing usage of 
> parallelization features. In part, these features rely on forking the R 
> process. As ?mcfork points out, fork()ing in a GUI process is typically a bad 
> idea. In RKWard, we "only" seem to have problems with signals arriving in the 
> wrong threads, and occasional failure to collect the results from child 
> processes. I haven't entirely given up the hope to fix this, eventually, but 
> in consequence, parallelization based on forking is not currently usable 
> inside an RKWard session.
> 
> I am somewhat worried that, as library(parallel) gains acceptance, 
> unsuspecting users will increasingly start to run into forking related 
> problems in RKWard and other environments.

I don't see why this should be anything new - this is already happening since 
both packages that were folded into parallel (snow and multicore) are well 
known and well used.

In multicore we explicitly warned about this and also worked around issues 
where possible (e.g. in the Mac GUI). Judging by the widespread use of 
multicore and the absence of problem reports related to GUIs, my impression is 
that this aspect is not really a problem (more below). We get more users 
confused about the inability to perform side-effects than about this.

In general, there are two main issues that can be addressed by the GUI:

a) shared file descriptors. This is a problem if the GUI uses FDs for 
communication and they are not closed in the child instance. You don't want 
both the child and the parent to process those FDs. E.g., closeAll() can be 
used to work around that issue and with parallel there could be an easier 
interface for this given that it's in core R.

b) event loop. If the GUI hooks into the event loop then, obviously, this is 
only intended to be run from the master. multicore was already disabling the 
event loop hook for AQUA, but it was hard to provide a more comprehensive 
solution since it needed cooperation from R. In parallel it's much easier, 
because it can modify R to allow the event loop conditionally and thus only in 
the master process.

The whole point of parallel is that it can do more than an external package, so 
I think you're going about it the wrong way - you should be talking to us much 
earlier, so that whatever constraints you have in RKWard can possibly be 
addressed by the infrastructure. Also note that a lot of this should be 
seamless: many users don't care what the infrastructure is, they just want 
their task to run in parallel. They don't care about mcfork() and the like - 
the choices will be made for them, because there is no fork on Windows, for 
example.


> Therefore, I wish:
> - The warning from ?mcfork about potential complications should also be 
> visible on the documentation pages for the higher level functions 
> mcparallel(), mclapply(), but also makeForkCluster().
> - It would be nice to have a way to tell library(parallel) that forking is a 
> bad idea in the current session, so that
>  - mcfork() could stop with an informative error message, or at least produce 
> a warning; mclapply() could fall back to mc.cores=1 with a warning.
>  - third party packages which wish to use parallelization could check whether 
> it is safe to use forking, or whether another mechanism should be used.
> 
> I am aware that options(mc.cores=1) will effectively disable forking in 
> mclapply(). However, this would make it look like (local) parallelization is 
> not worthwhile at all, while actually, parallelization with 
> makePSOCKCluster() works just fine. So, I'm looking for a way to selectively 
> disable the use of forking.
> 
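
For illustration only, the kind of selective fallback requested above could be 
approximated at the package level roughly as follows. This is a minimal sketch, 
not something parallel provides: the option name "fork.unsafe" and the helpers 
fork_is_safe() and safe_lapply() are hypothetical names invented for this 
example. Only mclapply(), makePSOCKCluster(), parLapply() and stopCluster() are 
actual functions from parallel.

```r
library(parallel)

# Hypothetical option name; not part of R or the parallel package.
# A GUI front-end could set it at startup:
#   options(fork.unsafe = TRUE)
fork_is_safe <- function() {
  # There is no fork on Windows, so forking is never "safe" there.
  .Platform$OS.type != "windows" && !isTRUE(getOption("fork.unsafe"))
}

# Hypothetical wrapper: fork via mclapply() when that is considered safe,
# otherwise run the same task on a local socket cluster.
safe_lapply <- function(X, FUN, ..., cores = 2L) {
  if (fork_is_safe()) {
    return(mclapply(X, FUN, ..., mc.cores = cores))
  }
  cl <- makePSOCKCluster(cores)
  # Shut the workers down even if FUN signals an error.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN, ...)
}
```

A caller would use safe_lapply(1:3, function(x) x^2) in either case. The point 
of the sketch is that local parallelization stays available through the socket 
cluster when forking is switched off, rather than being reduced to mc.cores=1.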


Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
