Printing from multiple workers to stdout is problematic anyway, because the 
output from different workers is not intrinsically synchronized -- it will 
appear interleaved.

Generally, for forked processes, connections (to stdout, files, databases, 
URLs, ...) need to be opened on the worker rather than inherited from the 
master; this is obviously not possible with stdout, since there is only one 
stdout.

One might avoid the problem described below by opening separate output streams 
on individual workers, rather than inheriting from the master; one would print 
to files and collate on the master, but this is more 'logging' than an 
interactive progress report.

Since parallel evaluation is presumably being used because a lot of data are 
being processed, detailed information on progress would seem to rapidly 
overwhelm the user. BiocParallel has a progress bar which, when used with 
tasks, can provide fine-grained updates:

> n = 10; p = MulticoreParam(2, tasks = n, progressbar=TRUE)
> res = bplapply(runif(n, 1, 2), Sys.sleep, BPPARAM=p)
  |======================================================================| 100%

Maybe this is enough, perhaps used in conjunction with log = TRUE?

Another alternative is to use separate processes, each with its own stdout, 
rather than shared processes -- SnowParam() and the snow package, rather than 
MulticoreParam and other (Unix-based) fork implementations. Using separate 
processes requires more discipline but is actually not a bad choice; for 
instance, it is the only approach available on Windows, where half our users 
are.

Martin

On 1/28/19, 10:31 PM, "Bioc-devel on behalf of Yang Liao" 
<bioc-devel-boun...@r-project.org on behalf of l...@wehi.edu.au> wrote:

    Hi,
    
    I'm not sure whether other C developers have run into this problem: it 
seems that Rprintf cannot be used safely in a multi-threaded environment. In 
particular, if I call Rprintf() from a newly created thread while stack size 
checking is enabled (i.e. "R_CStackLimit" isn't set to -1), it is very likely 
to end up with fatal error messages like:
    
    Error: C stack usage  847645293284 is too close to the limit
    Error: C stack usage  847336061668 is too close to the limit
    Error: C stack usage  847666277092 is too close to the limit
    Error: C stack usage  847346551524 is too close to the limit
    Error: C stack usage  847367531236 is too close to the limit
    Error: C stack usage  847357041380 is too close to the limit
    Error: C stack usage  847378021092 is too close to the limit
    Error: C stack usage  847655787236 is too close to the limit
    
    and the R session terminates in a segfault.
    After confirming that there was no memory leak and that the real stack use 
was minimal, I concluded it could only be an Rprintf issue. I then disabled 
all screen output from the newly created threads and the error was gone. The 
same issue was reported on Stack Overflow:
    
https://stackoverflow.com/questions/50092949/why-does-rcout-and-rprintf-cause-stack-limit-error-when-multithreading
    I tried using a semaphore to protect all Rprintf calls, but it didn't 
prevent the error.
    
    Since my program needs to report messages from the worker threads (created 
by the main thread), I wonder whether there is a way to do so safely, or 
whether I have to pipe the messages to the main thread, which in turn calls 
Rprintf. I would prefer not to change "R_CStackLimit" to disable the stack 
size checks, because doing so generates a NOTE in R CMD check.
    
    Cheers,
    Yang
    
    _______________________________________________
    Bioc-devel@r-project.org mailing list
    https://stat.ethz.ch/mailman/listinfo/bioc-devel
    