Hi Dario -- it was this commit

------------------------------------------------------------------------
r111519 | [email protected] | 2015-12-15 14:34:18 -0500 (Tue, 15 Dec 2015) | 2 lines

port: r111463, bugfix: workers=1, tasks=0 assigns all X to one chunk

------------------------------------------------------------------------

in response to this report

https://support.bioconductor.org/p/75945/

Previously, the behavior when the number of 'tasks' was unspecified (default 
value 0) was to split X (in your example, the vector 1:100) into 100 individual 
tasks 1, 2, 3, ..., and to process each in a completely independent parallel 
process -- there would be a total of 100 processes started and stopped. The 
change mentioned above instead behaves as documented, splitting the 100 
elements approximately evenly between the specified number of workers (25), and 
sending several elements to each worker for processing. This saves the cost of 
communicating the object to and from the worker. You can get the old behavior 
by specifying tasks = length(X) (in your example, tasks = 100).
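
For instance (an untested sketch; 'tasks' here is the argument to the
MulticoreParam() constructor):

library(BiocParallel)

X <- 1:100

## current default: tasks = 0, so X is split approximately evenly
## across the 25 workers
param_default <- MulticoreParam(workers = 25)

## old behavior: one task per element of X, i.e., 100 separate tasks
param_old <- MulticoreParam(workers = 25, tasks = length(X))

res <- bplapply(X, function(x) { if (x %% 10 == 0) message(x); sqrt(x) },
                BPPARAM = param_old)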

The 'split' of elements into tasks can be seen by calling the internal function 
.splitX()

> head(BiocParallel:::.splitX(1:100, 25, 100))  # tasks = 100: 1 element per task
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

> head(BiocParallel:::.splitX(1:100, 25, 0))  # tasks = 0: 4 elements per task
[[1]]
[1] 1 2 3 4

[[2]]
[1] 5 6 7 8

[[3]]
[1]  9 10 11 12

[[4]]
[1] 13 14 15 16

[[5]]
[1] 17 18 19 20

[[6]]
[1] 21 22 23 24
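
A similar even chunking can be reproduced without relying on an internal
function by using splitIndices() from the base-R parallel package (the chunk
boundaries are not guaranteed to match .splitX() exactly; this is just for
illustration):

library(parallel)

## 100 indices divided approximately evenly among 25 workers:
## 25 chunks of 4 consecutive indices each
chunks <- splitIndices(100, 25)
head(chunks, 3)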


The tasks produced by the split are assigned to workers in order, but the 
precise schedule is somewhat indeterminate -- task 1 might be assigned before 
task 2, but perhaps the process handling task 1 runs the garbage collector 
before sleeping, so task 2 finishes ahead of task 1. Under the original scheme 
I guess you were relying on the average execution time of ten processes between 
each message, whereas in the correct scheme you are relying on the average 
execution time of just three processes, so there is greater variability. Either 
way, though, the order of execution is not guaranteed.
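
Note that only the timing of the messages is affected; bplapply() returns its 
results in the order of the input, as lapply() would. A quick sanity check 
(sketch, with a small random sleep standing in for real work):

res <- bplapply(1:100, function(x) { Sys.sleep(runif(1, 0, 0.1)); x },
                BPPARAM = MulticoreParam(workers = 4))
identical(unlist(res), 1:100)   ## expected TRUE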

Messages are reported at the end of each task; there are 100 opportunities for 
messages when the number of tasks is 100, but only 25 opportunities 
(corresponding to each worker handling approximately 4 elements) otherwise.

Other than being different from the previous behavior, is there an underlying 
problem?

Martin
________________________________________
From: Bioc-devel [[email protected]] on behalf of Dario Strbenac 
[[email protected]]
Sent: Sunday, December 27, 2015 7:00 PM
To: bioc-devel list
Subject: [Bioc-devel] Progress Message Order in bplapply

Hello,

I am experiencing some new and unexpected behaviour of bplapply.

Previously, progress messages were displayed in almost the expected order. Now 
they appear in quite a different order. My test case is:

bplapply(1:100, function(x) {
    if (x %% 10 == 0) message(x)
    Sys.sleep(30)
}, BPPARAM = MulticoreParam(workers = 25))

The resulting progress messages aren't displayed until the end of the process, 
whereas before they appeared immediately. I would expect 10 and 20 to appear 
before 30 did.

30
40
50
10
20
60
70
80
90
100

--------------------------------------
Dario Strbenac
PhD Student
University of Sydney
Camperdown NSW 2050
Australia
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
