This does seem extremely slow. Sending a vector as a place message requires copying it, but the copy shouldn't take so long. I'll investigate further.
Meanwhile, an option in this case might be to create a "shared flvector", which can be passed directly (i.e., without copying) to another place. I've enclosed a variant of your example to illustrate.

At Mon, 10 Nov 2014 11:58:21 +0200, Alexey Cherkaev wrote:
> Hi,
>
> I am looking at parallelising some numerical computation with Racket. I’ve
> tried future/touch first. However, the data for computation is passed as
> vectors and in my experiments with future/touch it would always find
> "synchronisation task" upon which all multicore-threads collapse into one
> core serialised computation.
>
> Now, I decided to try place. My idea is to make it similar to Common Lisp’s
> LPARALLEL: create workers <= number of cores and distribute tasks into those
> workers. The problem I have encountered, however, is that place-channel-get
> seems to take forever to compute. Here is an example of some simulated
> computation on a vector using two places and trying to run them in parallel:
>
> #lang racket
>
> (require racket/place)
>
> (provide test-places1 test-places2 long-computation v1 v2 random-vector)
>
> ;;; Utilities:
> (define (random-list n)
>   (let loop ((i n) (r '()))
>     (if (zero? i)
>         r
>         (loop (sub1 i) (cons (random) r)))))
>
> (define (random-vector n)
>   (let ((l (random-list n)))
>     (list->vector l)))
>
> (define (vector-reduce f init v)
>   (let ((n (vector-length v)))
>     (let loop ((i 0) (r init))
>       (if (= i n)
>           r
>           (loop (add1 i) (f r (vector-ref v i)))))))
>
> ;;; This is computation to be run in each place:
> (define (long-computation v)
>   (let ((n (vector-length v))
>         (v1 (vector-copy v))) ; v is immutable, if want to mutate, must copy it
>     (let loop ((i 0))
>       (if (= i n)
>           (begin
>             (sleep 2) ; make it work for a bit longer
>             (vector-reduce + 0.0 v1)) ; to make result printable
>           (begin
>             (vector-set! v1 i (* (exp (- (vector-ref v1 i)))
>                                  (sin (* pi (vector-ref v1 i))))) ; flonum computation
>             (loop (add1 i)))))))
>
> ;;; two vectors to be sent to long-computation
> (define v1 (random-vector 100000))
> (define v2 (random-vector 100000))
>
> ;;; Test using one place:
> (define (test-places1)
>   (define p1
>     (place ch1
>       (define v (place-channel-get ch1))
>       (define w (long-computation v))
>       (place-channel-put ch1 w)))
>   (place-channel-put p1 v1)
>   (time (place-channel-get p1)))
>
> ;;; Test using 2 places:
> (define (test-places2)
>   (define p1
>     (place ch1
>       (define v (place-channel-get ch1))
>       (define w (long-computation v))
>       (place-channel-put ch1 w)))
>   (define p2
>     (place ch2
>       (define v (place-channel-get ch2))
>       (define w (long-computation v))
>       (place-channel-put ch2 w)))
>   (place-channel-put p1 v1)
>   (place-channel-put p2 v2)
>   (sleep 2) ; hypothetically, after this results should be ready immediately!
>   (time (list (place-channel-get p1) (place-channel-get p2))))
>
> Execution from racket on MacBook Pro with Intel Core 2 Duo:
>
> -> (time (long-computation v1))
> cpu time: 42 real time: 2043 gc time: 0
> 39523.12275516648
> -> (test-places1)
> cpu time: 7593 real time: 7475 gc time: 7001
> 39523.12275516648
> -> (test-places2)
> cpu time: 16591 real time: 12492 gc time: 15485
> '(39523.12275516648 39505.415738171105)
>
> So, the time of execution of (long-computation v1) and the time of getting
> the result out of the channel in (test-places1) should be more or less the
> same, but they are not. Furthermore, (test-places2) takes almost twice as
> long as (test-places1) (note, I put (time …) around just getting the value,
> so it does not include the time of creating the place).
>
> Am I doing something wrong?
>
> Cheers, Alexey
>
> ____________________
> Racket Users list:
> http://lists.racket-lang.org/users
shared-flvector-example.rkt
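The contents of the enclosed file are not shown here. What follows is only a minimal sketch of the shared-flvector idea mentioned in the reply, not the original attachment. The names `random-shared-flvector`, `long-computation!` and `test-shared` are made up for illustration, and the `(sleep 2)` from the quoted example is left out. It assumes a Racket build with places enabled; `make-shared-flvector` comes from `racket/flonum`.

```racket
#lang racket
;; Sketch only: not the original shared-flvector-example.rkt.
(require racket/flonum) ; racket/place is already part of #lang racket

(provide random-shared-flvector long-computation! test-shared)

;; Allocate a shared flvector of length n and fill it with random flonums.
(define (random-shared-flvector n)
  (define v (make-shared-flvector n 0.0))
  (for ([i (in-range n)])
    (flvector-set! v i (random)))
  v)

;; Same arithmetic as the quoted long-computation, but mutating the
;; flvector in place instead of copying it first; returns the sum.
(define (long-computation! v)
  (for ([i (in-range (flvector-length v))])
    (define x (flvector-ref v i))
    (flvector-set! v i (* (exp (- x)) (sin (* pi x)))))
  (for/fold ([sum 0.0]) ([x (in-flvector v)])
    (+ sum x)))

;; Send a shared flvector of length n to one place and time the reply.
(define (test-shared n)
  (define p
    (place ch
      (define v (place-channel-get ch))
      (place-channel-put ch (long-computation! v))))
  (define v (random-shared-flvector n))
  (place-channel-put p v)
  (time (place-channel-get p)))
```

Calling `(test-shared 100000)` from the REPL after loading the module should print timing in the same format as the quoted runs. Because the flvector is shared rather than copied, the worker's `flvector-set!` calls are expected to be visible in the sending place as well, so the sender should not touch the vector until the result has come back.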