Hi Matthew, Thanks a lot for the reply!
This works! Thanks a lot.

Regards, Alexey

On 12 Nov 2014, at 18:52, Matthew Flatt <mfl...@cs.utah.edu> wrote:

> I'll push a repair to the development version.
>
> The problem isn't so much that message copying/transfer is slow, but
> that the rule to trigger an all-places GC doesn't accommodate a large,
> not-yet-delivered message. I'll repair that rule.
>
> Most of the process time in your example shows up as GC time, because
> the GC was continuously firing while the message waited for the new
> place to start and receive it (and the constant GCs slowed the place
> start-up).
>
> If upgrading is not an option, you can work around the problem by
> waiting for a "ready" message from the new place before sending the
> vector as a message. For example, change `test-places1` to
>
>  (define (test-places1)
>    (define p1
>      (place ch1
>        (place-channel-put ch1 'ready)
>        (define v (place-channel-get ch1))
>        (define w (long-computation v))
>        (place-channel-put ch1 w)))
>    (place-channel-get p1) ; => 'ready
>    (place-channel-put p1 v1)
>    (time (place-channel-get p1)))
>
> That way, `v1` doesn't sit in the message channel long enough to cause
> a problem.
>
> At Tue, 11 Nov 2014 17:41:11 -0700, Matthew Flatt wrote:
>> This does seem extremely slow. A place-message send must copy the
>> vector to send it as a message, but the copy shouldn't take so long.
>> I'll investigate further.
>>
>> Meanwhile, an option in this case might be to create a "shared
>> flvector", which can be passed directly (i.e., without copying) to
>> another place. I've enclosed a variant of your example to illustrate.
>>
>> At Mon, 10 Nov 2014 11:58:21 +0200, Alexey Cherkaev wrote:
>>> Hi,
>>>
>>> I am looking at parallelising some numerical computation with Racket.
>>> I’ve tried future/touch first. However, the data for computation is
>>> passed as vectors and in my experiments with future/touch it would
>>> always find a "synchronisation task" upon which all multicore threads
>>> collapse into one-core serialised computation.
>>>
>>> Now, I decided to try place. My idea is to make it similar to Common
>>> Lisp’s LPARALLEL: create workers <= number of cores and distribute
>>> tasks into those workers. The problem I have encountered, however, is
>>> that place-channel-get seems to take forever to compute. Here is an
>>> example of some simulated computation on a vector using two places and
>>> trying to run them in parallel:
>>>
>>> #lang racket
>>>
>>> (require racket/place)
>>>
>>> (provide test-places1 test-places2 long-computation v1 v2 random-vector)
>>>
>>> ;;; Utilities:
>>> (define (random-list n)
>>>   (let loop ((i n) (r '()))
>>>     (if (zero? i)
>>>         r
>>>         (loop (sub1 i) (cons (random) r)))))
>>>
>>> (define (random-vector n)
>>>   (let ((l (random-list n)))
>>>     (list->vector l)))
>>>
>>> (define (vector-reduce f init v)
>>>   (let ((n (vector-length v)))
>>>     (let loop ((i 0) (r init))
>>>       (if (= i n)
>>>           r
>>>           (loop (add1 i) (f r (vector-ref v i)))))))
>>>
>>> ;;; This is computation to be run in each place:
>>> (define (long-computation v)
>>>   (let ((n (vector-length v))
>>>         (v1 (vector-copy v))) ; v is immutable, if want to mutate, must copy it
>>>     (let loop ((i 0))
>>>       (if (= i n)
>>>           (begin
>>>             (sleep 2) ; make it work for a bit longer
>>>             (vector-reduce + 0.0 v1)) ; to make result printable
>>>           (begin
>>>             (vector-set! v1 i (* (exp (- (vector-ref v1 i)))
>>>                                  (sin (* pi (vector-ref v1 i))))) ; flonum computation
>>>             (loop (add1 i)))))))
>>>
>>> ;;; two vectors to be sent to long-computation
>>> (define v1 (random-vector 100000))
>>> (define v2 (random-vector 100000))
>>>
>>> ;;; Test using one place:
>>> (define (test-places1)
>>>   (define p1
>>>     (place ch1
>>>       (define v (place-channel-get ch1))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch1 w)))
>>>   (place-channel-put p1 v1)
>>>   (time (place-channel-get p1)))
>>>
>>> ;;; Test using 2 places:
>>> (define (test-places2)
>>>   (define p1
>>>     (place ch1
>>>       (define v (place-channel-get ch1))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch1 w)))
>>>   (define p2
>>>     (place ch2
>>>       (define v (place-channel-get ch2))
>>>       (define w (long-computation v))
>>>       (place-channel-put ch2 w)))
>>>   (place-channel-put p1 v1)
>>>   (place-channel-put p2 v2)
>>>   (sleep 2) ; hypothetically, after this results should be ready immediately!
>>>   (time (list (place-channel-get p1) (place-channel-get p2))))
>>>
>>> Execution from racket on MacBook Pro with Intel Core 2 Duo:
>>>
>>> -> (time (long-computation v1))
>>> cpu time: 42 real time: 2043 gc time: 0
>>> 39523.12275516648
>>> -> (test-places1)
>>> cpu time: 7593 real time: 7475 gc time: 7001
>>> 39523.12275516648
>>> -> (test-places2)
>>> cpu time: 16591 real time: 12492 gc time: 15485
>>> '(39523.12275516648 39505.415738171105)
>>>
>>> So, the time of execution of (long-computation v1) and the time of
>>> getting the result out of the channel in (test-places1) should be more
>>> or less the same, but it is not. Furthermore, (test-places2) takes
>>> almost twice as long as (test-places1) (note, I put (time …) around
>>> just getting the value, so it does not include the time of creating
>>> the place).
>>>
>>> Am I doing something wrong?
>>>
>>> Cheers, Alexey
>>>
>>> ____________________
>>> Racket Users list:
>>> http://lists.racket-lang.org/users
>>
>> [application/octet-stream "shared-flvector-example.rkt"]
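
The shared-flvector variant mentioned in the thread was sent as an attachment and its contents are not reproduced here. As a rough sketch of the idea only (not the attached file), the following assumes `make-shared-flvector` from `racket/flonum`; the names `random-shared-flvector` and `run-shared-example` are hypothetical and introduced just for illustration. The place updates the flvector in place, so only a small `'done` symbol travels back over the channel.

```racket
#lang racket
;; Sketch only: this is NOT the attached shared-flvector-example.rkt.
(require racket/place racket/flonum)

(provide run-shared-example)

;; Hypothetical helper: a shared flvector holding n random flonums.
(define (random-shared-flvector n)
  (define fv (make-shared-flvector n 0.0))
  (for ([i (in-range n)])
    (flvector-set! fv i (random)))
  fv)

;; Hypothetical entry point: send the shared flvector to one place,
;; let the place overwrite it in place, then sum it in the parent.
(define (run-shared-example)
  (define fv (random-shared-flvector 100000))
  (define p
    (place ch
      (define v (place-channel-get ch))
      ;; Same per-element formula as long-computation in the thread.
      (for ([i (in-range (flvector-length v))])
        (define x (flvector-ref v i))
        (flvector-set! v i (* (exp (- x)) (sin (* pi x)))))
      (place-channel-put ch 'done)))
  (place-channel-put p fv)   ; a shared flvector is passed without copying
  (place-channel-get p)      ; wait for 'done
  (for/fold ([sum 0.0]) ([x (in-flvector fv)])
    (+ sum x)))

;; Run only when the file is the main program, so the new place
;; does not re-trigger the example when it loads this module.
(module+ main
  (run-shared-example))
```

Because the memory is shared, the parent and the place must agree on who writes when; here the `'done` message serves as that hand-off.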