[racket-users] Re: Places performance & channel capacity

Brian Adkins Thu, 21 Jan 2016 11:13:23 -0800

On Wednesday, January 20, 2016 at 11:47:57 PM UTC-5, Brian Adkins wrote:
> On Wednesday, January 20, 2016 at 11:28:59 PM UTC-5, Brian Adkins wrote:
> > My initial experiment with places is a bit disappointing:
> > 
> > Sequential version: cpu time: 2084 real time: 2091 gc time: 91
> > 
> > Places version: cpu time: 16895 real time: 3988 gc time: 4244
> > 
> > Using 8x the CPU time seems quite high.
> > 
> > And more importantly, the places version only wrote 128,541 lines to the 
> > output file vs. the correct number of 198,480 (65%). I suspect the program 
> > is ending while some of the worker places are still active, but I think 
> > that implies that if I correctly waited for all activity to finish, the 
> > numbers would be even worse. Putting a sleep after the main input reading 
> > loop gets me all but 5 of the records.
> > 
> > My simplistic design was to have the main place read lines from an input 
> > file and write them to the place channels of N workers for processing 
> > (round-robin, using a counter modulo the number of workers). The workers 
> > write the parsed lines to a single output place, which writes them to the 
> > output file.
> > 
> > Is it possible to limit the number of messages on a place channel? I 
> > suspect the input place is moving faster than the workers, so the workers' 
> > place channels may be getting huge.
> > 
> > Someone on IRC suggested buffered asynchronous channels, since you can set 
> > a limit on them, but it appears you can't send them across a place channel, 
> > so I'm unsure how to make use of them with places.
> > 
> > Sequential & parallel code is here:
> > 
> > https://gist.github.com/lojic/283aa3eec777e4810efc
> > 
> > Relevant lines are lines 44 to 105 of the parallel version.
> > 
> > Are there any projects, papers, etc. of best practices with respect to 
> > places that I can look at?
> > 
> > Thanks,
> > Brian
> 
> FYI: after some trial and error, I found that a sleep of ~500 ms after the 
> input file has been read gives the workers enough time to write to the output 
> place, and the output place enough time to populate the file (minus ~5 
> records). The output of time for various numbers of workers is:
> 
> 2 workers => cpu time: 9396 real time: 3279 gc time: 2233
> 
> 3 workers => cpu time: 11835 real time: 3543 gc time: 2137
> 
> 4 workers => cpu time: 17078 real time: 4461 gc time: 4234
> 
> 5 workers => cpu time: 21610 real time: 4988 gc time: 5366
> 
> I have 4 cores and 8 hyperthreads, but adding workers just means more CPU 
> time and more elapsed time :(


I changed the code to have the main place write #f to each worker place's 
channel after sending the last line, and then wait to get #f back from each 
worker. This is only part of the solution, but it was helpful because it 
showed that the output place still had over 52,000 lines in its place channel 
waiting to be written to disk.

I have to think that's bad for performance.
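For reference, a minimal sketch of carrying the #f end-of-stream marker through the whole pipeline, so that main blocks on the output place instead of sleeping. This is not the code from the gist: parse-line, run-pipeline and the other names are hypothetical, and the output place only counts lines rather than writing them. Each worker forwards #f to the output place, which finishes only after it has seen one #f per worker, so main knows every line has been handled.

```racket
#lang racket/base
;; Sketch only: main -> N worker places -> one output place, with #f as an
;; end-of-stream marker at every hop. All names here are hypothetical.
(require racket/place)

;; Hypothetical stand-in for the real per-line parsing.
(define (parse-line s) (string-upcase s))

;; Output place: receives the worker count, then the receiving end of a
;; channel shared by all workers. It stops only after every worker sent #f.
(define (start-output-place)
  (place ch
    (define n-workers (place-channel-get ch))
    (define from-workers (place-channel-get ch))
    (let loop ([finished 0] [written 0])
      (if (= finished n-workers)
          (place-channel-put ch written)          ; report the total to main
          (let ([msg (place-channel-get from-workers)])
            (if msg
                (loop finished (add1 written))    ; a real version writes msg here
                (loop (add1 finished) written)))))))

;; Worker place: receives the sending end of the shared channel, then lines.
(define (start-worker)
  (place ch
    (define to-output (place-channel-get ch))
    (let loop ()
      (define line (place-channel-get ch))
      (cond
        [line (place-channel-put to-output (parse-line line))
              (loop)]
        [else (place-channel-put to-output #f)])))) ; pass the marker downstream

(define (run-pipeline lines n-workers)
  ;; Messages put on to-output arrive at from-workers.
  (define-values (to-output from-workers) (place-channel))
  (define output (start-output-place))
  (place-channel-put output n-workers)
  (place-channel-put output from-workers)
  (define workers (for/vector ([i (in-range n-workers)]) (start-worker)))
  (for ([w (in-vector workers)]) (place-channel-put w to-output))
  ;; Round-robin distribution of lines to workers.
  (for ([line (in-list lines)] [i (in-naturals)])
    (place-channel-put (vector-ref workers (modulo i n-workers)) line))
  (for ([w (in-vector workers)]) (place-channel-put w #f))
  ;; Block until the output place has drained everything (no sleep needed).
  (define written (place-channel-get output))
  (place-wait output)
  (for ([w (in-vector workers)]) (place-wait w))
  written)

;; The driver lives in a main submodule because each new place instantiates
;; this module; top-level driver code would spawn places recursively.
(module+ main
  (define lines (for/list ([i (in-range 1000)]) (format "line ~a" i)))
  (printf "lines written: ~a\n" (run-pipeline lines 4)))
```

Running the file with racket should print "lines written: 1000". Main does not need #f echoed back from the workers here, because the output place cannot finish before every worker has sent its last line.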

I'm sure I can implement my own workaround for this, but am I wrong in thinking 
that being able to set a limit/capacity for a place's channel is a good idea? 
The most natural solution seems to be having the producer place block when the 
channel is full.
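One possible workaround, sketched under the assumption that a credit (acknowledgment) scheme is acceptable: the consumer sends 'ack after each message, and the producer keeps at most limit messages unacknowledged, blocking on place-channel-get once it reaches that limit. This only approximates a bounded place channel; it is not a built-in capacity setting, and start-consumer, send-bounded, finish, run-bounded and the limit of 64 are hypothetical.

```racket
#lang racket/base
;; Sketch only: bound a place channel from the producer side using credits.
;; The consumer acks each message; the producer never has more than `limit`
;; unacknowledged messages in flight, so it blocks when the queue is "full".
;; All names and the limit value are hypothetical.
(require racket/place)

(define (start-consumer)
  (place ch
    (let loop ([processed 0])
      (define msg (place-channel-get ch))
      (cond
        [msg
         ;; ... real work on msg would go here ...
         (place-channel-put ch 'ack)               ; hand one credit back
         (loop (add1 processed))]
        [else
         (place-channel-put ch (list 'done processed))]))))

;; Send every item, waiting for an ack whenever `limit` messages are unacked.
(define (send-bounded p items limit)
  (let loop ([items items] [in-flight 0])
    (cond
      [(null? items) (void)]
      [(= in-flight limit)
       (place-channel-get p)                       ; blocks until an 'ack arrives
       (loop items (sub1 in-flight))]
      [else
       (place-channel-put p (car items))
       (loop (cdr items) (add1 in-flight))])))

;; Send the end marker, then skip leftover acks until the final report.
(define (finish p)
  (place-channel-put p #f)
  (let loop ()
    (define msg (place-channel-get p))
    (if (eq? msg 'ack)
        (loop)
        (cadr msg))))

(define (run-bounded items limit)
  (define p (start-consumer))
  (send-bounded p items limit)
  (define processed (finish p))
  (place-wait p)
  processed)

;; Kept in main so places that instantiate this module don't re-run it.
(module+ main
  (printf "processed: ~a\n" (run-bounded (for/list ([i (in-range 10000)]) i) 64)))
```

Acking every message doubles the message traffic, so acknowledging in batches (or sending lines in batches) would likely cut that overhead. Applying the same idea to the worker-to-output hop would mean each worker tracks its own credits, and the output place needs a way to ack the right worker, for example one channel pair per worker.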

