On Wed, Oct 18, 2017 at 10:21 AM, Jason <jasonh...@gmail.com> wrote:
> On Wednesday, October 18, 2017 at 12:14:30 PM UTC-4, Ian wrote:
>> On Wed, Oct 18, 2017 at 9:46 AM, Jason  wrote:
>> > # When I change line 19 to True to use the multiprocessing stuff,
>> > # it all slows down.
>> >
>> > from multiprocessing import Process, Manager, Pool, cpu_count
>> > from timeit import default_timer as timer
>> >
>> > def f(a, b):
>> >     return dict_words[a] - b
>>
>> Since the computation is so simple, my suspicion is that the run time
>> is dominated by IPC; in other words, the cost of sending objects back
>> and forth outweighs the gains you get from parallelization.
>>
>> What happens if you remove dict_words from the Manager and just pass
>> dict_words[a] across instead of a? Also, I'm not sure why dict_keys is
>> a managed list to begin with, since it only appears to be handled by
>> the main process.
>
> You can try that code by changing line 17 :-) It's 10 times faster.
>
> Well, given the many objects to be iterated over, I was hoping
> pool.map() would distribute them across the processors, so that each
> processor gets len(dict_words)/cpu_count() items to process. The actual
> computation is much longer than a single subtraction; currently I can
> process about 16k items per second on a single core. My target is to
> get those 16k items processed in 0.25 s.
Well, in any case I would try to keep the communication to a minimum.
Instead of sending one item at a time, break the data set up into
batches of 4k and send each child task a batch. Avoid using managers
or other shared state.
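
For example, here's a minimal sketch of that batching approach
(untested; dict_words and the worker body are stand-ins, since the
original script isn't shown). It also folds in the earlier suggestion:
the values are looked up in the parent, so each child receives a plain
list and nothing goes through a Manager:

from multiprocessing import Pool, cpu_count

def process_batch(batch):
    # batch is a plain list of (value, b) pairs; the dict lookup
    # already happened in the parent, so no shared state is touched.
    return [value - b for value, b in batch]

if __name__ == '__main__':
    dict_words = {i: i * 2 for i in range(16000)}  # stand-in data
    b = 1

    items = [(dict_words[a], b) for a in dict_words]
    size = 4000
    batches = [items[i:i + size] for i in range(0, len(items), size)]

    with Pool(cpu_count()) as pool:
        results = [r for chunk in pool.map(process_batch, batches)
                   for r in chunk]

Note that pool.map() can also do the batching for you via its chunksize
argument, e.g. pool.map(f, items, chunksize=4000); either way, the key
point is that each task's payload is plain picklable data rather than
keys into a managed dict.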