On 5/8/2014 2:55 PM, Andrew McLean wrote:
I have a problem that would benefit from a multithreaded implementation,
and I am having trouble understanding how to approach it using
concurrent.futures.
The details don't really matter, but it will probably help to be
explicit. I have a large CSV file that contains a lot of fields, amongst
them one containing email addresses. I want to write a program that
validates the email addresses by checking that the domain names have a
valid MX record. The output will be a copy of the file with any invalid
email addresses removed. Because of latency in the DNS lookup this could
benefit from multithreading.
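
For concreteness, the per-address check might look roughly like the
sketch below. It assumes the third-party dnspython package (the stdlib
has no MX lookup); the helper name has_mx, the 'email' column it will
later be applied to, and the 5-second lookup lifetime are my own
illustrative choices, not anything from the original post.

    import dns.resolver
    import dns.exception

    def has_mx(address, timeout=5.0):
        """Return True if the address's domain has at least one MX record."""
        try:
            domain = address.rsplit('@', 1)[1]
        except IndexError:
            return False          # no '@' in the address at all
        try:
            answers = dns.resolver.resolve(domain, 'MX', lifetime=timeout)
        except dns.exception.DNSException:
            return False          # NXDOMAIN, no answer, timeout, ...
        return len(answers) > 0

Each call can block for up to the DNS lifetime, which is exactly why
running many of them concurrently in threads pays off.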
I have written similar code in the past using explicit threads
communicating via queues. For this example, I could have a thread that
read the file using csv.DictReader, putting dicts containing records
from the input file into a (finite length) queue. Then I would have a
number of worker threads reading the queue, performing the validation
and putting validated results in a second queue. A final thread would
read from the second queue writing the results to the output file.
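
A sketch of that explicit-threads design, for reference: one reader, a
pool of workers, and one writer, connected by bounded queues. It reuses
the hypothetical has_mx() helper sketched above; the queue sizes, the
sentinel handling, and the 'email' column name are illustrative
assumptions, not the poster's actual code.

    import csv
    import queue
    import threading

    NUM_WORKERS = 20
    SENTINEL = None           # marker meaning "no more work"

    def reader(path, out_q):
        """Feed one dict per CSV row to the workers, then stop them."""
        with open(path, newline='') as f:
            for record in csv.DictReader(f):
                out_q.put(record)
        for _ in range(NUM_WORKERS):
            out_q.put(SENTINEL)

    def worker(in_q, out_q):
        """Pass through only the records whose address has an MX record."""
        while True:
            record = in_q.get()
            if record is SENTINEL:
                out_q.put(SENTINEL)
                break
            if has_mx(record['email']):   # column name 'email' is assumed
                out_q.put(record)

    def writer(path, in_q, fieldnames):
        """Write results until every worker has signalled completion."""
        done = 0
        with open(path, 'w', newline='') as f:
            w = csv.DictWriter(f, fieldnames=fieldnames)
            w.writeheader()
            while done < NUM_WORKERS:
                record = in_q.get()
                if record is SENTINEL:
                    done += 1
                else:
                    w.writerow(record)

    def run(in_path, out_path, fieldnames):
        q1 = queue.Queue(maxsize=1000)    # bounded, so the reader can't run away
        q2 = queue.Queue(maxsize=1000)
        threads = ([threading.Thread(target=reader, args=(in_path, q1))] +
                   [threading.Thread(target=worker, args=(q1, q2))
                    for _ in range(NUM_WORKERS)] +
                   [threading.Thread(target=writer,
                                     args=(out_path, q2, fieldnames))])
        for t in threads:
            t.start()
        for t in threads:
            t.join()

Note that with multiple workers the output order is not guaranteed to
match the input order; whether that matters depends on the application.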
So far so good. However, I thought this would be an opportunity to
explore concurrent.futures and to see whether it offered any benefits
over the more explicit approach discussed above. The problem I am having
is that every discussion of concurrent.futures I can find shows it used
only on toy problems involving just a few tasks.
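
The "toy problem" pattern in most concurrent.futures examples looks
roughly like the sketch below. The difficulty for a large CSV file is
that it reads every row and submits every task up front, so the whole
file and one future per row sit in memory at once (again, has_mx() and
the 'email' column are assumptions carried over from the sketches
above, and a non-empty input file is assumed).

    import csv
    from concurrent.futures import ThreadPoolExecutor

    def filter_csv(in_path, out_path, max_workers=20):
        with open(in_path, newline='') as f:
            records = list(csv.DictReader(f))     # whole file in memory at once
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # Executor.map creates a future for every record immediately.
            flags = pool.map(lambda r: has_mx(r['email']), records)
            valid = [r for r, ok in zip(records, flags) if ok]
        with open(out_path, 'w', newline='') as f:
            w = csv.DictWriter(f, fieldnames=records[0].keys())
            w.writeheader()
            w.writerows(valid)

What is not obvious from such examples is how to keep the number of
pending tasks bounded when the input is too large to submit all at once.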
You might look at the new asyncio module in 3.4 (backport available on
PyPI, I believe). Among other things, it uses a variation on
concurrent.futures. It includes timeouts.
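
A minimal sketch of that approach: run the blocking MX lookup in a
thread pool via run_in_executor (which hands back an asyncio future)
and bound each lookup with asyncio.wait_for. This is written in the
async/await syntax of Python 3.5+; on 3.4 itself the coroutines would
be spelled with @asyncio.coroutine and yield from. The has_mx() helper
and the 10-second timeout are assumptions, not anything from Terry's
message.

    import asyncio

    async def check(address, loop, timeout=10.0):
        """Return (address, valid?) with a hard timeout on the lookup."""
        try:
            ok = await asyncio.wait_for(
                loop.run_in_executor(None, has_mx, address), timeout)
        except asyncio.TimeoutError:
            ok = False
        return address, ok

    def validate(addresses):
        loop = asyncio.get_event_loop()
        tasks = [check(a, loop) for a in addresses]
        return loop.run_until_complete(asyncio.gather(*tasks))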
--
Terry Jan Reedy