Thanks for the responses. Peter wrote:
> Which OS is this?

MacOS Ventura 13.1, M1 MacBook Pro (eight cores).

Thomas wrote:

> I'm no expert on locks, but you don't usually want to keep a lock while
> some long-running computation goes on. You want the computation to be
> done by a separate thread, put its results somewhere, and then notify
> the choreographing thread that the result is ready.

In this case I'm extracting the noun phrases from the body of an email
message (returned as a list). I have a collection of email messages
organized by month (typically 1000 to 3000 messages per month). I'm using
concurrent.futures.ThreadPoolExecutor() with the default number of
workers (min(32, os.cpu_count() + 4), or 12 threads on my system) to
process each month, so 12 active threads at a time. Given that the
process is pretty much CPU bound, maybe reducing the number of workers to
the CPU count would make sense.

Processing of each email message enters that with block once. That's
about as minimal as I can make it. I thought for a bit about pushing the
textblob stuff into a separate worker thread, but it wasn't obvious how
to set up queues to handle the communication between the threads created
by ThreadPoolExecutor() and the worker thread. Maybe I'll think about it
harder. (I have a related problem with SQLite, since an open database
connection can't be used from multiple threads by default. That makes
much of the program's end-of-run processing single-threaded.)

> This link may be helpful -
>
> https://anandology.com/blog/using-iterators-and-generators/

I don't think that's where my problem is. The lock protects the
generation of the noun phrases. My loop which does the yielding operates
outside of that lock's control. The version of the code is my latest, in
which I tossed out a bunch of phrase-processing code (effectively
dead-end ideas for processing the phrases). Replacing the for loop with a
simple return seems to have no effect.
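For what it's worth, one way to wire up the queues mentioned above is to
give the lock-protected extractor a single dedicated thread that owns it
outright, so no lock is needed at all: pool threads put (text, Future)
pairs on a queue.Queue and block on the Future for their result. This is
only a minimal sketch under assumptions -- extract_noun_phrases() here is
a hypothetical stand-in for the real
TextBlob(text, np_extractor=ext).noun_phrases call (the same shape would
also serialize the SQLite writes through one connection-owning thread):

```python
import queue
import threading
from concurrent.futures import Future

# Hypothetical stand-in for the lock-protected textblob call; in the real
# program this would be TextBlob(text, np_extractor=ext).noun_phrases.
def extract_noun_phrases(text):
    return [w for w in text.split() if w.istitle()]

requests = queue.Queue()

def extractor_worker():
    # Only this thread ever calls extract_noun_phrases(), so the
    # resource it wraps needs no lock.
    while True:
        item = requests.get()
        if item is None:          # sentinel: shut down
            break
        text, fut = item
        try:
            fut.set_result(extract_noun_phrases(text))
        except Exception as exc:
            fut.set_exception(exc)

worker = threading.Thread(target=extractor_worker, daemon=True)
worker.start()

def noun_phrases(text):
    # Callable from any ThreadPoolExecutor thread; blocks until the
    # dedicated worker has produced this request's result.
    fut = Future()
    requests.put((text, fut))
    return fut.result()

phrases = noun_phrases("Skip uses MacOS Ventura on a MacBook Pro")
requests.put(None)                # stop the worker
worker.join()
```

Since the extraction is serialized by the lock anyway, funneling it
through one thread shouldn't cost throughput, though with CPU-bound work
a ProcessPoolExecutor may be worth comparing as well.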
In any case, the caller which uses the phrases does a fair amount of
extra work with them, populating a SQLite database, so I don't think the
time it takes to process a single email message is dominated by the
phrase generation. Here's timeit output for the noun_phrases code:

% python -m timeit -s 'text = """`python -m timeit --help`""" ; from textblob import TextBlob ; from textblob.np_extractors import ConllExtractor ; ext = ConllExtractor() ; phrases = TextBlob(text, np_extractor=ext).noun_phrases' 'phrases = TextBlob(text, np_extractor=ext).noun_phrases'
5000 loops, best of 5: 98.7 usec per loop

I process the output of timeit's help message, which looks to be about
the same length as a typical email message, certainly the same order of
magnitude. Also note that I call the extraction once in the setup to
eliminate the one-time training of the ConllExtractor instance.

I don't know if ~100us qualifies as long running or not. I'll keep
messing with it.

Skip
--
https://mail.python.org/mailman/listinfo/python-list