Ken Jin <kenjin4...@gmail.com> added the comment:
Hello, it would be great if you could provide more details: your operating 
system and version, the number of logical CPU cores on your machine, and the 
exact Python version including the minor and micro numbers (e.g. Python 
3.8.2). Multiprocessing behaves differently depending on those factors.

FWIW, I reduced your code to make it easier to read and removed all the 
unused variables:

import concurrent.futures
from sklearn.datasets import make_regression

def just_print():
    print('Just printing')

def fit_model():
    data = make_regression(n_samples=500, n_features=100, n_informative=10,
                           n_targets=1, random_state=5)
    print('Fit complete')

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(just_print) for i in range(12)]

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(fit_model) for i in range(12)]

The problem is that I am *unable* to reproduce the bug you are reporting on 
Windows 10 64-bit, Python 3.7.6. Both examples run to completion. I have a 
hunch that your problem lies elsewhere, in one of the many libraries you 
imported.

>>> Note: problem occurs only after performing the RandomizedSearchCV...

As you noted, I skimmed through RandomizedSearchCV's source code and docs. 
RandomizedSearchCV can use a multiprocessing backend for parallel tasks. By 
setting `n_jobs=-1` in your params, you're telling it to use all logical CPU 
cores. I'm unsure how many additional processes and pools RandomizedSearchCV 
spawns after you call it, but this sounds suspicious: concurrent.futures 
specifically warns that submitting work from within running tasks may exhaust 
available workers and cause tasks to never complete. See 
https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor 
(the docs there are for ThreadPoolExecutor, but the warning applies to 
ProcessPoolExecutor as well).
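To make the failure mode concrete, here is a minimal stdlib-only sketch (not 
from the original report) of the pattern the docs warn about: a task that 
submits more work to the same executor and then blocks waiting on it. With 
max_workers=1 the outer task would occupy the only worker while waiting, so 
the inner task could never start and the program would hang; with two workers 
it completes.

```python
import concurrent.futures

def inner():
    return 'done'

def outer(executor):
    # Submitting from inside a running task and blocking on the result:
    # this only completes if another worker is free to run inner().
    return executor.submit(inner).result()

# Two workers: outer() occupies one, inner() runs on the other.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    result = executor.submit(outer, executor).result()
print(result)  # done
```

Change max_workers to 1 and the same code deadlocks, which is exactly the 
"tasks never complete" behavior described above.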

A temporary workaround might be to reduce n_jobs, or, even better, to use 
scikit-learn's dedicated joblib parallel backend, which should have the 
necessary protections in place against such behavior: 
https://joblib.readthedocs.io/en/latest/parallel.html#joblib.parallel_backend 
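A minimal sketch of that workaround (assuming scikit-learn and joblib are 
installed, as in the original report): the Ridge estimator, the tiny 
parameter grid, and n_jobs=2 below are placeholders I chose for illustration, 
not values from the report.

```python
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic problem so the example runs quickly.
X, y = make_regression(n_samples=100, n_features=10, random_state=5)

search = RandomizedSearchCV(Ridge(), {'alpha': [0.1, 1.0, 10.0]},
                            n_iter=3, cv=3, random_state=5)

# Instead of n_jobs=-1 everywhere, cap the worker count and let joblib
# manage the process pool for the search.
with parallel_backend('loky', n_jobs=2):
    search.fit(X, y)

print(search.best_params_)
```

The context manager confines the parallelism to the fit, so you are not 
layering your own ProcessPoolExecutor on top of whatever pools the search 
spawns internally.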


TLDR: I don't think this is a Python bug and I'm in favor of this bug being 
closed as `not a bug`.

----------
nosy: +kj

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42245>
_______________________________________