[issue30919] Shared Array Memory Allocation Regression
New submission from Dimitar Tasev:

Hello, I have noticed a significant performance regression when allocating a large shared array in Python 3.x versus Python 2.7. The affected module seems to be `multiprocessing`.

The function I used for benchmarking:

    from timeit import timeit
    timeit('sharedctypes.Array(ctypes.c_float, 500*2048*2048)',
           'from multiprocessing import sharedctypes; import ctypes',
           number=1)

And the results from executing it (in seconds):

    Python 3.5.2:  Out[2]: 182.68500420999771
    Python 2.7.12: Out[6]: 2.124835968017578

I will try to provide any information you need. Right now I am looking at callgrind/cachegrind output from a build without debug symbols and can post that; in the meantime, I am building Python with debug symbols and will re-run callgrind/cachegrind.

Allocating an array of the same size with numpy does not seem to differ between Python versions. The numpy command used was `numpy.full((500,2048,2048), 5.0)`. Allocating a list with the same number of members also shows no difference: `arr = [5.0]*(500*2048*2048)`.

--
files: shared_array_alloc.py
messages: 298285
nosy: dtasev
priority: normal
severity: normal
status: open
title: Shared Array Memory Allocation Regression
type: performance
versions: Python 2.7, Python 3.5, Python 3.6

Added file: http://bugs.python.org/file47009/shared_array_alloc.py

___
Python tracker <http://bugs.python.org/issue30919>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
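The full benchmark allocates 500*2048*2048 C floats, roughly 8.4 GB (7.8 GiB), which makes it slow to repeat. A scaled-down sketch that compares the shared array against the plain-list baseline mentioned in the report might look like this; the reduced size and the helper names `alloc_shared`, `alloc_list` and `seconds` are hypothetical, and absolute timings will depend on the Python version and platform.

```python
import ctypes
from multiprocessing import sharedctypes
from timeit import timeit

# Assumption: a reduced element count (4 * 1024 * 1024 floats, 16 MiB) in
# place of the report's 500*2048*2048, so the comparison finishes quickly.
N = 4 * 1024 * 1024


def alloc_shared(n=N):
    """Allocate a lock-wrapped shared array of n C floats (zero-initialised)."""
    return sharedctypes.Array(ctypes.c_float, n)


def alloc_list(n=N):
    """Allocate a plain Python list of n floats, the baseline from the report."""
    return [5.0] * n


def seconds(fn, n=N):
    """Wall-clock seconds for a single call of fn(n), measured with timeit."""
    return timeit(lambda: fn(n), number=1)


if __name__ == "__main__":
    print(f"sharedctypes.Array: {seconds(alloc_shared):.3f} s")
    print(f"list:               {seconds(alloc_list):.3f} s")
```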
Dimitar Tasev added the comment:

If I understand correctly, there is no way to force the old behaviour in Python 3.5, i.e. to use an anonymous memory mapping in multiprocessing.heap.Arena so that the memory is shared between the processes instead of being written to a shared file?

On average, the data is 20 GB, and writing it out to a file is not desirable. As I understand from Gareth Rees' comment, ftruncate() will speed up initialisation; however, any processing afterwards would be IO-bound.

To shed more light on the processing: the data is handled as a 3D array, each process gets a 2D slice to operate on, and no information needs to be shared between processes.

If the anonymous memory mapping cannot be forced, then the multiprocessing module with a shared array becomes unusable for me. Are you aware of any way to use the multiprocessing module to run the work in parallel without using a shared array?
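A minimal sketch of the access pattern described above, assuming a Unix platform and the 'fork' start method: the parent creates one anonymous mapping for the whole 3D array, and each child process writes only to its own 2D slice. This is an illustration of the mechanism, not necessarily the solution adopted later in the thread; the small dimensions and the helpers `process_slice`, `run` and `first_value` are hypothetical.

```python
import ctypes
import mmap
import multiprocessing as mp
import struct

# Hypothetical dimensions standing in for the (500, 2048, 2048) float32 data.
DEPTH, ROWS, COLS = 4, 64, 64
SLICE_FLOATS = ROWS * COLS
SLICE_BYTES = SLICE_FLOATS * ctypes.sizeof(ctypes.c_float)


def process_slice(buf, index):
    """Fill 2D slice `index` in place; no worker touches another's slice."""
    view = (ctypes.c_float * SLICE_FLOATS).from_buffer(buf, index * SLICE_BYTES)
    for i in range(SLICE_FLOATS):
        view[i] = float(index)  # placeholder for the real per-slice work
    del view  # drop the buffer export held on the mmap


def run(depth=DEPTH):
    # Anonymous mapping (fileno -1): no backing file, and on Unix it is
    # MAP_SHARED by default, so forked children write to the same pages.
    buf = mmap.mmap(-1, depth * SLICE_BYTES)
    ctx = mp.get_context("fork")  # Unix-only; buf is inherited, not pickled
    workers = [ctx.Process(target=process_slice, args=(buf, k)) for k in range(depth)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return buf


def first_value(buf, index):
    """Read back the first float of slice `index` in the parent."""
    return struct.unpack_from("f", buf, index * SLICE_BYTES)[0]


if __name__ == "__main__":
    shared = run()
    print([first_value(shared, k) for k in range(DEPTH)])
```

Run as a script, it prints `[0.0, 1.0, 2.0, 3.0]`, because each child's writes land in the parent's mapping. The 'fork' context matters here: the mapping is inherited when the child is forked, whereas under 'spawn' the arguments would have to be pickled, and an `mmap.mmap` object cannot be pickled.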
Dimitar Tasev added the comment:

I have looked into your advice of changing multiprocessing.heap.Arena.__init__: I removed the code that allocated the file and reverted to the old behaviour. Some brief runs suggest this brings back the old behaviour of allocating the space in RAM rather than through file IO.

I am not sure what this might break, and it might make other usages of multiprocessing unstable! Can anyone think of anything this change might break?

The Arena.__init__ code is based on the one from Python 2.7:

    import mmap  # already imported at the top of multiprocessing/heap.py

    class Arena(object):

        def __init__(self, size, fd=-1):
            self.size = size
            self.fd = fd  # still kept, but not used!
            # Anonymous mapping: no backing file is created or written to
            self.buffer = mmap.mmap(-1, self.size)

Setting the start method to 'fork' with multiprocessing.set_start_method('fork') does not seem to make a difference.
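To make the allocation strategies discussed in this thread concrete (writing zeros into a backing file, sizing the file with ftruncate() as in Gareth Rees' suggestion, and the Python 2.7-style anonymous mapping), here is a rough Unix-only sketch that times each one. The helper names and sizes are hypothetical, and the file-backed variants only approximate, rather than reproduce, what multiprocessing.heap.Arena does in any particular Python version.

```python
import mmap
import os
import tempfile
import time


def zero_filled_file_buffer(size, chunk=1024 * 1024):
    """File-backed mapping whose file is filled by writing zeros
    (a rough approximation of a file-based arena)."""
    fd, name = tempfile.mkstemp()
    os.unlink(name)  # removed from disk once the descriptor and mapping go away
    try:
        zeros = b"\0" * chunk
        remaining = size
        while remaining > 0:
            remaining -= os.write(fd, zeros[:min(chunk, remaining)])
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor


def ftruncate_file_buffer(size):
    """File-backed mapping sized with os.ftruncate() instead of writing zeros."""
    fd, name = tempfile.mkstemp()
    os.unlink(name)
    try:
        os.ftruncate(fd, size)  # extended region reads back as zeros
        return mmap.mmap(fd, size)
    finally:
        os.close(fd)


def anonymous_buffer(size):
    """Python 2.7-style anonymous mapping with no backing file."""
    return mmap.mmap(-1, size)


def time_alloc(fn, size):
    """Seconds spent in fn(size); returns (elapsed, buffer)."""
    start = time.perf_counter()
    buf = fn(size)
    return time.perf_counter() - start, buf


if __name__ == "__main__":
    size = 64 * 1024 * 1024  # hypothetical 64 MiB test size
    for fn in (zero_filled_file_buffer, ftruncate_file_buffer, anonymous_buffer):
        elapsed, buf = time_alloc(fn, size)
        print(f"{fn.__name__:24s} {elapsed:.3f} s")
        buf.close()
```

Timings will depend on the filesystem behind tempfile's default directory; on most filesystems the ftruncate() variant avoids writing the zeros up front, but later accesses to a file-backed mapping may still involve that filesystem, which is the concern raised earlier about IO-bound processing.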
Dimitar Tasev added the comment:

Thank you, that is indeed the solution I ended up with, along with a detailed explanation of why it was necessary.

Do you think it's worth updating the `multiprocessing` documentation to make users aware of this behaviour?