New submission from Davin Potts <pyt...@discontinuity.net>:

A facility for using shared memory would permit direct, zero-copy access to 
data across distinct processes (especially when created via multiprocessing) 
without the need for serialization, thus eliminating the primary performance 
bottleneck in the most common use cases for multiprocessing.

Currently, multiprocessing communicates data from one process to another by
first serializing it (by default via pickle) on the sender's end and then
deserializing it on the receiver's end.  Because distinct processes each have
their own memory space, no data in memory is common across processes, and thus
any information to be shared must be communicated over a socket, pipe, or
other mechanism.  Serialization via tools like pickle is convenient,
especially when supporting processes on physically distinct hardware with
potentially different architectures (which multiprocessing also supports).
Such serialization is wasteful and potentially unnecessary when multiple
multiprocessing.Process instances are running on the same machine.  The cost of
this serialization is believed to be a non-trivial drag on performance when
using multiprocessing on multi-core and/or SMP machines.
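
To make the copying concrete, here is a minimal sketch of the round trip an
object takes when it is sent between processes.  The helper name "roundtrip"
is hypothetical and only mimics the pickle/unpickle steps in a single process;
it does not go through a real pipe or socket.

```python
import pickle


def roundtrip(obj):
    """Mimic sending obj to another process: pickle it, then unpickle it."""
    payload = pickle.dumps(obj)        # sender side: serialize to bytes
    received = pickle.loads(payload)   # receiver side: rebuild a new object
    return received, len(payload)


data = bytearray(b"x" * 1_000_000)
copy, wire_size = roundtrip(data)

# The receiver gets an equal but entirely separate object, so every byte has
# been copied at least once on the way.
copy[0] = 0
```

After the round trip, "copy" and "data" are independent objects: changing one
does not affect the other, and the serialized payload is at least as large as
the data itself.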

Shared memory is not a new concept (System V Shared Memory has been around for
quite some time), but the proliferation of support for shared memory segments
on modern operating systems (Windows, Linux, *BSDs, and more) provides a means
for exposing a consistent interface and API to a shared memory construct usable
across platforms, despite technical differences in the underlying
implementation details of POSIX shared memory versus Native Shared Memory
(Windows).

For further reading/reference:  Tools such as the posix_ipc module have
provided fairly mature APIs around POSIX shared memory and have seen use in
other projects.  The "shared-array", "shared_ndarray", and "sharedmem-numpy" packages
all have interesting implementations for exposing NumPy arrays via shared 
memory segments.  PostgreSQL has a consistent internal API for offering shared 
memory across Windows/Unix platforms based on System V, enabling use on 
NetBSD/OpenBSD before those platforms supported POSIX shared memory.

At least initially, objects which support the buffer protocol can be most
readily shared across processes via shared memory.  From a design standpoint,
using a Manager instance is likely advisable, both to enforce access rules in
different processes via proxy objects and to clean up shared memory segments
once an object is no longer referenced.  The documentation for
multiprocessing's existing sharedctypes submodule will need to be updated to
avoid confusion over concepts: that submodule uses a single memory segment
through the heap submodule, with its own memory management implementation to
"malloc" space for allowed ctypes and then "free" that space when no longer
used, recycling it for use again from the shared memory segment.
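
As a rough illustration of sharing a buffer-protocol object, the sketch below
assumes an interface like the multiprocessing.shared_memory.SharedMemory class
that shipped in Python 3.8; it is not part of this proposal's text.  The
function names "share_bytes" and "demo" are hypothetical, and the second
handle is attached within the same process purely to keep the example short;
in real use the attach-by-name step would happen in another process.

```python
from multiprocessing import shared_memory


def share_bytes(payload):
    """Create a named segment and copy payload into it once."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload   # shm.buf is a writable memoryview
    return shm


def demo():
    owner = share_bytes(b"hello")
    # Attach a second handle by name (stand-in for a second process).
    reader = shared_memory.SharedMemory(name=owner.name)
    try:
        reader.buf[0] = ord("J")       # write through one handle...
        return bytes(owner.buf[:5])    # ...and observe it through the other
    finally:
        reader.close()                 # every handle must be closed
        owner.close()
        owner.unlink()                 # the creator releases the segment


result = demo()
```

Both handles map the same memory, so the write is visible without any
pickling.  The explicit close() and unlink() calls are the kind of cleanup a
Manager could take over.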

Ultimately, the primary motivation is to provide a path for better parallel
execution performance by eliminating the need to transmit data between distinct
processes on a single system (not for use in distributed memory architectures).
Secondary use cases have also been suggested, including a means for sharing
data across concurrent Python interactive shells, potential use with
subinterpreters, and the other traditional uses shared memory has seen since
System V Shared Memory was first introduced.

----------
assignee: davin
components: Library (Lib)
messages: 334278
nosy: davin, eric.snow, lukasz.langa, ned.deily, rhettinger, yselivanov
priority: normal
severity: normal
status: open
title: shared memory construct to avoid need for serialization between processes
type: enhancement
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35813>
_______________________________________