Attached is a draft PEP on adding + and - operators to dict for
discussion.
This should probably go here:
https://github.com/python/peps
but due to technical difficulties at my end, I'm very limited in what I
can do on Github (at least for now). If there's anyone who would like to
co-author and/or help with the process, that would be appreciated.
--
Steven
======================================
PEP-xxxx Dict addition and subtraction
======================================
**DRAFT** -- This is a draft document for discussion.
Abstract
--------
This PEP suggests adding merge ``+`` and difference ``-`` operators to the
built-in ``dict`` class.
The merge operator will have the same relationship to the ``dict.update``
method as the list concatenation operator has to ``list.extend``, with dict
difference being defined analogously.
Examples
--------
Dict addition will return a new dict containing the left operand merged with
the right operand.
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d + e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> e + d
{'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}
The augmented assignment version operates in-place.
>>> d += e
>>> print(d)
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
Analogously with list addition, the operator version is more restrictive, and
requires that both arguments are dicts, while the augmented assignment version
allows anything the ``update`` method allows, such as iterables of key/value
pairs.
>>> d + [('spam', 999)]
Traceback (most recent call last):
...
TypeError: can only merge dict (not "list") to dict
>>> d += [('spam', 999)]
>>> print(d)
{'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
Dict difference ``-`` will return a new dict containing the items from the left
operand which are not in the right operand.
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d - e
{'spam': 1, 'eggs': 2}
>>> e - d
{'aardvark': 'Ethel'}
Augmented assignment will operate in place.
>>> d -= e
>>> print(d)
{'spam': 1, 'eggs': 2}
Like the merge operator and list concatenation, the difference operator
requires both operands to be dicts, while the augmented version allows any
iterable of keys.
>>> d - {'spam', 'parrot'}
Traceback (most recent call last):
...
TypeError: cannot take the difference of dict and set
>>> d -= {'spam', 'parrot'}
>>> print(d)
{'eggs': 2}
>>> d -= [('spam', 999)]
>>> print(d)
{'eggs': 2}
Semantics
---------
For the merge operator, if a key appears in both operands, the last-seen value
(i.e. that from the right-hand operand) wins. This means that dict addition is
not commutative: in general, ``d + e`` will not equal ``e + d``. This joins a
number of other non-commutative addition operators among the builtins,
including lists, tuples, strings and bytes.
Having the last-seen value win makes the merge operator match the semantics of
the ``update`` method, so that ``d + e`` is an operator version of
``d.update(e)`` that returns a new dict instead of modifying ``d``.
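The relationship to ``update`` can be checked with today's Python. The
following is a minimal sketch only: ``merged`` is a hypothetical helper standing
in for the proposed ``d + e``, not part of the proposal itself.

```python
def merged(left, right):
    """Hypothetical stand-in for the proposed ``left + right``."""
    new = left.copy()    # the operands themselves are left untouched
    new.update(right)    # last-seen (right-hand) value wins
    return new

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

d_plus_e = merged(d, e)
e_plus_d = merged(e, d)

print(d_plus_e['cheese'])     # value taken from e
print(e_plus_d['cheese'])     # value taken from d
print(d_plus_e == e_plus_d)   # the two results differ
```

The two results disagree only on the shared key ``'cheese'``, which is exactly
where the order of the operands matters.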
The error messages shown above are not part of the API, and may change at any
time.
Rejected semantics
~~~~~~~~~~~~~~~~~~
Rejected alternative semantics for ``d + e`` include:
- Add only new keys from ``e``, without overwriting existing keys in ``d``.
This may be done by reversing the operands ``e + d``, or using dict difference
first, ``d + (e - d)``. The latter is especially useful for the in-place
version ``d += (e - d)``.
- Raise an exception if there are duplicate keys. This seems unnecessarily
restrictive and is not likely to be useful in practice. For example, updating
default configuration values with user-supplied values would most often fail
under the requirement that keys are unique::
prefs = site_defaults + user_defaults + document_prefs
- Add the values of ``e`` to the corresponding values of ``d``. This is the
behaviour implemented by ``collections.Counter``.
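Two of the rejected behaviours can already be spelled in current Python, which
may help when comparing them with the proposed semantics. This is a sketch under
the assumption that all values are positive integers (``Counter`` addition drops
non-positive totals); the variable names are illustrative only.

```python
from collections import Counter

d = {'spam': 1, 'eggs': 2}
e = {'eggs': 10, 'cheese': 3}

# Keep existing keys in d and add only the new keys from e:
# unpack e first so that the values from d are seen last and win.
only_new = {**e, **d}

# Add values key by key, as collections.Counter already does.
summed = Counter(d) + Counter(e)

print(only_new)
print(summed)
```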
Syntax
------
An alternative to the ``+`` operator is the pipe ``|`` operator, which is used
for set union. This suggestion did not receive much support on Python-Ideas,
where the ``+`` operator was strongly preferred.[1] It is more
familiar than the pipe operator, matches nicely with ``-`` as a pair, and the
Counter subclass already uses ``+`` for merging.
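For comparison, this is how the pipe operator already behaves on sets and on
dict key views in current Python. It is shown only to illustrate the alternative
spelling that was considered, not as part of the proposal.

```python
d = {'spam': 1, 'eggs': 2}
e = {'eggs': 10, 'cheese': 3}

# Set union with the pipe operator.
set_union = {'spam', 'eggs'} | {'eggs', 'cheese'}

# Key views are set-like, so the same operator works on them
# and produces a plain set of keys (the values are not involved).
key_union = d.keys() | e.keys()

print(sorted(set_union))
print(sorted(key_union))
```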
Current Alternatives
--------------------
To create a new dict containing the merged items of two (or more) dicts, one
can currently write::
{**d1, **d2}
but this is neither obvious nor easily discoverable. It is only guaranteed to
work if the keys are all strings. If the keys are not strings, it currently
works in CPython, but it may not work with other implementations, or future
versions of CPython[2].
It is also limited to returning a built-in dict, not a subclass, unless
re-written as ``MyDict(**d1, **d2)``, in which case non-string keys will raise
TypeError.
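The difference between the two spellings can be observed directly. The sketch
below assumes a CPython interpreter, and ``MyDict`` is a hypothetical subclass
used only for illustration.

```python
class MyDict(dict):
    """Hypothetical dict subclass, used only for illustration."""

d1 = {'spam': 1, 1: 'non-string key'}
d2 = {'eggs': 2}

# The dict display always builds a plain dict, whatever the operand types.
merged = {**MyDict(d1), **d2}
print(type(merged).__name__)

# Keyword unpacking into the subclass rejects the non-string key.
try:
    MyDict(**d1, **d2)
    raised = False
except TypeError:
    raised = True
print(raised)

# With string keys only, the subclass spelling works.
strings_only = MyDict(**{'spam': 1}, **d2)
print(type(strings_only).__name__)
```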
There is currently no way to perform dict subtraction except through a manual
loop.
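In practice the manual loop is often written as a dict comprehension. A minimal
sketch, where ``difference`` is a hypothetical helper name and not an existing
dict method:

```python
def difference(left, right):
    """Hypothetical helper: items of ``left`` whose keys are not in ``right``."""
    return {key: value for key, value in left.items() if key not in right}

d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

print(difference(d, e))
print(difference(e, d))
```

Like ``{**d1, **d2}``, this always returns a built-in dict rather than a
subclass.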
Implementation
--------------
The implementation will be in C. (The author of this PEP would like to make it
known that he is not able to write the implementation.)
An approximate pure-Python implementation of the merge operator will be::
    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()  # May be a subclass of dict.
            new.update(self)
            new.update(other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            new.update(other)
            new.update(self)
            return new
        return NotImplemented
Note that the result type will be the type of the left operand; in the event of
matching keys, the winner is the right operand.
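The approximate implementation can be tried out today by attaching it to a
subclass. This is only a sketch: ``AddableDict`` is a hypothetical name, and the
built-in ``dict`` is assumed not to define ``+`` itself.

```python
class AddableDict(dict):
    """Hypothetical subclass carrying the approximate merge methods."""

    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()      # result takes the left operand's type
            new.update(self)
            new.update(other)       # right operand wins on matching keys
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = type(other)()     # here ``other`` is the left operand
            new.update(other)
            new.update(self)
            return new
        return NotImplemented

left = AddableDict(spam=1, cheese=3) + {'cheese': 'cheddar'}
right = {'spam': 1, 'cheese': 3} + AddableDict(cheese='cheddar')

print(type(left).__name__, left)
print(type(right).__name__, right)
```

Both results hold the same items, but the first is an ``AddableDict`` and the
second a plain dict, following the type of the left operand.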
Augmented assignment will just call the ``update`` method. This is analogous to
the way ``list +=`` calls the ``extend`` method, which accepts any iterable,
not just lists::
    def __iadd__(self, other):
        self.update(other)
        return self
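The list behaviour that this mirrors, together with the existing flexibility of
``update``, can be seen in current Python. A short illustrative sketch:

```python
items = [1, 2]
items += (3, 4)             # in-place: any iterable is accepted

error = None
try:
    items + (5, 6)          # binary +: both operands must be lists
except TypeError as exc:
    error = exc

prefs = {'spam': 1, 'eggs': 2}
prefs.update([('spam', 999)])   # update accepts an iterable of pairs

print(items)
print(type(error).__name__)
print(prefs)
```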
An approximate pure-Python implementation of the difference operator will be::
    def __sub__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            for k in self:
                if k not in other:
                    new[k] = self[k]
            return new
        return NotImplemented

    def __rsub__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            for k in other:
                if k not in self:
                    new[k] = other[k]
            return new
        return NotImplemented
Augmented assignment will operate on equivalent terms to ``update``. If the
operand has a ``keys`` method, it will be used, otherwise the operand will be
iterated over::
    def __isub__(self, other):
        if hasattr(other, 'keys'):
            for k in other.keys():
                if k in self:
                    del self[k]
        else:
            for k in other:
                if k in self:
                    del self[k]
        return self
These semantics are intended to match those of ``update`` as closely as
possible. For the dict built-in itself, calling ``keys`` is redundant as
iteration over a dict iterates over its keys; but for subclasses or other
mappings, ``update`` prefers to use the keys method.
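As a rough check of the difference semantics, the approximate methods can be
attached to a subclass and exercised. ``SubtractableDict`` is a hypothetical
name, and the two branches of the in-place method are folded into one loop here
for brevity.

```python
class SubtractableDict(dict):
    """Hypothetical subclass carrying the approximate difference methods."""

    def __sub__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            for k in self:
                if k not in other:
                    new[k] = self[k]
            return new
        return NotImplemented

    def __isub__(self, other):
        # Prefer keys() when available, otherwise iterate over the operand.
        keys = other.keys() if hasattr(other, 'keys') else other
        for k in keys:
            if k in self:
                del self[k]
        return self             # augmented assignment rebinds to this value

d = SubtractableDict(spam=1, eggs=2, cheese=3)
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

new = d - e                     # d itself is unchanged
d -= e                          # removes 'cheese' in place
d -= {'spam', 'parrot'}         # any iterable of keys; missing keys are ignored
d -= [('eggs', 2)]              # a pair is treated as a single (tuple) key

print(new)
print(d)
```

Note that an iterable of key/value pairs is not unpacked: each pair is looked up
as one tuple key, so the last statement removes nothing.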
.. attention:: The above paragraph may be inaccurate.
Although the dict docstring states that ``keys``
will be called if it exists, this does not seem to
be the case for dict subclasses. Bug or feature?
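One way to investigate the question is to count calls to ``keys``. The sketch
below uses two hypothetical classes: ``KeyedPairs``, which is not a dict at all,
and ``LoudDict``, a dict subclass. Only the first case is relied upon here; the
count for the subclass is printed without any claim about its value, since it
may differ between interpreter versions.

```python
class KeyedPairs:
    """Hypothetical non-dict object exposing ``keys`` and ``__getitem__``."""

    def __init__(self, data):
        self._data = dict(data)
        self.keys_calls = 0

    def keys(self):
        self.keys_calls += 1
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


class LoudDict(dict):
    """Hypothetical dict subclass that counts calls to ``keys``."""

    keys_calls = 0

    def keys(self):
        type(self).keys_calls += 1
        return super().keys()


target = {}
source = KeyedPairs({'spam': 1})
target.update(source)           # goes through source.keys()
print(target, source.keys_calls)

other_target = {}
other_target.update(LoudDict(eggs=2))
print(other_target, LoudDict.keys_calls)   # count is version-dependent
```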
Contra-indications
------------------
(Or when to avoid using these new operators.)
For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will suffer
from the same unfortunate O(N\*\*2) Big Oh performance as does list and tuple
addition, and for similar reasons. If one expects to be merging a large number
of dicts where performance is an issue, it may be better to use an explicit
loop and in-place merging::
    new = {}
    for d in many_dicts:
        new += d
This is unlikely to be a problem in practice as most uses of the merge operator
are expected to only involve a small number of dicts. Similarly, most uses of
list and tuple concatenation only use a few objects.
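The two approaches can be compared with current syntax. In this sketch
``update`` stands in for the proposed ``+=`` and ``{**new, **d}`` for the
proposed ``new + d``; the function names are hypothetical.

```python
def merge_in_place(many_dicts):
    """Accumulate into one dict; each item is copied once."""
    new = {}
    for d in many_dicts:
        new.update(d)           # stand-in for the proposed ``new += d``
    return new


def merge_by_chaining(many_dicts):
    """Build a fresh dict per step; earlier items are copied again each time."""
    new = {}
    for d in many_dicts:
        new = {**new, **d}      # stand-in for the proposed ``new + d``
    return new


many_dicts = [{i: i * i} for i in range(5)] + [{0: 'last wins'}]
print(merge_in_place(many_dicts))
print(merge_by_chaining(many_dicts) == merge_in_place(many_dicts))
```

Both give the same result; the chained version simply repeats the copying of
everything accumulated so far at every step.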
Using the dict augmented assignment operators on a dict inside a tuple (or
other immutable data structure) will lead to the same problem that occurs with
list concatenation[3], namely the in-place addition will succeed, but the
operation will raise an exception.
>>> a_tuple = ({'spam': 1, 'eggs': 2}, None)
>>> a_tuple[0] += {'spam': 999}
Traceback (most recent call last):
...
TypeError: 'tuple' object does not support item assignment
>>> a_tuple[0]
{'spam': 999, 'eggs': 2}
Similar remarks apply to the ``-=`` operator.
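The existing list behaviour described in the FAQ entry [3] can be reproduced in
current Python, which shows what the dict operators would be inheriting:

```python
a_tuple = ([1, 2], None)

error = None
try:
    a_tuple[0] += [3]       # the list is extended, then the assignment fails
except TypeError as exc:
    error = exc

print(type(error).__name__)
print(a_tuple[0])           # the in-place change is still visible
```

Calling the method directly, as in ``a_tuple[0].extend([3])``, avoids the
exception because no assignment back into the tuple takes place.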
Other discussions
-----------------
`Latest discussion which motivated this PEP
<https://mail.python.org/pipermail/python-ideas/2019-February/055509.html>`_
`Ticket on the bug tracker <https://bugs.python.org/issue36144>`_
`A previous discussion
<https://mail.python.org/pipermail/python-ideas/2015-February/031748.html>`_
and `commentary on it <https://lwn.net/Articles/635397/>`_. Note that the
author of this PEP was skeptical of this proposal at the time.
`How to merge dictionaries
<https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/>`_ in
idiomatic Python.
Open questions
--------------
Should these operators be part of the ABC ``Mapping`` API?
References
----------
[1] Guido's declaration that plus wins over pipe:
https://mail.python.org/pipermail/python-ideas/2019-February/055519.html
[2] Non-string keys: https://bugs.python.org/issue35105 and
https://mail.python.org/pipermail/python-dev/2018-October/155435.html
[3] Behaviour in tuples:
https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works
Copyright
---------
This document has been placed in the public domain.
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: