[Python-ideas] PEP: Dict addition and subtraction

Steven D'Aprano Fri, 01 Mar 2019 08:27:15 -0800

Attached is a draft PEP on adding + and - operators to dict for 
discussion.


This should probably go here:

https://github.com/python/peps

but due to technical difficulties at my end, I'm very limited in what I 
can do on Github (at least for now). If there's anyone who would like to 
co-author and/or help with the process, that will be appreciated.


-- 
Steven

======================================
PEP-xxxx Dict addition and subtraction
======================================

**DRAFT** -- This is a draft document for discussion.

Abstract
--------

This PEP suggests adding merge ``+`` and difference ``-`` operators to the 
built-in ``dict`` class.

The merge operator will have the same relationship to the ``dict.update`` 
method as the list concatenation operator has to ``list.extend``, with dict 
difference being defined analogously.


Examples
--------

Dict addition will return a new dict containing the left operand merged with 
the right operand.

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d + e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e + d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

The augmented assignment version operates in-place.

    >>> d += e
    >>> print(d)
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Analogously to list addition, the operator version is more restrictive, and 
requires that both arguments are dicts, while the augmented assignment version 
allows anything the ``update`` method allows, such as iterables of key/value 
pairs.

    >>> d + [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict

    >>> d += [('spam', 999)]
    >>> print(d)
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
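
The list behaviour that this mirrors can be checked in current Python.  A 
minimal sketch (the variable names are illustrative only):

```python
items = [1, 2]

# The + operator requires both operands to be lists.
try:
    items + (3, 4)
except TypeError as exc:
    error = exc
else:
    error = None

# Augmented assignment behaves like list.extend and accepts any iterable.
items += (3, 4)
```

The proposal gives ``dict`` the same asymmetry: a strict binary operator and a 
permissive in-place form.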


Dict difference ``-`` will return a new dict containing the items from the left 
operand which are not in the right operand.

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d - e
    {'spam': 1, 'eggs': 2}
    >>> e - d
    {'aardvark': 'Ethel'}

Augmented assignment will operate in place.

    >>> d -= e
    >>> print(d)
    {'spam': 1, 'eggs': 2}


Like the merge operator and list concatenation, the difference operator 
requires both operands to be dicts, while the augmented version allows any 
iterable of keys.

    >>> d - {'spam', 'parrot'}
    Traceback (most recent call last):
      ...
    TypeError: cannot take the difference of dict and set

    >>> d -= {'spam', 'parrot'}
    >>> print(d)
    {'eggs': 2}

    >>> d -= [('spam', 999)]
    >>> print(d)
    {'eggs': 2}


Semantics
---------

For the merge operator, if a key appears in both operands, the last-seen value 
(i.e. that from the right-hand operand) wins.  This means that dict addition is 
not commutative: in general ``d + e`` will not equal ``e + d``.  This joins a 
number of other non-commutative addition operators among the builtins, 
including lists, tuples, strings and bytes.

Having the last-seen value win makes the merge operator match the semantics of 
the ``update`` method, so that ``d + e`` is an operator version of 
``d.update(e)`` applied to a copy of ``d``.
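
The relationship to ``update`` can be sketched in current Python.  The helper 
name ``merged`` is hypothetical and only stands in for the proposed operator:

```python
def merged(d, e):
    """Hypothetical stand-in for the proposed ``d + e``."""
    new = d.copy()   # leave d untouched
    new.update(e)    # values from e win on duplicate keys
    return new


d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

assert merged(d, e) == {'spam': 1, 'eggs': 2, 'cheese': 'cheddar',
                        'aardvark': 'Ethel'}
assert merged(e, d) == {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1,
                        'eggs': 2}
assert merged(d, e) != merged(e, d)   # not commutative
```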

The error messages shown above are not part of the API, and may change at any 
time.


Rejected semantics
~~~~~~~~~~~~~~~~~~

Rejected alternative semantics for ``d + e`` include:

- Add only new keys from ``e``, without overwriting existing keys in ``d``.
  This may be done by reversing the operands, ``e + d``, or by using dict
  difference first, ``d + (e - d)``.  The latter is especially useful for the
  in-place version ``d += (e - d)``.

- Raise an exception if there are duplicate keys.  This seems unnecessarily
  restrictive and is not likely to be useful in practice.  For example,
  updating default configuration values with user-supplied values would most
  often fail under the requirement that keys are unique::

      prefs = site_defaults + user_defaults + document_prefs

- Add the values of ``e`` to the corresponding values of ``d``.  This is the
  behaviour implemented by ``collections.Counter``.
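
For comparison, the three rejected behaviours can be written out in current 
Python.  This is only a sketch: ``merge_keep_existing`` and ``merge_strict`` 
are hypothetical helper names, and the choice of ``KeyError`` is an 
assumption.

```python
from collections import Counter


def merge_keep_existing(d, e):
    """Add only new keys from e; equal to the proposed ``e + d``."""
    new = dict(e)
    new.update(d)    # existing values in d are kept
    return new


def merge_strict(d, e):
    """Raise if the operands share any key."""
    duplicates = d.keys() & e.keys()
    if duplicates:
        raise KeyError(f"duplicate keys: {duplicates!r}")
    new = dict(d)
    new.update(e)
    return new


d = {'spam': 1, 'eggs': 2}
e = {'eggs': 10, 'ham': 20}

kept = merge_keep_existing(d, e)     # 'eggs' keeps the value from d
summed = Counter(d) + Counter(e)     # values for 'eggs' are added
```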


Syntax
------

An alternative to the ``+`` operator is the pipe ``|`` operator, which is used 
for set union.  This suggestion did not receive much support on Python-Ideas.

The ``+`` operator was strongly preferred on Python-Ideas.[1]  It is more 
familiar than the pipe operator, matches nicely with ``-`` as a pair, and the 
Counter subclass already uses ``+`` for merging.
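
The two existing precedents mentioned here behave as follows in current 
Python (a short illustration, not part of the proposal):

```python
from collections import Counter

# Set union uses the pipe operator.
union = {'spam', 'eggs'} | {'eggs', 'ham'}

# Counter, a dict subclass, uses + and adds the counts key by key.
total = Counter(spam=1, eggs=2) + Counter(eggs=3)
```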


Current Alternatives
--------------------

To create a new dict containing the merged items of two (or more) dicts, one 
can currently write::

    {**d1, **d2}

but this is neither obvious nor easily discoverable. It is only guaranteed to 
work if the keys are all strings. If the keys are not strings, it currently 
works in CPython, but it may not work with other implementations, or future 
versions of CPython[2].

It is also limited to returning a built-in dict, not a subclass, unless 
re-written as ``MyDict(**d1, **d2)``, in which case non-string keys will raise 
TypeError.

There is currently no way to perform dict subtraction except through a manual 
loop.
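
For reference, these are spellings that work today without the new operators. 
The dict comprehension is a loop in expression form, so it is consistent with 
the statement above.  Variable names are illustrative only.

```python
d1 = {'spam': 1, 'eggs': 2, 'cheese': 3}
d2 = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

# Merge into a new dict: copy, then update.
merged = d1.copy()
merged.update(d2)

# Difference: keep the items of d1 whose keys are not in d2.
difference = {k: v for k, v in d1.items() if k not in d2}
```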


Implementation
--------------

The implementation will be in C.  (The author of this PEP would like to make it 
known that he is not able to write the implementation.)

An approximate pure-Python implementation of the merge operator will be::

    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()  # May be a subclass of dict.
            new.update(self)
            new.update(other)
            return new
        return NotImplemented
    def __radd__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            new.update(other)
            new.update(self)
            return new
        return NotImplemented

Note that the result type will be the type of the left operand; in the event of 
matching keys, the winner is the right operand.

Augmented assignment will just call the ``update`` method. This is analogous to 
the way ``list +=`` calls the ``extend`` method, which accepts any iterable, 
not just lists::

    def __iadd__(self, other):
        self.update(other)
        return self
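
To try the merge semantics today, the methods above can be placed on a 
subclass.  ``AddableDict`` is a hypothetical name used only for this sketch; 
it is not part of the proposal.

```python
class AddableDict(dict):
    """Hypothetical dict subclass sketching the proposed + and +=."""

    def __add__(self, other):
        if isinstance(other, dict):
            new = type(self)()   # result has the type of the left operand
            new.update(self)
            new.update(other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = type(other)()  # here the left operand is ``other``
            new.update(other)
            new.update(self)
            return new
        return NotImplemented

    def __iadd__(self, other):
        self.update(other)
        return self   # augmented assignment rebinds the name to this value


d = AddableDict(spam=1, eggs=2)
e = {'eggs': 'fried', 'ham': 3}

left = d + e            # AddableDict.__add__ is used
right = e + d           # AddableDict.__radd__ is used
d += [('spam', 999)]    # anything accepted by update
```

In this sketch ``left`` is an ``AddableDict`` and ``right`` is a plain 
``dict``, in both cases the type of the left operand.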


An approximate pure-Python implementation of the difference operator will be::

    def __sub__(self, other):
        if isinstance(other, dict):
            new = type(self)()
            for k in self:
                if k not in other:
                    new[k] = self[k]
            return new
        return NotImplemented
    def __rsub__(self, other):
        if isinstance(other, dict):
            new = type(other)()
            for k in other:
                if k not in self:
                    new[k] = other[k]
            return new
        return NotImplemented

Augmented assignment will operate on equivalent terms to ``update``.  If the 
operand has a ``keys`` method, it will be used, otherwise the operand will be 
iterated over::

    def __isub__(self, other):
        if hasattr(other, 'keys'):
            for k in other.keys():
                if k in self:
                    del self[k]
        else:
            for k in other:
                if k in self:
                    del self[k]
        return self


These semantics are intended to match those of ``update`` as closely as 
possible. For the dict built-in itself, calling ``keys`` is redundant as 
iteration over a dict iterates over its keys; but for subclasses or other 
mappings, ``update`` prefers to use the keys method.

  .. attention:: The above paragraph may be inaccurate.
     Although the dict docstring states that ``keys``
     will be called if it exists, this does not seem to
     be the case for dict subclasses.  Bug or feature?


Contra-indications
------------------

(Or when to avoid using these new operators.)

For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will suffer 
from the same unfortunate O(N\*\*2) Big Oh performance as do list and tuple 
addition, and for similar reasons.  If one expects to be merging a large number 
of dicts where performance is an issue, it may be better to use an explicit 
loop and in-place merging::

    new = {}
    for d in many_dicts:
        new += d

This is unlikely to be a problem in practice as most uses of the merge operator 
are expected to only involve a small number of dicts.  Similarly, most uses of 
list and tuple concatenation only use a few objects.
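
The same single-pass merge can be written in current Python with ``update``. 
A minimal sketch; ``merge_all`` is a hypothetical helper name:

```python
def merge_all(dicts):
    """Hypothetical helper: merge any number of dicts in a single pass."""
    new = {}
    for d in dicts:
        new.update(d)   # what the proposed ``new += d`` would call
    return new


many_dicts = [{'a': 1}, {'b': 2}, {'a': 3, 'c': 4}]
result = merge_all(many_dicts)
```

With N single-key dicts the loop copies each item once, N copies in total. 
Chained addition instead copies the growing intermediate result at every 
step, roughly ``N**2 / 2`` copies.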

Using the dict augmented assignment operators on a dict inside a tuple (or 
other immutable data structure) will lead to the same problem that occurs with 
list concatenation[3], namely the in-place addition will succeed, but the 
operation will raise an exception.

    >>> a_tuple = ({'spam': 1, 'eggs': 2}, None)
    >>> a_tuple[0] += {'spam': 999}
    Traceback (most recent call last):
      ...
    TypeError: 'tuple' object does not support item assignment
    >>> a_tuple[0]
    {'spam': 999, 'eggs': 2}

Similar remarks apply to the ``-=`` operator.
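
The existing list behaviour that this mirrors can be reproduced in current 
Python (variable names are illustrative only):

```python
a_tuple = ([1, 2], None)

try:
    a_tuple[0] += [3]
except TypeError as exc:
    message = str(exc)
else:
    message = None

# The list was extended in place before the item assignment failed.
contents = a_tuple[0]
```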


Other discussions
-----------------

`Latest discussion which motivated this PEP 
<https://mail.python.org/pipermail/python-ideas/2019-February/055509.html>`_

`Ticket on the bug tracker <https://bugs.python.org/issue36144>`_

`A previous discussion 
<https://mail.python.org/pipermail/python-ideas/2015-February/031748.html>`_ 
and `commentary on it <https://lwn.net/Articles/635397/>`_.  Note that the 
author of this PEP was skeptical of this proposal at the time.

`How to merge dictionaries 
<https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/>`_ in 
idiomatic Python.


Open questions
--------------

Should these operators be part of the ABC ``Mapping`` API?


References
----------

[1] Guido's declaration that plus wins over pipe: 
https://mail.python.org/pipermail/python-ideas/2019-February/055519.html

[2] Non-string keys: https://bugs.python.org/issue35105 and 
https://mail.python.org/pipermail/python-dev/2018-October/155435.html

[3] Behaviour in tuples: 
https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works


Copyright
---------

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
