[Python-ideas] Re: pickle.reduce and deconstruct

Andrew Barnert via Python-ideas Sun, 09 Feb 2020 16:39:55 -0800

Daniel Spitz wrote:
>On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert [email protected] wrote:
> > That being said, if you have a use for the object graph walk, that
> > shouldn’t be too hard to write. It’s just that there
> > are nontrivial design
> > issues to answer. Do we need to expose the graph itself in some form, or
> > just the memoized walk iterator? Is an Iterator the right way to expose it,
> > as opposed to something like the Smalltalk-ish walker classes in the AST
> > module?
> (See my comments further down for more on this.) An entire graph structure
> isn't necessary for my use case. An iterator over values would be
> insufficient, because it wouldn't assist you in building up the replacement
> structure. An iterator over node objects could work but it might be weird
> to update them as you iterate. I think a walker is probably most
> appropriate.


I think the simplest interface might just look like `copy.deepcopy` but with an 
extra `transformer` argument, which is called on each value. The 
`ast.NodeTransformer` style doesn't really work, because you have to register 
transformers for an arbitrary set of types that probably need qualified names 
(or might not even have public names), rather than for a fixed set of types 
from the `ast` module. So you're going to need a single `transform` method that 
explicitly switches on types (or, maybe better, uses `@singledispatch` to do it 
for you), at which point the class is just extra boilerplate.

The first question is, should this be top-down (like doing `copy.deepcopy`, but 
calling `transformer` on each original value before recursively deep-copying 
it) or bottom-up (like doing `pickle.loads(pickle.dumps())`, but calling 
`transformer` on each reconstructed value)? I think bottom-up is more generally 
useful, but there are some cases where it seems wrong. For example, if you want 
to transform every `Spam` into an `Eggs`, and `Spam` isn't reducible, top-down 
could still easily work, but bottom-up wouldn't. I suppose it could be an 
option (like `os.walk`), or even give you both (like `ElementTree.iterparse`)?

Also, it seems a bit weird that types that are deepcopyable with `__deepcopy__` 
but not pickleable—or that are both, but `__deepcopy__` is a major 
optimization—can't use `__deepcopy__`. But I'm not sure there's any way we 
could use that protocol. The `__deepcopy__` method just takes `self, memo` and 
usually works by calling `copy.deepcopy(attr, memo)` on all of its attributes, 
and there's no obvious clean way to swizzle that into calling our 
transforming-deepcopy with our transformer instead. If it's important to make 
that work, then it's worth checking whether I'm right about that rather than 
just accepting that I didn't see an obvious clean way…

One more question: You don't want to see duplicate values. If a value has been 
transformed, you want the replacement value each time it shows up again then? 
Or should a transformed value not go into the memo cache, so if the same value 
appears again it can be transformed differently? Or does that also need to be 
an option?

Finally, I think we'd want the `deepcopy` style of memoization, where we just 
rely on `id` to be persistent through the entire copy (which includes keeping 
alive any values that might get replaced along the way), not the `pickle` style 
that has to construct and using persistent IDs. So we don't need to use any of 
the pickle ID machinery.

Anyway, if you wanted to implement this today, you'd need access to multiple 
(and undocumented, and periodically changing) things. But I think all we'd need 
`copy` to expose to make this doable without being fragile is:

    def reconstruct(x, memo, reduction, *, recurse=deepcopy):
        return _reconstruct(x, memo, *reduction, deepcopy=recurse)

    def copier(cls):
        if copier := _deepcopy_dispatch.get(cls):
            return copier
        if issubclass(cls, type):
            return _deepcopy_atomic

And add a `deepcopy=deepcopy` parameter to `_deepcopy_atomic` and 
`_deepcopy_method`, and the latter would have to use it instead of the global, 
just like the other copier functions. Or, maybe better, rename it to `recurse` 
in all of them.

And with those changes to `copy`, plus the `pickle.reduce` function I suggested 
earlier, I believe you can write `transform` yourself like this (quick&dirty 
hack based on existing `copy.deepcopy`, which assumes you want bottom-up, and 
transformed values to be memoized, and no attempt at `__deepcopy__` support):

    def transform(x, memo=None, *, transformer, _nil=[]):
        if memo is None:
            memo = {}

        recurse = functools.partial(transform, transformer=transformer)

        d = id(x)
        y = memo.get(d, _nil)
        if y is not _nil:
            return y

        # y = transform(y) # top-down walk; untested

        cls = type(x)

        if copier := copy.copier(cls):
            y = copier(x, memo, recurse=recurse)
        else:
            rv = pickle.reduce(x)
            # not sure what this is for, but copy.deepcopy does it...
            if isinstance(rv, str):
                y = x
            else:
                y = copy.reconstruct(x, memo, rv, recurse=recurse)

        y = transformer(y) # bottom-up walk

        # If is its own copy, don't memoize.
        if y is not x:
            memo[d] = y
            memo.setdefault(id(memo), []).append(x) # make sure x lives at 
least as long as d
        return y

I think this is sufficiently non-magical for someone to write as 
user/third-party-library code. 

And I don't think any of this puts unacceptable constraints on future versions 
of Python or other implementations.

The one possible issue is the `memo` thing. What if a future version of `copy` 
wanted to use something different from a plain dict for memoization, or wanted 
to use the dict differently in a way that our `setdefault` trick would break? I 
don't think that's an issue, but if it is, one more thing to expose:

    class Memo(dict):
        def keep_alive(self, x):
            self.setdefault(id(self), []).append(x)
            
And then the user `transform` function has to use `copy.Memo()` in place of 
`{}`, and call `memo.keep_alive(x)` instead of doing it manually.

Here's an example of using it:

    @dataclass
    class Spam:
        x: int
        y: float
        z: list

    @functools.singledispatch
    def transformator(x):
        return x

    @transformator.register
    def _(x: int):
        return str(x)

    spam = Spam(2, 3.0, z=[1, 2.0, 'abc'])
    print(transform(spam, transformer=transformator))

This should print:

    Spam(x='2', y=3.0, z=['1', 2.0, 'abc'])
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/WJGYQ7SRPFUJLWG57LW2V7CJAICOUW5X/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: pickle.reduce and deconstruct

Reply via email to