Daniel Spitz wrote:
>On Fri, Feb 7, 2020 at 1:51 PM Andrew Barnert [email protected] wrote:
> > That being said, if you have a use for the object graph walk, that
> > shouldn’t be too hard to write. It’s just that there
> > are nontrivial design
> > issues to answer. Do we need to expose the graph itself in some form, or
> > just the memoized walk iterator? Is an Iterator the right way to expose it,
> > as opposed to something like the Smalltalk-ish walker classes in the AST
> > module?
> (See my comments further down for more on this.) An entire graph structure
> isn't necessary for my use case. An iterator over values would be
> insufficient, because it wouldn't assist you in building up the replacement
> structure. An iterator over node objects could work but it might be weird
> to update them as you iterate. I think a walker is probably most
> appropriate.
I think the simplest interface might just look like `copy.deepcopy` but with an
extra `transformer` argument, which is called on each value. The
`ast.NodeTransformer` style doesn't really work, because you have to register
transformers for an arbitrary set of types that probably need qualified names
(or might not even have public names), rather than for a fixed set of types
from the `ast` module. So you're going to need a single `transform` method that
explicitly switches on types (or, maybe better, uses `@singledispatch` to do it
for you), at which point the class is just extra boilerplate.
The first question is, should this be top-down (like doing `copy.deepcopy`, but
calling `transformer` on each original value before recursively deep-copying
it) or bottom-up (like doing `pickle.loads(pickle.dumps())`, but calling
`transformer` on each reconstructed value)? I think bottom-up is more generally
useful, but there are some cases where it seems wrong. For example, if you want
to transform every `Spam` into an `Eggs`, and `Spam` isn't reducible, top-down
could still easily work, but bottom-up wouldn't. I suppose it could be an
option (like `os.walk`), or even give you both (like `ElementTree.iterparse`)?
Also, it seems a bit weird that types that are deepcopyable with `__deepcopy__`
but not pickleable—or that are both, but `__deepcopy__` is a major
optimization—can't use `__deepcopy__`. But I'm not sure there's any way we
could use that protocol. The `__deepcopy__` method just takes `self, memo` and
usually works by calling `copy.deepcopy(attr, memo)` on all of its attributes,
and there's no obvious clean way to swizzle that into calling our
transforming-deepcopy with our transformer instead. If it's important to make
that work, then it's worth checking whether I'm right about that rather than
just accepting that I didn't see an obvious clean way…
One more question: You don't want to see duplicate values. If a value has been
transformed, you want the replacement value each time it shows up again then?
Or should a transformed value not go into the memo cache, so if the same value
appears again it can be transformed differently? Or does that also need to be
an option?
Finally, I think we'd want the `deepcopy` style of memoization, where we just
rely on `id` to be persistent through the entire copy (which includes keeping
alive any values that might get replaced along the way), not the `pickle` style
that has to construct and using persistent IDs. So we don't need to use any of
the pickle ID machinery.
Anyway, if you wanted to implement this today, you'd need access to multiple
(and undocumented, and periodically changing) things. But I think all we'd need
`copy` to expose to make this doable without being fragile is:
def reconstruct(x, memo, reduction, *, recurse=deepcopy):
return _reconstruct(x, memo, *reduction, deepcopy=recurse)
def copier(cls):
if copier := _deepcopy_dispatch.get(cls):
return copier
if issubclass(cls, type):
return _deepcopy_atomic
And add a `deepcopy=deepcopy` parameter to `_deepcopy_atomic` and
`_deepcopy_method`, and the latter would have to use it instead of the global,
just like the other copier functions. Or, maybe better, rename it to `recurse`
in all of them.
And with those changes to `copy`, plus the `pickle.reduce` function I suggested
earlier, I believe you can write `transform` yourself like this (quick&dirty
hack based on existing `copy.deepcopy`, which assumes you want bottom-up, and
transformed values to be memoized, and no attempt at `__deepcopy__` support):
def transform(x, memo=None, *, transformer, _nil=[]):
if memo is None:
memo = {}
recurse = functools.partial(transform, transformer=transformer)
d = id(x)
y = memo.get(d, _nil)
if y is not _nil:
return y
# y = transform(y) # top-down walk; untested
cls = type(x)
if copier := copy.copier(cls):
y = copier(x, memo, recurse=recurse)
else:
rv = pickle.reduce(x)
# not sure what this is for, but copy.deepcopy does it...
if isinstance(rv, str):
y = x
else:
y = copy.reconstruct(x, memo, rv, recurse=recurse)
y = transformer(y) # bottom-up walk
# If is its own copy, don't memoize.
if y is not x:
memo[d] = y
memo.setdefault(id(memo), []).append(x) # make sure x lives at
least as long as d
return y
I think this is sufficiently non-magical for someone to write as
user/third-party-library code.
And I don't think any of this puts unacceptable constraints on future versions
of Python or other implementations.
The one possible issue is the `memo` thing. What if a future version of `copy`
wanted to use something different from a plain dict for memoization, or wanted
to use the dict differently in a way that our `setdefault` trick would break? I
don't think that's an issue, but if it is, one more thing to expose:
class Memo(dict):
def keep_alive(self, x):
self.setdefault(id(self), []).append(x)
And then the user `transform` function has to use `copy.Memo()` in place of
`{}`, and call `memo.keep_alive(x)` instead of doing it manually.
Here's an example of using it:
@dataclass
class Spam:
x: int
y: float
z: list
@functools.singledispatch
def transformator(x):
return x
@transformator.register
def _(x: int):
return str(x)
spam = Spam(2, 3.0, z=[1, 2.0, 'abc'])
print(transform(spam, transformer=transformator))
This should print:
Spam(x='2', y=3.0, z=['1', 2.0, 'abc'])
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/WJGYQ7SRPFUJLWG57LW2V7CJAICOUW5X/
Code of Conduct: http://python.org/psf/codeofconduct/