Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)
On 22 October 2016 at 07:57, Chris Barker wrote:
> I'm still confused about the "io" in "iobuffers" -- I've used buffers a lot
> -- for passing data around between various C libs -- numpy, image
> processing, etc... I never really thought of it as IO though. which is why a
> simple frombuffer() seems to make a lot of sense to me, without any other
> stuff. (to be honest, I reach for Cython these days for that sort of thing
> though)

That's the essence of my point though: if you care enough about the
performance of a piece of code for the hidden copy in
"bytes(mydata[start:stop])" to be deemed unacceptable, and also can't
afford the lazy cleanup of the view in
"bytes(memoryview(mydata)[start:stop])", then it seems likely that you're
writing specialist, high performance, low overhead, data manipulation
code that probably shouldn't be written in Python.

In such cases, an extension module written in something like Cython, C or
Rust would be a better fit, as using the more appropriate tool will give
you a range of additional performance improvements (near) automatically,
such as getting to avoid the runtime overhead of Python's dynamic type
system.

At that point, having to write the lowest-available-overhead version
explicitly in Python as:

    with memoryview(mydata) as view:
        return bytes(view[start:stop])

is a sign that someone is insisting on staying in pure Python code when
they're doing sufficiently low level bit bashing that it probably isn't
the best idea to continue down that path.

From that perspective, adding "[bytes/bytearray].frombuffer" is adding
complexity to the core language for the sake of giving people one small
additional piece of incremental performance improvement that they can eke
out before they admit to themselves "OK, I'm probably not using the right
language for this part of my application".
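As a concrete illustration of the trade-offs above, here is a minimal
sketch comparing the three spellings (variable names are illustrative,
and the overhead characterisations are CPython-specific):

```python
mydata = bytes(range(256))
start, stop = 10, 20

# 1. Simple slice: "mydata[start:stop]" creates an intermediate bytes
#    copy before bytes() takes the snapshot.
snapshot = bytes(mydata[start:stop])

# 2. memoryview slice: avoids the intermediate copy, but the view object
#    is only cleaned up lazily, whenever it gets garbage collected.
snapshot = bytes(memoryview(mydata)[start:stop])

# 3. Explicit view with deterministic release of the underlying buffer
#    when the with block exits.
with memoryview(mydata) as view:
    snapshot = bytes(view[start:stop])

assert snapshot == mydata[start:stop]
```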
By contrast, a library that provided better low level data buffer
manipulation that was suitable for asyncio's needs is *much* easier to
emulate on older versions, and provides more scope for extracting
efficient data manipulation patterns beyond this one very specific case
of more efficiently snapshotting a subset of an existing buffer.

Cheers,
Nick.

P.S. I bring up Rust and the runtime overhead of the type system
specifically here, as Armin Ronacher recently wrote an excellent post
about that in relation to some performance improvement work they were
doing at Sentry:
https://blog.sentry.io/2016/10/19/fixing-python-performance-with-rust.html

--
Nick Coghlan | [email protected] | Brisbane, Australia
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?
On 22 October 2016 at 16:05, Nathaniel Smith wrote:
> On Fri, Oct 21, 2016 at 8:32 PM, Nick Coghlan wrote:
> But PEP 442 already broke all that :-). Now weakref callbacks can
> happen before __del__, and they can happen on objects that are about
> to be resurrected.
Right, but the resurrection can still only happen *in* __del__, so the
interpreter doesn't need to deal with the case where it happens in a
weakref callback instead - that's where the freedom to do the
callbacks and the __del__ in either order comes from.
> There remains one obscure corner case where multiple resurrection is
> possible, because the resurrection-prevention flag doesn't exist on
> non-GC objects, so you'd still be able to take new weakrefs to those.
> But in that case __del__ can already do multiple resurrections, and
> some fellow named Nick Coghlan seemed to think that was okay back in
> 2013 [1], so probably it's not too bad ;-).
>
> [1] https://mail.python.org/pipermail/python-dev/2013-June/126850.html
Right, that still doesn't bother me.
>> Changing that to support resurrecting the object so it can be passed
>> into the callback without the callback itself holding a strong
>> reference means losing the main "reasoning about software" benefit
>> that weakref callbacks offer: they currently can't resurrect the
>> object they relate to (since they never receive a strong reference to
>> it), so it nominally doesn't matter if the interpreter calls them
>> before or after that object has been entirely cleaned up.
>
> I guess I'm missing the importance of this -- does the interpreter
> gain some particular benefit from having flexibility about when to
> fire weakref callbacks? Obviously it has to pick one in practice.
Sorry, my attempted clarification of one practical implication made it
look like I was defining the phrase I had in quotes. However, the
"reasoning about software" benefit I see is "If you don't define
__del__, you don't need to worry about object resurrection, as it's
categorically impossible when only using weakref callbacks".
Interpreter implementors are just one set of beneficiaries of that
simplification - everyone writing weakref callbacks qualifies as well.
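That invariant is easy to observe directly: a weakref callback only ever
receives the (already dead) weakref, never a strong reference to the
referent, so it has no way to resurrect it. A minimal sketch:

```python
import weakref

class Target:
    pass

seen = []

def callback(ref):
    # The callback receives the weakref itself; by the time it runs,
    # the referent has already been cleared, so ref() returns None and
    # the callback cannot obtain a strong reference to the object.
    seen.append(ref() is None)

obj = Target()
ref = weakref.ref(obj, callback)
del obj  # under CPython, the callback fires immediately here
assert seen == [True]
```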
However, if you're happy defining __del__ methods, then PEP 442 means
you can already inject lazy cyclic cleanup that supports resurrection:
>>> class Target:
... pass
...
>>> class Resurrector:
... def __init__(self, target):
... _self_ref = "_resurrector_{:d}".format(id(self))
... self.target = target
... setattr(target, _self_ref, self)
... def __del__(self):
... globals()["resurrected"] = self.target
...
>>> obj = Target()
>>> Resurrector(obj)
<__main__.Resurrector object at 0x7f42f8ae34e0>
>>> del obj
>>> resurrected
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'resurrected' is not defined
>>> import gc
>>> gc.collect(); gc.collect(); gc.collect()
6
4
0
>>> resurrected
<__main__.Target object at 0x7f42f8ae3438>
Given that, I don't see a lot of benefit in making weakref callbacks
harder to reason about when __del__ + attribute injection already
makes this possible.
Cheers,
Nick.
--
Nick Coghlan | [email protected] | Brisbane, Australia
Re: [Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?
On Sat, Oct 22, 2016 at 3:01 AM, Nick Coghlan wrote:
> On 22 October 2016 at 16:05, Nathaniel Smith wrote:
>> On Fri, Oct 21, 2016 at 8:32 PM, Nick Coghlan wrote:
>> But PEP 442 already broke all that :-). Now weakref callbacks can
>> happen before __del__, and they can happen on objects that are about
>> to be resurrected.
>
> Right, but the resurrection can still only happen *in* __del__, so the
> interpreter doesn't need to deal with the case where it happens in a
> weakref callback instead - that's where the freedom to do the
> callbacks and the __del__ in either order comes from.
I think we're probably on the same page here, but to be clear, my
point is that right now the resurrection logic seems to be (a) run
some arbitrary Python code (__del__), (b) run a second check to see if
a resurrection occurred (and the details of that check depend on
whether the object is part of a cyclic isolate). Since these two
phases are already decoupled from each other, it shouldn't cause any
particular difficulty for the interpreter if we add weakref callbacks
to the "run arbitrary code" phase. If we wanted to.
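The two phases described above can be seen from pure Python: __del__ is
the "run arbitrary code" step, and the interpreter's resurrection check
happens afterwards. A minimal sketch of __del__-based resurrection
(class and variable names are illustrative):

```python
class Phoenix:
    def __del__(self):
        # Phase (a): arbitrary Python code runs and may create a new
        # strong reference to self, here via the module globals.
        globals()["survivor"] = self

obj = Phoenix()
del obj
# Phase (b): the interpreter's post-finalization check sees the new
# reference and skips deallocation, so the object lives on.
assert isinstance(survivor, Phoenix)
```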
>> There remains one obscure corner case where multiple resurrection is
>> possible, because the resurrection-prevention flag doesn't exist on
>> non-GC objects, so you'd still be able to take new weakrefs to those.
>> But in that case __del__ can already do multiple resurrections, and
>> some fellow named Nick Coghlan seemed to think that was okay back in
>> 2013 [1], so probably it's not too bad ;-).
>>
>> [1] https://mail.python.org/pipermail/python-dev/2013-June/126850.html
>
> Right, that still doesn't bother me.
>
>>> Changing that to support resurrecting the object so it can be passed
>>> into the callback without the callback itself holding a strong
>>> reference means losing the main "reasoning about software" benefit
>>> that weakref callbacks offer: they currently can't resurrect the
>>> object they relate to (since they never receive a strong reference to
>>> it), so it nominally doesn't matter if the interpreter calls them
>>> before or after that object has been entirely cleaned up.
>>
>> I guess I'm missing the importance of this -- does the interpreter
>> gain some particular benefit from having flexibility about when to
>> fire weakref callbacks? Obviously it has to pick one in practice.
>
> Sorry, my attempted clarification of one practical implication made it
> look like I was defining the phrase I had in quotes. However, the
> "reasoning about software" benefit I see is "If you don't define
> __del__, you don't need to worry about object resurrection, as it's
> categorically impossible when only using weakref callbacks".
> Interpreter implementors are just one set of beneficiaries of that
> simplification - everyone writing weakref callbacks qualifies as well.
I do like invariants, but I'm having trouble seeing why this one is
super valuable. I mean, if your object doesn't define __del__, then
it's also impossible to distinguish between a weakref causing
resurrection and a strong reference that prevents the object from
being collected in the first place. And certainly it's harmless in the
use case I have in mind, where normally the weakref would be created
in the object's __init__ anyway :-).
> However, if you're happy defining __del__ methods, then PEP 442 means
> you can already inject lazy cyclic cleanup that supports resurrection:
>
> >>> class Target:
> ... pass
> ...
> >>> class Resurrector:
> ... def __init__(self, target):
> ... _self_ref = "_resurrector_{:d}".format(id(self))
> ... self.target = target
> ... setattr(target, _self_ref, self)
> ... def __del__(self):
> ... globals()["resurrected"] = self.target
> ...
> >>> obj = Target()
> >>> Resurrector(obj)
> <__main__.Resurrector object at 0x7f42f8ae34e0>
> >>> del obj
> >>> resurrected
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'resurrected' is not defined
> >>> import gc
> >>> gc.collect(); gc.collect(); gc.collect()
> 6
> 4
> 0
> >>> resurrected
> <__main__.Target object at 0x7f42f8ae3438>
>
> Given that, I don't see a lot of benefit in making weakref callbacks
> harder to reason about when __del__ + attribute injection already
> makes this possible.
That's a cute trick :-). But it does have one major downside compared
to allowing weakref callbacks to access the object normally. With
weakrefs you don't interfere with when the object is normally
collected, and in particular for objects that aren't part of cycles,
they're still collected promptly (on CPython). Here every object
becomes part of a cycle, so objects that would otherwise be collected
promptly won't be.
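The timing difference is straightforward to demonstrate: a plain weakref
callback fires as soon as the last strong reference drops (promptly
under CPython's reference counting), while the attribute-injection trick
ties the two objects into a cycle that has to wait for the cyclic
collector. A sketch (the prompt-collection behaviour is CPython-specific):

```python
import gc
import weakref

class Target:
    pass

# Plain weakref callback: fires immediately when the refcount hits zero.
events = []
obj = Target()
ref = weakref.ref(obj, lambda r: events.append("fired"))
del obj
assert events == ["fired"]

# Attribute-injection trick: Target and Resurrector now reference each
# other, so nothing is cleaned up until a cyclic GC pass runs.
class Resurrector:
    def __init__(self, target):
        self.target = target
        setattr(target, "_resurrector_{:d}".format(id(self)), self)
    def __del__(self):
        globals()["resurrected"] = self.target

obj = Target()
Resurrector(obj)
del obj
assert "resurrected" not in globals()  # still nothing: cycle is uncollected
for _ in range(3):
    gc.collect()
assert isinstance(resurrected, Target)  # cyclic GC ran __del__
```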
(Remember that the reason I started thinking about this was that I was
wondering if we could have a nice API for the asyncio event loop to
"take over" the job of finalizing an o
