[Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?

2016-10-21 Thread Nathaniel Smith
Hi all,

It's an old feature of the weakref API that you can define an
arbitrary callback to be invoked when the referenced object dies, and
that when this callback is invoked, it gets handed the weakref wrapper
object -- BUT, only after it's been cleared, so that the callback
can't access the originally referenced object. (I.e., this callback
will never raise: def callback(ref): assert ref() is None.)

AFAICT the original motivation for this seems to have been that if the weakref
callback could get at the object, then the weakref callback would
effectively be another finalizer like __del__, and finalizers and
reference cycles don't mix, so weakref callbacks can't be finalizers.
There's a long document from the 2.4 days about all the terrible
things that could happen if arbitrary code like callbacks could get
unfettered access to cyclic isolates at weakref cleanup time [1].

But that was 2.4. In the mean time, of course, PEP 442 fixed it so
that finalizers and weakrefs mix just fine. In fact, weakref callbacks
are now run *before* __del__ methods [2], so clearly it's now okay for
arbitrary code to touch the objects during that phase of the GC -- at
least in principle.

So what I'm wondering is, would anything terrible happen if we started
passing still-live weakrefs into weakref callbacks, and then clearing
them afterwards? (i.e. making step 1 of the PEP 442 cleanup order be
"run callbacks and then clear weakrefs", instead of the current "clear
weakrefs and then run callbacks"). I skimmed through the PEP 442
discussion, and AFAICT the rationale for keeping the old weakref
behavior was just that no-one could be bothered to mess with it [3].

[The motivation for my question is partly curiosity, and partly that
in the discussion about how to handle GC for async objects, it
occurred to me that it might be very nice if arbitrary classes that
needed access to the event loop during cleanup could do something like

  def __init__(self, ...):
      loop = asyncio.get_event_loop()
      loop.gc_register(self)

  # automatically called by the loop when I am GC'ed;
  # the async equivalent of __del__
  async def aclose(self):
      ...

Right now something *sort* of like this is possible but it requires a
much more cumbersome API, where every class would have to implement
logic to fetch a cleanup callback from the loop, store it, and then
call it from its __del__ method -- like how PEP 525 does it. Delaying
weakref clearing would make this simpler API possible.]
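(Concretely, the cumbersome version looks something like the sketch below.
loop.get_cleanup_callback() is a made-up name standing in for the
hook-fetching dance that PEP 525 actually does via sys.set_asyncgen_hooks():

    import asyncio

    class Resource:
        def __init__(self):
            loop = asyncio.get_event_loop()
            # every class has to fetch and stash its own cleanup hook...
            self._cleanup = loop.get_cleanup_callback()   # hypothetical API

        def __del__(self):
            # ...and remember to hand itself back to the loop on teardown
            self._cleanup(self)
)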

-n

[1] https://github.com/python/cpython/blob/master/Modules/gc_weakref.txt
[2] https://www.python.org/dev/peps/pep-0442/#id7
[3] https://mail.python.org/pipermail/python-dev/2013-May/126592.html

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Python-Dev] Have I got my hg dependencies correct?

2016-10-21 Thread Brett Cannon
On Thu, 20 Oct 2016 at 04:48 Skip Montanaro wrote:

> I've recently run into a problem building the math and cmath modules
> for 2.7. (I don't rebuild very often, so this problem might have been
> around for awhile.) My hg repos look like this:
>
> * My cpython repo pulls from https://hg.python.org/cpython
>
> * My 2.7 repo (and other non-tip repos) pulls from my cpython repo
>
> I think this setup was recommended way back in the day when hg was new
> to the Python toolchain to avoid unnecessary network bandwidth.
>
> So, if I execute
>
> hg pull
> hg update
>
> in first cpython, then 2.7 repos I should be up-to-date, correct?
>

Nope, you need to execute the same steps in your 2.7 checkout if you're
keeping it in a separate directory from your cpython repo that you're
referring to (you can also use `hg pull -u` to do the two steps above in a
single command).
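
That is, assuming the two clones sit side by side as siblings, something like:

    cd cpython && hg pull -u   # update the clone that tracks hg.python.org
    cd ../2.7 && hg pull -u    # then pull those changesets into the 2.7 clone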


[Python-Dev] Summary of Python tracker Issues

2016-10-21 Thread Python tracker

ACTIVITY SUMMARY (2016-10-14 - 2016-10-21)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    5524 ( -3)
  closed 34728 (+55)
  total  40252 (+52)

Open issues with patches: 2398 


Issues opened (28)
==================

#28404: Logging SyslogHandler not appending '\n' to the end
http://bugs.python.org/issue28404  reopened by elelement

#28437: Documentation for handling of non-type metaclass hints is uncl
http://bugs.python.org/issue28437  reopened by ncoghlan

#28445: Wrong documentation for GzipFile.peek
http://bugs.python.org/issue28445  opened by abacabadabacaba

#28446: pyvenv generates malformed hashbangs for scripts
http://bugs.python.org/issue28446  opened by alexreg

#28449: tarfile.open(mode = 'r:*', ignore_zeros = True) has 50% chance
http://bugs.python.org/issue28449  opened by Silver Fox

#28450: Misleading/inaccurate documentation about unknown escape seque
http://bugs.python.org/issue28450  opened by lelit

#28451: pydoc.safeimport() raises ErrorDuringImport() if __builtin__._
http://bugs.python.org/issue28451  opened by segfault87

#28453: SSLObject.selected_alpn_protocol() not documented
http://bugs.python.org/issue28453  opened by alex.gronholm

#28457: Make public the current private known hash functions in the C-
http://bugs.python.org/issue28457  opened by rhettinger

#28459: _pyio module broken on Cygwin / setmode not usable
http://bugs.python.org/issue28459  opened by erik.bray

#28460: Minidom, order of attributes, datachars
http://bugs.python.org/issue28460  opened by Petr Pulc

#28462: subprocess pipe can't see EOF from a child in case of a few ch
http://bugs.python.org/issue28462  opened by Vyacheslav Grigoryev

#28463: Email long headers parsing/serialization
http://bugs.python.org/issue28463  opened by Константин Волков

#28464: BaseEventLoop.close should shutdown executor before marking it
http://bugs.python.org/issue28464  opened by cmeyer

#28465: python 3.5 magic number
http://bugs.python.org/issue28465  opened by 曹忠

#28469: timeit: use powers of 2 in autorange(), instead of powers of 1
http://bugs.python.org/issue28469  opened by haypo

#28470: configure.ac -g debug compiler option when not Py_DEBUG
http://bugs.python.org/issue28470  opened by Chris Byers

#28474: WinError(): Python int too large to convert to C long
http://bugs.python.org/issue28474  opened by Kelvin You

#28475: Misleading error on random.sample when k < 0
http://bugs.python.org/issue28475  opened by franciscouzo

#28477: Add optional user argument to pathlib.Path.home()
http://bugs.python.org/issue28477  opened by josh.r

#28478: Built-in module 'time' does not enable functions if -Werror sp
http://bugs.python.org/issue28478  opened by toast12

#28482: test_typing fails if asyncio unavailable
http://bugs.python.org/issue28482  opened by martin.panter

#28485: compileall.compile_dir(workers=) does not raise Valu
http://bugs.python.org/issue28485  opened by martin.panter

#28488: shutil.make_archive (xxx, zip, root_dir) is adding './' entry 
http://bugs.python.org/issue28488  opened by bialix

#28489: Fix comment in tokenizer.c
http://bugs.python.org/issue28489  opened by Ryan.Gonzalez

#28491: Remove bundled libffi for OSX
http://bugs.python.org/issue28491  opened by zach.ware

#28494: is_zipfile false positives
http://bugs.python.org/issue28494  opened by Thomas.Waldmann

#28496: Mark up constants 0, 1, -1 in C API docs
http://bugs.python.org/issue28496  opened by serhiy.storchaka



Most recent 15 issues with no replies (15)
==========================================

#28485: compileall.compile_dir(workers=) does not raise Valu
http://bugs.python.org/issue28485

#28470: configure.ac -g debug compiler option when not Py_DEBUG
http://bugs.python.org/issue28470

#28464: BaseEventLoop.close should shutdown executor before marking it
http://bugs.python.org/issue28464

#28460: Minidom, order of attributes, datachars
http://bugs.python.org/issue28460

#28457: Make public the current private known hash functions in the C-
http://bugs.python.org/issue28457

#28446: pyvenv generates malformed hashbangs for scripts
http://bugs.python.org/issue28446

#28439: Remove redundant checks in PyUnicode_EncodeLocale and PyUnicod
http://bugs.python.org/issue28439

#28429: ctypes fails to import with grsecurity's TPE
http://bugs.python.org/issue28429

#28422: multiprocessing Manager mutable type member access failure
http://bugs.python.org/issue28422

#28416: defining persistent_id in _pickle.Pickler subclass causes refe
http://bugs.python.org/issue28416

#28412: os.path.splitdrive documentation out of date
http://bugs.python.org/issue28412

#28408: Fix redundant code and memory leak in _PyUnicodeWriter_Finish
http://bugs.python.org/issue28408

#28407: Improve coverage of email.utils.make_msgid()
http://bugs.python.org/issue28407

#28401: Don't support the PEP384 stable

Re: [Python-Dev] Have I got my hg dependencies correct?

2016-10-21 Thread Skip Montanaro
On Fri, Oct 21, 2016 at 1:12 PM, Brett Cannon wrote:
>> in first cpython, then 2.7 repos I should be up-to-date, correct?
>
>
> Nope, you need to execute the same steps in your 2.7 checkout

"repos" == "checkout" in my message.

So the hg up -C solved my problem, but I'm still a bit confused
(nothing new, in addition to which I only use hg for my Python
repositories)... Why didn't a plain "hg up" tell me it couldn't update
some files because of changes? Or, like git (I think), attempt to
incorporate the upstream changes, then leave conflict markers if that
failed?

Skip


Re: [Python-Dev] Adding bytes.frombuffer() constructor to PEP 467 (was: [Python-ideas] Adding bytes.frombuffer() constructor)

2016-10-21 Thread Chris Barker
On Thu, Oct 20, 2016 at 11:48 PM, Nick Coghlan wrote:

> > >>> len(get_builtin_methods())
> > 230
> >
> > So what? No one looks in all the methods of builtins at once.
>
> Yes, Python implementation developers do, which is why it's a useful
> part of defining the overall "size" of Python and how that is growing
> over time.
>

sure -- but of course, the trick is that adding *one* new method is never a
big deal by itself.

I'm confused though -- IIUC, you are proposing adding an `iobuffers`
module to the std lib -- how is that not growing the "size" of Python?


I'm still confused about the "io" in "iobuffers" -- I've used buffers a lot
-- for passing data around between various C libs: numpy, image
processing, etc. -- but I never really thought of it as IO, which is why
a simple frombuffer() seems to make a lot of sense to me, without any
other stuff. (To be honest, I reach for Cython these days for that sort of
thing, though.)
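
(For context, the spellings available today are plain buffer-protocol
copies -- nothing IO-flavored about them:

    import array

    buf = array.array('B', [1, 2, 3])
    b = bytes(buf)             # bytes() already copies any buffer exporter
    mv = memoryview(buf)[1:]   # zero-copy slice of the buffer first...
    b2 = bytes(mv)             # ...then copy out just that window
)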


> and we make it easier for educators to decide whether or not they should be
> introducing their students to the new capabilities.

> advanced domain specific use cases (see
> http://learning-python.com/books/python-changes-2014-plus.html for one
> generalist author's perspective on the vast gulf that can arise
> between "What professional programmers want" and "What's relevant to
> new programmers")
>

thanks for the link -- I'll need to read the whole thing through -- though
from a glance, I have a slightly different perspective, as an educator as
well:

Python 3, in general, is harder to learn and less suited to scripting,
while potentially more suited to building larger systems.

I came to this conclusion last year when I converted my introductory class
to py3.

Some of it is the redundancy and whatnot talked about in that link -- yes,
those are issues for me. But more of it is real, maybe important, change.
Interestingly, the biggest issue with the transition, Unicode, is one thing
that has made life much easier for newbies :-)

But the big ones are things like:

The move to being iterable-focused rather than sequence-focused -- iterables
really are harder to wrap one's head around when you are first learning.
And I was surprised at how often I had to wrap list() around stuff when
converting my examples and exercise solutions.
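
A toy example of the kind of thing I mean:

    d = {"a": 1, "b": 2}
    first = list(d.keys())[0]   # py2: d.keys()[0] just worked
    doubled = list(map(lambda x: 2 * x, [1, 2, 3]))   # map() is lazy now too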

I've decided to teach the format() method for string formatting -- but it
is harder to wrap your head around as a newbie.

Even the extra parens in print() make it a bit harder to script() well.

Using with: -- now I have to explain context managers before they can even
read a file... (or gloss over it and just say "copy this code to open a
file").

Anyway, I've been meaning to write a better-formed blog post about this,
but you get the idea.

In short, I really appreciate the issues here -- though I really don't see
how adding one method to a fairly obscure builtin really applies -- this
is nothing like having three(!) ways to format strings.

Which is more comprehensible and discoverable, dict.setdefault(), or
> collections.defaultdict()?
>

Well, setdefault() is definitely more discoverable! Not sure what your
point is.

As it happens, the homework for my intro class this week can greatly
benefit from setdefault() (or defaultdict()) -- and in the last few years,
far fewer newbies have discovered defaultdict() for their solutions.
Empirical evidence for discoverability.
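
(Roughly the pattern in question -- made-up data, but it shows the two
spellings side by side:

    from collections import defaultdict

    data = [("alice", 90), ("bob", 82), ("alice", 77)]

    scores = {}
    for name, score in data:
        scores.setdefault(name, []).append(score)   # policy at the call site

    scores = defaultdict(list)
    for name, score in data:
        scores[name].append(score)                  # policy in the container
)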

As for comprehensible -- I give a slight nod to .setdefault() -- my solution
to the HW uses that. I can't say I have a strong argument as to why -- but
having (what looks like) a whole new class for this one extra feature seems
a bit odd, and makes one look carefully to see what else might be different
about it...


> Micro-optimisations like dict.setdefault() typically don't make sense
> in isolation - they only make sense in the context of a particular
> pattern of thought. Now, one approach to such patterns is to say "We
> just need to do a better job of teaching people to recognise and use
> the pattern!". This approach tends not to work very well - you're
> often better off extracting the entire pattern out to a higher level
> construct, giving that construct a name, and teaching that, and
> letting people worry about how it works internally later.
>

hmm -- maybe -- but that example isn't really a pattern of thought (to me)
-- I actually remember my history of learning about setdefault(). I
found myself writing a bunch of code something like:

if key not in a_dict:
    a_dict[key] = something
a_dict[key].something_or_other()

Once I had written that code a few times, I thought: "There has got to be a
cleaner way to do this", looked at the dict methods and eventually found
setdefault() (took an embarrassingly long time). I did think -- "this has
got to be a common enough pattern to be somehow supported" but I will say
that it never, ever dawned on me to think: "this has got to be a common
enough pattern that someone would have made a special kind of dict

Re: [Python-Dev] Have I got my hg dependencies correct?

2016-10-21 Thread Terry Reedy

On 10/21/2016 2:12 PM, Brett Cannon wrote:



> On Thu, 20 Oct 2016 at 04:48 Skip Montanaro wrote:
>> I've recently run into a problem building the math and cmath modules
>> for 2.7. (I don't rebuild very often, so this problem might have been
>> around for awhile.) My hg repos look like this:
>>
>> * My cpython repo pulls from https://hg.python.org/cpython
>>
>> * My 2.7 repo (and other non-tip repos) pulls from my cpython repo
>>
>> I think this setup was recommended way back in the day when hg was new
>> to the Python toolchain to avoid unnecessary network bandwidth.
>>
>> So, if I execute
>>
>> hg pull
>> hg update
>>
>> in first cpython, then 2.7 repos I should be up-to-date, correct?
>
> Nope, you need to execute the same steps in your 2.7 checkout if you're
> keeping it in a separate directory from your cpython repo that you're
> referring to


If the 2.7 repository shares the default repository, as described in the 
devguide, then only update is needed.  This has worked for me for at 
least two years.
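
(Roughly the devguide recipe, from memory -- it needs the share extension
enabled, i.e. "share =" under [extensions] in ~/.hgrc:

    hg share cpython 2.7   # 2.7 reuses cpython's store; no second pull needed
    cd 2.7
    hg update 2.7          # after pulling in cpython, only this step remains
)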



--
Terry Jan Reedy



Re: [Python-Dev] Benchmarking Python and micro-optimizations

2016-10-21 Thread Nick Coghlan
On 20 October 2016 at 20:56, Victor Stinner wrote:
> Hi,
>
> Last months, I worked a lot on benchmarks. I ran benchmarks, analyzed
> results in depth (up to the hardware and kernel drivers!), I wrote new
> tools and enhanced existing tools.

Thanks Victor, very cool work!

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?

2016-10-21 Thread Nick Coghlan
On 21 October 2016 at 17:09, Nathaniel Smith wrote:
> But that was 2.4. In the mean time, of course, PEP 442 fixed it so
> that finalizers and weakrefs mix just fine. In fact, weakref callbacks
> are now run *before* __del__ methods [2], so clearly it's now okay for
> arbitrary code to touch the objects during that phase of the GC -- at
> least in principle.
>
> So what I'm wondering is, would anything terrible happen if we started
> passing still-live weakrefs into weakref callbacks, and then clearing
> them afterwards?

The weakref-before-__del__ ordering change in
https://www.python.org/dev/peps/pep-0442/#disposal-of-cyclic-isolates
only applies to cyclic garbage collection, so for normal refcount-driven
object cleanup in CPython, the __del__ still happens first:

>>> class C:
... def __del__(self):
... print("__del__ called")
...
>>> c = C()
>>> import weakref
>>> def cb():
... print("weakref callback called")
...
>>> weakref.finalize(c, cb)
<finalize object at 0x...; for 'C' at 0x...>
>>> del c
__del__ called
weakref callback called

This means the main problem with a strong reference being reachable
from the weakref callback object remains: if the callback itself is
reachable, then the original object is reachable, and you don't have a
collectible cycle anymore.

>>> c = C()
>>> def cb2(obj):
... print("weakref callback called with object reference")
...
>>> weakref.finalize(c, cb2, c)
<finalize object at 0x...; for 'C' at 0x...>
>>> del c
>>>

Changing that to support resurrecting the object so it can be passed
into the callback without the callback itself holding a strong
reference means losing the main "reasoning about software" benefit
that weakref callbacks offer: they currently can't resurrect the
object they relate to (since they never receive a strong reference to
it), so it nominally doesn't matter if the interpreter calls them
before or after that object has been entirely cleaned up.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] Is there any remaining reason why weakref callbacks shouldn't be able to access the referenced object?

2016-10-21 Thread Nathaniel Smith
On Fri, Oct 21, 2016 at 8:32 PM, Nick Coghlan wrote:
> On 21 October 2016 at 17:09, Nathaniel Smith wrote:
>> But that was 2.4. In the mean time, of course, PEP 442 fixed it so
>> that finalizers and weakrefs mix just fine. In fact, weakref callbacks
>> are now run *before* __del__ methods [2], so clearly it's now okay for
>> arbitrary code to touch the objects during that phase of the GC -- at
>> least in principle.
>>
>> So what I'm wondering is, would anything terrible happen if we started
>> passing still-live weakrefs into weakref callbacks, and then clearing
>> them afterwards?
>
> The weakref-before-__del__ ordering change in
> https://www.python.org/dev/peps/pep-0442/#disposal-of-cyclic-isolates
> only applies to cyclic garbage collection,so for normal refcount
> driven object cleanup in CPython, the __del__ still happens first:
>
> >>> class C:
> ... def __del__(self):
> ... print("__del__ called")
> ...
> >>> c = C()
> >>> import weakref
> >>> def cb():
> ... print("weakref callback called")
> ...
> >>> weakref.finalize(c, cb)
> <finalize object at 0x...; for 'C' at 0x...>
> >>> del c
> __del__ called
> weakref callback called

Ah, interesting! And in the old days this was of course the right way
to do it, because until __del__ has completed it's possible that the
object will get resurrected, and you don't want to clear the weakref
until you're certain that it's dead.

But PEP 442 already broke all that :-). Now weakref callbacks can
happen before __del__, and they can happen on objects that are about
to be resurrected. So if we wanted to pursue this then it seems like
it would make sense to standardize on the following sequence for
object teardown:

0) object becomes collectible (either refcount == 0 or it's part of a
cyclic isolate)
1) weakref callbacks fire
2) weakrefs are cleared (unconditionally, so we keep the rule that any
given weakref fires at most once, even if the object is resurrected)
3) if _PyGC_REFS_MASK_FINALIZED isn't set, __del__ fires, and then
_PyGC_REFS_MASK_FINALIZED is set
4) check for resurrection
5) deallocate the object
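
(To see the [2] ordering concretely -- a minimal sketch, the class is mine,
but the callback-before-__del__ output for cyclic garbage is exactly the
behavior under discussion:

    import gc
    import weakref

    class Cycle:
        def __init__(self):
            self.me = self   # deliberate reference cycle

        def __del__(self):
            print("__del__ called")

    c = Cycle()
    f = weakref.finalize(c, print, "weakref callback called")
    del c
    gc.collect()   # the callback line prints first, then "__del__ called"
)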

On further thought, this does still introduce one new edge case, which
is that even if we keep the guarantee that no individual weakref can
fire more than once, it's possible for *new* weakrefs to be registered
after resurrection, so it becomes possible for an object to be
resurrected multiple times. (Currently, resurrection can only happen
once, because __del__ is disabled on resurrected objects and weakrefs
can't resurrect at all.) I'm not actually sure that this is even a
problem, but in any case it's easy to fix by making a rule that you
can't take a weakref to an object whose _PyGC_REFS_MASK_FINALIZED flag
is already set, plus adjust the teardown sequence to be:

0) object becomes collectible (either refcount == 0 or it's part of a
cyclic isolate)
1) if _PyGC_REFS_MASK_FINALIZED is set, then go to step 7. Otherwise:
2) set _PyGC_REFS_MASK_FINALIZED
3) weakref callbacks fire
4) weakrefs are cleared (unconditionally)
5) __del__ fires
6) check for resurrection
7) deallocate the object

There remains one obscure corner case where multiple resurrection is
possible, because the resurrection-prevention flag doesn't exist on
non-GC objects, so you'd still be able to take new weakrefs to those.
But in that case __del__ can already do multiple resurrections, and
some fellow named Nick Coghlan seemed to think that was okay back in
2013 [1], so probably it's not too bad ;-).

[1] https://mail.python.org/pipermail/python-dev/2013-June/126850.html

> This means the main problem with a strong reference being reachable
> from the weakref callback object remains: if the callback itself is
> reachable, then the original object is reachable, and you don't have a
> collectible cycle anymore.
>
> >>> c = C()
> >>> def cb2(obj):
> ... print("weakref callback called with object reference")
> ...
> >>> weakref.finalize(c, cb2, c)
> <finalize object at 0x...; for 'C' at 0x...>
> >>> del c
> >>>
>
> Changing that to support resurrecting the object so it can be passed
> into the callback without the callback itself holding a strong
> reference means losing the main "reasoning about software" benefit
> that weakref callbacks offer: they currently can't resurrect the
> object they relate to (since they never receive a strong reference to
> it), so it nominally doesn't matter if the interpreter calls them
> before or after that object has been entirely cleaned up.

I guess I'm missing the importance of this -- does the interpreter
gain some particular benefit from having flexibility about when to
fire weakref callbacks? Obviously it has to pick one in practice.

(The async use case that got me thinking about this is, of course,
exactly one where we would want a weakref callback to resurrect the
object it refers to. Only once, though.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

Re: [Python-Dev] Benchmarking Python and micro-optimizations

2016-10-21 Thread Victor Stinner
Hi,

I removed all the old benchmark results and started to run benchmarks
manually. The timeline view is interesting for investigating performance
regressions:
https://speed.python.org/timeline/#/?exe=3&ben=grid&env=1&revs=50&equid=off&quarts=on&extr=on

For example, it seems like call_method became slower between Oct 9 and
Oct 20: 35.9 ms => 59.9 ms:
https://speed.python.org/timeline/#/?exe=3&ben=call_method&env=1&revs=50&equid=off&quarts=on&extr=on

I don't know the benchmark runner's hardware well, so maybe it's an
issue with the server running the benchmarks?

Victor