[Python-Dev] PEP 509: Add a private version to dict (version 3)

2016-04-19 Thread Victor Stinner
Hi,

Below is the third version of my PEP 509 (dict version).

Changes since version 2:

* __setitem__() and update() now always increase the version: remove
the micro-optimization on "dict[key] is new_value". Exception: the
version is not changed when dict.update() is called without arguments.
* be more explicit on version++: explain that the operation must be
atomic, and that dict methods are already atomic thanks to the GIL
* Usage of the dict version: add Cython
* "Guard against changing dict during iteration": don't guess if the
new dict version can be used or not. Let's discuss that later.
* rephrase/complete some sections
* add links to new threads on python-dev

I hope that I addressed all of Jim's concerns about version 2.

Note: I also updated the implementation. The implementation now
contains more tests for identical values and more tests on equal
values.

HTML version:
https://www.python.org/dev/peps/pep-0509/


PEP: 509
Title: Add a private version to dict
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add a new private version to the builtin ``dict`` type, incremented at
each dictionary creation and at each dictionary change, to implement
fast guards on namespaces.


Rationale
=========

In Python, the builtin ``dict`` type is used by many instructions. For
example, the ``LOAD_GLOBAL`` instruction looks up a variable in the
global namespace, or in the builtins namespace (two dict lookups).
Python uses ``dict`` for the builtins namespace, globals namespace, type
namespaces, instance namespaces, etc. The local namespace (function
namespace) is usually optimized to an array, but it can be a dict too.
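
The two lookups performed by ``LOAD_GLOBAL`` can be pictured with a short
pure-Python sketch. This is only an illustration of the lookup order, not
how the interpreter implements the instruction; ``load_global`` is a name
invented for this example.

```python
import builtins


def load_global(name, globals_ns):
    """Mimic the lookup order: globals namespace first, then builtins."""
    try:
        return globals_ns[name]          # first dict lookup
    except KeyError:
        pass
    try:
        return vars(builtins)[name]      # second dict lookup
    except KeyError:
        raise NameError("name %r is not defined" % name) from None


# A global named "len" shadows the builtin; without it, the builtin is found.
shadowed = load_global("len", {"len": "my len"})
from_builtins = load_global("len", {})
```

Both namespaces are plain dicts, so either one can change at runtime, which
is why a guard has to watch both.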

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect Python
semantics requires detecting when "something changes": we will call
these checks "guards".

The speedup of optimizations depends on the speed of guard checks. This
PEP proposes to add a private version to dictionaries to implement fast
guards on namespaces.

Dictionary lookups can be skipped if the version does not change, which
is the common case for most namespaces. The version is globally unique,
so checking the version is also enough to check that the namespace
dictionary was not replaced with a new dictionary.

When the dictionary version does not change, the performance of a guard
does not depend on the number of watched dictionary entries: the
complexity is O(1).

Example of optimization: copy the value of a global variable to function
constants.  This optimization requires a guard on the global variable to
check if it was modified. If the global variable is not modified, the
function uses the cached copy. If the global variable is modified, the
function uses a regular lookup, and may also deoptimize the function
(to remove the overhead of the guard check for subsequent function calls).

See PEP 510 -- Specialized functions with guards for the concrete usage
of guards to specialize functions and for a more general rationale on
Python static optimizers.


Guard example
=============

Pseudo-code of a fast guard to check if a dictionary entry was modified
(created, updated or deleted) using a hypothetical
``dict_get_version(dict)`` function::

    UNSET = object()

    class GuardDictKey:
        def __init__(self, dict, key):
            self.dict = dict
            self.key = key
            self.value = dict.get(key, UNSET)
            self.version = dict_get_version(dict)

        def check(self):
            """Return True if the dictionary entry did not change
            and the dictionary was not replaced."""

            # read the version of the dictionary
            version = dict_get_version(self.dict)
            if version == self.version:
                # Fast-path: dictionary lookup avoided
                return True

            # lookup in the dictionary
            value = self.dict.get(self.key, UNSET)
            if value is self.value:
                # another key was modified:
                # cache the new dictionary version
                self.version = version
                return True

            # the key was modified
            return False
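
Since ``dict_get_version()`` is hypothetical, the guard cannot be run as
written. As a rough pure-Python sketch (not the C implementation proposed
by this PEP), the idea can be emulated with a dict subclass that takes a
new value from a global counter on creation and on each change.
``VersionedDict``, its ``version`` attribute and ``_next_version`` are
names invented for this illustration, and only ``__setitem__`` and
``__delitem__`` are tracked.

```python
import itertools

# Assumption: one process-wide counter, so every version is globally unique.
_next_version = itertools.count(1)

UNSET = object()


class VersionedDict(dict):
    """Illustrative dict subclass; only some mutating methods are tracked."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = next(_next_version)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version = next(_next_version)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version = next(_next_version)


class GuardDictKey:
    def __init__(self, d, key):
        self.dict = d
        self.key = key
        self.value = d.get(key, UNSET)
        self.version = d.version

    def check(self):
        version = self.dict.version
        if version == self.version:
            return True                  # fast path: no dictionary lookup
        if self.dict.get(self.key, UNSET) is self.value:
            self.version = version       # another key changed: cache version
            return True
        return False                     # the watched key was modified


ns = VersionedDict(x=1)
guard = GuardDictKey(ns, "x")
unchanged = guard.check()                # fast path
ns["y"] = 2
other_key_changed = guard.check()        # slow path, guard still valid
ns["x"] = 3
watched_key_changed = guard.check()      # guard fails
```

Only the last check fails; the second one pays for a single lookup and then
returns to the fast path.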


Usage of the dict version
=========================

Speedup method calls
--------------------

Yury Selivanov wrote a patch to optimize method calls. The patch depends
on the "implement per-opcode cache in ceval" patch, which requires
dictionary versions to invalidate the cache if the globals dictionary or
the builtins dictionary has been modified.

The cache also requires that the dictionary version is globally unique.
It is possible to d

Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Brett Cannon writes:
 > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull  wrote:

 > Well, it makes *your* head hurt;

It doesn't, because I have a different (and IMHO better) model.  I can
interpret yours without pain by comparing to that.

 > By providing os.fspath() I can say that I do not, under any
 > circumstances, want someone to guess at the encoding some bytes
 > path is under to get me a string and instead I want to start and
 > end entirely in a world of strings. IOW os.fspath() lets me work in
 > such a way that the instant bytes are introduced into my code for
 > file paths it triggers a TypeError.

Does it really help you work that way?  open is polymorphic, and will
use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
don't, there's no point in supporting bytes returns from __fspath__,
is there?  Application code will normally not be calling os.fspath.
In the future, pathlib will, I suppose, but even without os.fspath
pathlib already protects you, as does antipathy.[1]

More effective, then, is just to use pathlib for your Path-hacking
work as soon as the path-representing object appears, and Path will
complain about bytes for you.  This is an analogue of the "decode
bytes at the boundary" principle.
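
The claim that Path "will complain about bytes" is easy to check. The
sketch below only probes the pathlib constructor; ``accepts_path`` is a
helper name made up for this example.

```python
import pathlib


def accepts_path(raw):
    """Return True if pathlib accepts ``raw``, False if it raises TypeError."""
    try:
        pathlib.PurePosixPath(raw)
    except TypeError:
        return False
    return True


str_ok = accepts_path("spam/eggs.txt")
bytes_ok = accepts_path(b"spam/eggs.txt")
```

Here ``str_ok`` is true and ``bytes_ok`` is false, so code that converts to
a Path at the boundary gets the TypeError without any help from os.fspath.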

 > Yep, we are stuck with the names unless you want to propose a new
 > name and deprecate the old one.

I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
sufficiently ugly to prove my point.


Footnotes: 
[1]  Strictly speaking, antipathy protects you from inadvertent mixing
of bytes and str.

___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Ethan Furman writes:
 > On 04/18/2016 12:25 PM, Stephen J. Turnbull wrote:
 > > Koos Zevenhoven writes:
 > 
 > >> After all, we want something that's *almost* exclusively str.
 > >
 > > But we don't want that, AFAICT.  Some clearly want this API to be
 > > unbiased against bytes in the same way the os APIs are unbiased,
 > > because that's what we've got in the current proposal.
 > 
 > Are we reading the same thread?  For my last several replies I am
 > very biased against bytes (and I know I'm not the only one).

I'm not "reinterpreting" what people *write*, I'm looking at *the APIs
they propose and advocate*.  As I wrote, and you quoted.

Except for the original proposal that only supported pathlib.Path, the
facilities advocated are actually unbiased.  It's just as easy to use
bytes as str, but it's proposed not to advertise that fact.  So what?
A 'my.fspath' is trivial to write, and hard to get wrong AFAICS.

Consider a truly biased alternative: __fspath__ of types like DirEntry
would return self when bytes-oriented.  (This addresses the issue of
__fspath__ that coerces to str becoming a timebomb in bytes apps.)
bytes-oriented applications would have to use DirEntry.path.  No
visible difference from now (you get the same API for bytes and the
same TypeError from open), and no loss, except for str-envy.  So use
str!  Why isn't that acceptable to you?  Maybe even TOOWTDI?

I really want to know.  I'm not 100% sure that's the right way to go,
mostly because Nick and Brett are signed up for polymorphism.  But I
sure haven't seen any explicit arguments for polymorphism, though I've
asked for them.  AFAICS, everybody just assumed that because some
related APIs are polymorphic, this one should be, too, and dove into
the problem of how to make a polymorphic API safe for Python 3.

 > If the client says "I'm okay with either" then I fully expect the
 > client to have code to properly handle str vs bytes after the
 > fspath (or whatever it's called) call.

I would too, but, uh, examples of such clients?  And no, antipathy
isn't an example -- it doesn't consume bytes, it passes them through
to the kind of client I want to hear about.

AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
that actually wants it.

Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Koos Zevenhoven
On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull  wrote:
>
> AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
> that actually wants it.

It might be, but as long as bytes paths are supported polymorphically
all over the stdlib, we won't get rid of supporting bytes paths. So
are you proposing to deprecate bytes paths?

-Koos


Re: [Python-Dev] PEP 509: Add a private version to dict (version 3)

2016-04-19 Thread Victor Stinner
Hi,

> Backwards Compatibility
> =======================
>
> Since the ``PyDictObject`` structure is not part of the stable ABI and
> the new dictionary version not exposed at the Python scope, changes are
> backward compatible.

My current implementation inserts the new ma_version_tag field in the
middle of the PyDictObject structure, so it obviously changes the ABI.

Can someone please confirm (double check) that the PyDictObject
structure is explicitly excluded from the stable ABI? I'm talking
about the "#ifndef Py_LIMITED_API" in Include/dictobject.h.

I learned what an ABI is the hard way. When I ran the perf.py
benchmark, I got a crash in ctypes on django_v3. The ctypes module
uses a C type which inherits from the dict type. I compiled Python
with and without my patch in the same directory and then I renamed the
./python binary, but the _ctypes.so was shared between the two
binaries.
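
Why a field inserted in the middle of a structure breaks already-compiled
extensions can be shown with ctypes. The two layouts below are simplified
stand-ins, not the real PyDictObject definition; only the position of
ma_version_tag matters for the illustration.

```python
import ctypes


class OldLayout(ctypes.Structure):
    # Simplified stand-in for the structure before the patch.
    _fields_ = [
        ("ma_used", ctypes.c_ssize_t),
        ("ma_keys", ctypes.c_void_p),
    ]


class NewLayout(ctypes.Structure):
    # Same structure with a version field inserted in the middle.
    _fields_ = [
        ("ma_used", ctypes.c_ssize_t),
        ("ma_version_tag", ctypes.c_uint64),
        ("ma_keys", ctypes.c_void_p),
    ]


# Code compiled against the old layout keeps reading ma_keys at the old
# offset, where the new layout now stores the version tag.
old_offset = OldLayout.ma_keys.offset
new_offset = NewLayout.ma_keys.offset
size_grew = ctypes.sizeof(NewLayout) > ctypes.sizeof(OldLayout)
```

An extension built against one layout and loaded by a binary using the
other reads the wrong memory, which is consistent with a shared _ctypes.so
crashing.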

Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Nick Coghlan
On 19 April 2016 at 21:55, Stephen J. Turnbull  wrote:

> I really want to know.  I'm not 100% sure that's the right way to go,
> mostly because Nick and Brett are signed up for polymorphism.  But I
> sure haven't seen any explicit arguments for polymorphism, though I've
> asked for them.  AFAICS, everybody just assumed that because some
> related APIs are polymorphic, this one should be, too, and dove into
> the problem of how to make a polymorphic API safe for Python 3.
>

In my case, it's ~5 years of peripheral involvement in porting the Fedora
ecosystem to Python 3. I haven't personally done that much of the actual
porting work, but I've spent plenty of time talking to the folks that are,
and tweaking various things to make their lives easier where I could make
the case that there was either a benefit to Python 3, or at least no harm
to it.

The gist of the motivation for bytes/str polymorphism here is similar to
that for restoring __mod__ polymorphism in
https://www.python.org/dev/peps/pep-0461/: the bytes/str duality is as much
a fact of life when dealing with OS interfaces as it is when dealing with
wire protocols, so if __fspath__ is polymorphic, then it's easier for
compatibility modules like six and future to define their own "fspath"
helper functions that work on both Python 2 and Python 3 across all
supported platforms.

This is also why I ended up proposing pushing the complexity down into a
documented-but-underscore-prefixed API: folks writing pure Python 3
application code *really* shouldn't need to worry about the bytes support
in the protocol, but for operating system level use cases, not having it
readily available to 2/3 compatible Python code would be a pain.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


[Python-Dev] Dependent packages not listed on PyPI

2016-04-19 Thread Andreas Maier

Hi,
I have a package "pywbem" which in its setup script specifies a number 
of dependent packages via "install_requires".


I should also say that it extends setuptools/distutils with its own 
additional keywords, e.g. it adds a "develop_requires", but I believe 
(hope) that is irrelevant for my problem.


In pywbem 0.8.3, the dependencies are:

    args = {
        ...,
        'install_requires': [
            'six',
            'ply',
        ],
        ...,
    }

and when running on Python 2.x, an additional one is added, dependent on 
the OS platform and bit size:


    if sys.version_info[0] == 2:
        if platform.system() == 'Windows':
            if platform.architecture()[0] == '64bit':
                m2crypto_req = 'M2CryptoWin64>=0.21'
            else:
                m2crypto_req = 'M2CryptoWin32>=0.21'
        else:
            m2crypto_req = 'M2Crypto>=0.24'
        args['install_requires'] += [
            m2crypto_req,
        ]

The problem is that the pywbem package on PyPI does not show these 
dependencies: https://pypi.python.org/pypi/pywbem/0.8.3


I wonder whether this is the reason for a particular installation 
problem we have seen (https://github.com/pywbem/pywbem/issues/113).


I do see other projects on PyPI, that show the dependencies they specify 
in their setup scripts, on their PyPI package page in a "*Requires 
Distributions*" section:


* https://pypi.python.org/pypi/bandit/0.17.3
* https://pypi.python.org/pypi/json-spec/0.9.14

Many others also do not have their dependencies shown, including six, 
pbr, PyYAML, lxml, to name just a few.


So far, I was unable to find out what the presence or absence of that 
information is related to, in the source of the project.


Here are my questions:

1. What causes the "Requires Distributions" section on a PyPI package 
page to show up there?


2. Is it important to show up there (e.g. for some tools)?

Andy



Re: [Python-Dev] Dependent packages not listed on PyPI

2016-04-19 Thread Brett Cannon
Questions about PyPI should be directed at the distutils-sig mailing list.

On Tue, 19 Apr 2016 at 08:12 Andreas Maier  wrote:

> Hi,
> I have a package "pywbem" which in its setup script specifies a number of
> dependent packages via "install_requires".
>
> I should also say that it extends setuptools/distutils with its own
> additional keywords, e.g. it adds a "develop_requires", but I believe
> (hope) that is irrelevant for my problem.
>
> In pywbem 0.8.3, the dependencies are:
>
>     args = {
>         ...,
>         'install_requires': [
>             'six',
>             'ply',
>         ],
>         ...,
>     }
>
> and when running on Python 2.x, an additional one is added, dependent on
> the OS platform and bit size:
>
>     if sys.version_info[0] == 2:
>         if platform.system() == 'Windows':
>             if platform.architecture()[0] == '64bit':
>                 m2crypto_req = 'M2CryptoWin64>=0.21'
>             else:
>                 m2crypto_req = 'M2CryptoWin32>=0.21'
>         else:
>             m2crypto_req = 'M2Crypto>=0.24'
>         args['install_requires'] += [
>             m2crypto_req,
>         ]
>
> The problem is that the pywbem package on PyPI does not show these
> dependencies: https://pypi.python.org/pypi/pywbem/0.8.3
>
> I wonder whether this is the reason for a particular installation problem
> we have seen (https://github.com/pywbem/pywbem/issues/113).
>
> I do see other projects on PyPI, that show the dependencies they specify
> in their setup scripts, on their PyPI package page in a "*Requires
> Distributions*" section:
>
> * https://pypi.python.org/pypi/bandit/0.17.3
> * https://pypi.python.org/pypi/json-spec/0.9.14
>
> Many others also do not have their dependencies shown, including six, pbr,
> PyYAML, lxml, to name just a few.
>
> So far, I was unable to find out what the presence or absence of that
> information is related to, in the source of the project.
>
> Here are my questions:
>
> 1. What causes the "Requires Distributions" section on a PyPI package
> page to show up there?
>
> 2. Is it important to show up there (e.g. for some tools)?
>
> Andy
>


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Brett Cannon
On Tue, 19 Apr 2016 at 04:46 Stephen J. Turnbull  wrote:

> Brett Cannon writes:
>  > On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull wrote:
>
>  > Well, it makes *your* head hurt;
>
> It doesn't, because I have a different (and IMHO better) model.  I can
> interpret yours without pain by comparing to that.
>
>  > By providing os.fspath() I can say that I do not, under any
>  > circumstances, want someone to guess at the encoding some bytes
>  > path is under to get me a string and instead I want to start and
>  > end entirely in a world of strings. IOW os.fspath() lets me work in
>  > such a way that the instant bytes are introduced into my code for
>  > file paths it triggers a TypeError.
>
> Does it really help you work that way?  open is polymorphic, and will
> use os._raw_fspath(obj, (bytes,str)).  Ditto os.scandir etc.  If they
> don't, there's no point in supporting bytes returns from __fspath__,
> is there?


You're leaving out all of the os.path functions, but you're right that if
they didn't support it like Windows then this entire discussion of bytes
paths would be moot.


>   Application code will normally not be calling os.fspath.
> In the future, pathlib will, I suppose, but even without os.fspath
> pathlib already protects you, as does antipathy.[1]
>

I disagree that application code won't be calling os.fspath.


>
> More effective, then, is just to use pathlib for your Path-hacking
> work as soon as the path-representing object appears, and Path will
> complain about bytes for you.  This is an analogue of the "decode
> bytes at the boundary" principle.
>

Ah, but you see that doesn't make porting easy. If I have a bunch of
path-manipulating code using os.path already and I want to add support for
pathlib I can either (a) rewrite all of that path-manipulating code to work
using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
it. Basically if you have written any code that uses os.path then you will
have to care about (a) or (b) as a way to add support for pathlib short of
the `str(path)` hack we're all working to get away from. And if people
truly liked option (a) then this conversation wouldn't be such a big deal
as we would have seen more people using pathlib already (yes, the
provisional tag may have scared some off, but my guess is it's more from
not wanting to rewrite os.path-using code).
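
Option (b) can be sketched as follows, assuming an os.fspath() that behaves
like the one available in Python 3.6 and later. ``config_name`` is a
made-up example of existing os.path-based code.

```python
import os
import pathlib


def config_name(path):
    """Existing os.path-based helper, extended with one os.fspath() call."""
    path = os.fspath(path)   # str stays str, a Path object becomes str
    return os.path.splitext(os.path.basename(path))[0]


from_str = config_name("etc/app.cfg")
from_path = config_name(pathlib.PurePosixPath("etc/app.cfg"))
```

Both calls give the same result, and the rest of the function is untouched.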

Now if you can convince me that the use of bytes paths is very minimal and
thus people doing path manipulations with them will be a very small
minority then I'm happy to try and use this to keep pushing people towards
avoiding bytes for file paths. But over the years people such as yourself,
Stephen, have convinced me that people do some really crazy stuff with
their file systems and that it isn't isolated to just one or two people.
And so it becomes this situation where we need to ask ourselves if we are
going to tell them to just deal with it or help them transition.

The other way to convince me is that people needing to support older
versions of Python will use `path = path.__fspath__() if hasattr(path,
'__fspath__') else path` and that allowing bytes with that idiom is going
to cost them dearly. My current assumption is that it won't because people
using that idiom are using os.path and those functions will complain when
mixing str and bytes together, but I'm open to being convinced otherwise.
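
That assumption can be tried out with a small sketch. ``BytesPath`` is a
hypothetical path object whose __fspath__ returns bytes, and
``legacy_fspath`` simply wraps the idiom quoted above.

```python
import os.path


def legacy_fspath(path):
    """The fallback idiom, for interpreters without os.fspath()."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class BytesPath:
    """Hypothetical path object whose __fspath__ returns bytes."""

    def __fspath__(self):
        return b"data"


def join_or_error(base, path):
    """Join with os.path, returning the exception name on failure."""
    try:
        return os.path.join(base, legacy_fspath(path))
    except TypeError as exc:
        return type(exc).__name__


mixed = join_or_error("/srv", BytesPath())       # str base, bytes path
same_type = join_or_error(b"/srv", BytesPath())  # bytes base, bytes path
```

Mixing a str base with the bytes result fails with TypeError inside
os.path.join, while the all-bytes call succeeds and returns bytes.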

I guess what I'm trying to get at is that I understand the desire to get
people to get the bytes path habit, but to me the best way will be to get
people quickly and easily transitioned over to pathlib as a carrot rather
than using the lack of bytes path support in this transition as a stick.

-Brett



>
>  > Yep, we are stuck with the names unless you want to propose a new
>  > name and deprecate the old one.
>
> I already proposed fs_ensure_bytes and fs_ensure_str.  I think they're
> sufficiently ugly to prove my point.
>
>
> Footnotes:
> [1]  Strictly speaking, antipathy protects you from inadvertant mixing
> of bytes and str.
>
>


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Eric Snow
On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon  wrote:
> Ah, but you see that doesn't make porting easy. If I have a bunch of
> path-manipulating code using os.path already and I want to add support for
> pathlib I can either (a) rewrite all of that path-manipulating code to work
> using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> it. Basically if you have written any code that uses os.path then you will
> have to care about (a) or (b) as a way to add support for pathlib short of
> the `str(path)` hack we're all working to get away from. And if people truly
> liked option (a) then this conversation wouldn't be such a big deal as we
> would have seen more people using pathlib already (yes, the provisional tag
> may have scared some off, but my guess is it's more from not wanting to
> rewrite os.path-using code).
>
> Now if you can convince me that the use of bytes paths is very minimal and
> thus people doing path manipulations with them will be a very small minority
> then I'm happy to try and use this to keep pushing people towards avoiding
> bytes for file paths. But over the years people such as yourself, Stephen,
> have convinced me that people do some really crazy stuff with their file
> systems and that it isn't isolated to just one or two people. And so it
> becomes this situation where we need to ask ourselves if we are going to
> tell them to just deal with it or help them transition.
>
> The other way to convince me is that people needing to support older
> versions of Python will use `path = path.__fspath__() if hasattr(path,
> '__fspath__') else path` and that allowing bytes with that idiom is going to
> cost them dearly. My current assumption is that it won't because people
> using that idiom are using os.path and those functions will complain when
> mixing str and bytes together, but I'm open to being convinced otherwise.
>
> I guess what I'm trying to get at is that I understand the desire to get
> people to get the bytes path habit, but to me the best way will be to get
> people quickly and easily transitioned over to pathlib as a carrot rather
> than using the lack of bytes path support in this transition as a stick.

Perhaps I missed previous discussion on the point, but why not support
both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
NotImplemented would indicate "try the other one".  For example,
DirEntry.__fspath__() would return NotImplemented when the underlying
value is bytes and vice-versa.

A str-specific os.fspath would look something like this:

    def fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered
        raise TypeError

...and a more lenient, polymorphic version (for use by os.path.*,
etc.) would look like this:

    def _fspath(path):
        try:
            fspath = type(path).__fspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered

        try:
            fspath = type(path).__fssyspath__
        except AttributeError:
            pass
        else:
            rendered = fspath(path)
            if rendered is not NotImplemented:
                return rendered

        # nothing to do
        return path

The hard distinction between the two dunder methods preserves the
conceptual str/bytes division we're aiming for.  It will be much
easier to identify which path implementations are dealing with (or
supporting) bytes paths.  Likewise with the two helpers and their
usage.

-eric


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Brett Cannon
On Tue, 19 Apr 2016 at 15:22 Eric Snow  wrote:

> On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon  wrote:
> > Ah, but you see that doesn't make porting easy. If I have a bunch of
> > path-manipulating code using os.path already and I want to add support for
> > pathlib I can either (a) rewrite all of that path-manipulating code to work
> > using pathlib, or (b) simply call `path = os.fspath(path)` and be done with
> > it. Basically if you have written any code that uses os.path then you will
> > have to care about (a) or (b) as a way to add support for pathlib short of
> > the `str(path)` hack we're all working to get away from. And if people truly
> > liked option (a) then this conversation wouldn't be such a big deal as we
> > would have seen more people using pathlib already (yes, the provisional tag
> > may have scared some off, but my guess is it's more from not wanting to
> > rewrite os.path-using code).
> >
> > Now if you can convince me that the use of bytes paths is very minimal and
> > thus people doing path manipulations with them will be a very small minority
> > then I'm happy to try and use this to keep pushing people towards avoiding
> > bytes for file paths. But over the years people such as yourself, Stephen,
> > have convinced me that people do some really crazy stuff with their file
> > systems and that it isn't isolated to just one or two people. And so it
> > becomes this situation where we need to ask ourselves if we are going to
> > tell them to just deal with it or help them transition.
> >
> > The other way to convince me is that people needing to support older
> > versions of Python will use `path = path.__fspath__() if hasattr(path,
> > '__fspath__') else path` and that allowing bytes with that idiom is going to
> > cost them dearly. My current assumption is that it won't because people
> > using that idiom are using os.path and those functions will complain when
> > mixing str and bytes together, but I'm open to being convinced otherwise.
> >
> > I guess what I'm trying to get at is that I understand the desire to get
> > people to get the bytes path habit, but to me the best way will be to get
> > people quickly and easily transitioned over to pathlib as a carrot rather
> > than using the lack of bytes path support in this transition as a stick.
>
> Perhaps I missed previous discussion on the point, but why not support
> both __fspath__() -> str and __fssyspath__() -> bytes?  Returning
> NotImplemented would indicate "try the other one".  For example,
> DirEntry.__fspath__() would return NotImplemented when the underlying
> value is bytes and vice-versa.
>

It was deemed more complexity than necessary for the protocol to have two
functions. Either __fspath__ will be polymorphic or it will only return str.


Re: [Python-Dev] Modify PyMem_Malloc to use pymalloc for performance

2016-04-19 Thread Victor Stinner
Ping? Is someone still opposed to my change #26249 "Change
PyMem_Malloc to use pymalloc allocator"? If not, I think that I will
push my change.

My change only changes two lines, so it can be easily reverted before
CPython 3.6 if we detect major issues in third-party extensions. And
maybe it's better to push such a change today to get more time to play
with it, than pushing it late in the development of CPython 3.6.

The new PYTHONMALLOC=debug feature makes it quick and easy to check
the usage of the PyMem_Malloc() API, even if Python is compiled in
release mode.
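
As a minimal sketch of how the hooks can be switched on for a single run,
assuming an interpreter that supports the PYTHONMALLOC environment variable
(Python 3.6 or later), a child process can be started with the variable
set. The child command here is only a placeholder for a real test suite.

```python
import os
import subprocess
import sys

# Copy the current environment and enable the debug hooks for the child only.
env = dict(os.environ, PYTHONMALLOC="debug")

result = subprocess.run(
    [sys.executable, "-c", "print(sum(range(10)))"],
    env=env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
```

A nonzero ``result.returncode`` with a fatal error on stderr would point to
a misuse of the allocator API in whatever code the child ran.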

I checked multiple Python extensions written in C. I only found one
bug in numpy and I sent a patch (not merged yet).

victor

2016-03-15 0:19 GMT+01:00 Victor Stinner :
> 2016-02-12 14:31 GMT+01:00 M.-A. Lemburg :
>>> If your program has bugs, you can use a debug build of Python 3.5 to
>>> detect misusage of the API.
>>
>> Yes, but people don't necessarily do this, e.g. I have
>> for a very long time ignored debug builds completely
>> and when I started to try them, I found that some of the
>> things I had been doing with e.g. free list implementations
>> did not work in debug builds.
>
> I just added support for debug hooks on Python memory allocators on
> Python compiled in *release* mode. Set the environment variable
> PYTHONMALLOC to debug to try with Python 3.6.
>
> I added a check on PyObject_Malloc() debug hook to ensure that the
> function is called with the GIL held. I opened an issue to add a
> similar check on PyMem_Malloc():
> https://bugs.python.org/issue26563
>
>
>> Yes, but those are part of the stdlib. You'd need to check
>> a few C extensions which are not tested as part of the stdlib,
>> e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom
>> types in C since these will often need the memory management
>> APIs).
>>
>> It may also be a good idea to check wrapper generators such
>> as cython, swig, cffi, etc.
>
> I ran the test suite of numpy, lxml, Pillow and cryptography (used cffi).
>
> I found a bug in numpy. numpy calls PyMem_Malloc() without holding the GIL:
> https://github.com/numpy/numpy/pull/7404
>
> Except of this bug, all other tests pass with PyMem_Malloc() using
> pymalloc and all debug checks.
>
> Victor


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Koos Zevenhoven writes:
 > On Tue, Apr 19, 2016 at 2:55 PM, Stephen J. Turnbull wrote:
 > >
 > > AFAICS bytes return from __fspath__ is just YAGNI.  Show me something
 > > that actually wants it.
 > 
 > It might be,

May I take that as meaning you just jumped to the conclusion that
extending polymorphism is useful on no actual evidence of usefulness?

 > but as long as bytes paths are supported polymorphicly all over the
 > stdlib, we won't get rid of supporting bytes paths. So are you
 > proposing to deprecate bytes paths?

You claim "almost always want str", Ethan claims "bias against bytes."
Sorry, guys, you can't have it both ways.  Either bytes paths are
discouraged (not "deprecated", not yet), or they aren't.

I say, let's not encourage them.  I.e., keep the status quo for bytes,
and make things better for the preferred str.  Yes, that means
discouraging bytes relative to str in this context.  That's a Python 3
principle, one strong enough to justify the huge compatibility break
involved in making str be Unicode.  That compatibility break has been
extremely successful in my personal experience as a sometime Python
teacher and Mailman developer, though the Mercurial developers have a
different POV.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Brett Cannon writes:

 > Now if you can convince me that the use of bytes paths is very
 > minimal

I doubt that I can do that, because all that Python 2 code is
effectively bytes.  To the extent that people are just passing it into
their bytes-domain code and it works for them, they probably "port" to
Python 3 by using bytes for paths.  I just don't think bytes usage per
se matters to the issue of polymorphism of __fspath__.

 > Ah, but you see that doesn't make porting easy. If I have a bunch
 > of path-manipulating code using os.path already and I want to add
 > support for pathlib I can either (a) rewrite all of that
 > path-manipulating code to work using pathlib, or (b) simply call
 > `path = os.fspath(path)` and be done with it.

OK, so what matters here is not "how many people are using bytes".
They can keep using os.path, which is what they probably have already
been using.  What we are worrying about is that

(1) some really attractive producer of pathlib.Paths will be
published, and

(2) people will want to plug that producer into their bytes paths
consumers using os.fspath(path) "and be done with it".

Excuse me, but that doesn't make sense as written.  Path.__fspath__
will return str, in any case.  So these developers have to consume
text to use pathlib, even merely as a consumer of Paths.  No need for
polymorphism here, simply because it won't be used in this instance.
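
To make that concrete, here is a minimal sketch of a str-only lookup.
It is not the proposed os.fspath() itself: fspath_str_only and FakePath
are hypothetical names invented for this illustration, and FakePath
merely stands in for a pathlib.Path that implements the protocol.

```python
def fspath_str_only(path):
    """Hypothetical str-only variant of the proposed os.fspath()."""
    if isinstance(path, str):
        return path
    # Look the method up on the type, as is usual for protocol methods.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str or an object with __fspath__, not "
                        + type(path).__name__)
    result = method(path)
    if not isinstance(result, str):
        raise TypeError("__fspath__ must return str, not "
                        + type(result).__name__)
    return result


class FakePath:
    """Stand-in for a Path-like object (an assumption, not pathlib.Path)."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        return self._text


text_path = fspath_str_only(FakePath("a/b"))  # the consumer receives str
```

A bytes-only consumer gains nothing from this call: whatever it is
handed, it has to deal with text on the way out.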

What's left is DirEntry (and perhaps other producers of byte-oriented
objects in os and os.path).  If they're currently using DirEntry,
they're currently accessing .path.  Surely bytes users can continue
doing that, even if we offer str users the advantage of new protocols?

I conclude that there is no real use in having a polymorphic
__fspath__ unless callers of os.fspath can communicate desired return
type to it, and it implicitly coerces to that type.  But then open and
friends *implicitly* consume __fspath__.  So there probably needs to
be a way to communicate the desired type to them in the case where
they receive an __fspath__-bearing object so they can tell os.fspath
what their callers want, no?

Supporting both "pipeline polymorphism" of this kind and implicit
conversion protocols at the same time is quite complicated, I think.
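
As a rough sketch only, "communicating the desired return type" might
look something like the following.  The function name fspath_as, its
want parameter and the Entry class are all hypothetical; only
os.fsencode() and os.fsdecode() are existing stdlib calls.

```python
import os


def fspath_as(path, want=str):
    """Hypothetical: return the path coerced to the type the caller wants."""
    if want not in (str, bytes):
        raise TypeError("want must be str or bytes")
    if not isinstance(path, (str, bytes)):
        method = getattr(type(path), "__fspath__", None)
        if method is None:
            raise TypeError("no __fspath__ on " + type(path).__name__)
        path = method(path)
    if isinstance(path, want):
        return path
    # Implicit coercion using the file system encoding.
    return os.fsencode(path) if want is bytes else os.fsdecode(path)


class Entry:
    """Stand-in for an __fspath__-bearing object (an assumption)."""

    def __fspath__(self):
        return "x/y"


as_bytes = fspath_as(Entry(), want=bytes)
as_text = fspath_as(b"x/y", want=str)
```

Note that every function which implicitly consumes __fspath__ (open
and friends) would need some way to pass want along, which is the
complication described above.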

 > [Folks] have convinced me that people do some really crazy stuff
 > with their file systems and that it isn't isolated to just one or
 > two people.  And so it becomes this situation where we need to ask
 > ourselves if we are going to tell them to just deal with it or help
 > them transition.

People who have to deal with really crazy stuff in filesystems are
already manipulating paths as text.  It's not we who need help with
the transition that matters (bytes to text).  We can use os.path or
pathlib, but bytes just don't matter because we're not using them in
path manipulations.

It's people who live in monolingual mono-encoding environments who
will be using bytes successfully, and be resistant to costly changes
that don't make their lives better.  But the bytes vs. text cost is
inherent in using pathlib, so polymorphism doesn't help promote
pathlib.  It might help promote use of os.scandir in bytes-oriented
code, though I don't see that as a huge effect nor more than mildly
desirable.  Is it?

Steve


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > The gist of the motivation for bytes/str polymorphism here is similar to
 > that for restoring __mod__ polymorphism in
 > https://www.python.org/dev/peps/pep-0461/:

I don't think it is, actually.  Filenames off the wire cannot be
relied on to be in the local file system encoding, and that matters.
The semantics of a filename or path requires getting the encodings
matched.  You cannot be encoding-agnostic.

On the other hand, streams of characters are merely a special case of
streams of tokens, and the principles that apply to editing streams of
characters apply to more general tokens, including bytes and XML.  You
*can* be content-agnostic as long as you define semantics in terms of
moving tokens around, and not in terms of their content.

BTW, my opposition to PEP 461 was based on the same mistake with
opposite polarity: I think of bytes as encoded text *first*, and
therefore feared PEP 461 for quite insufficient reason.  Most
applications of PEP 461 won't be for text.

 > This is also why I ended up proposing pushing the complexity down into a
 > documented-but-underscore-prefixed API: folks writing pure Python 3
 > application code *really* shouldn't need to worry about the bytes
 > support

You can't have that with your proposal.  They are going to (at least
in theory) get a new TypeError which they will not be expecting (vs
bytes, which are implicit in the object they have, where previously
they would have got one vs. Path or DirEntry which they were
expecting).  So they will have to learn that much about bytes support.

 > in the protocol, but for operating system level use cases, not having it
 > readily available to 2/3 compatible Python code would be a pain.

Erm, how do you propose to make this protocol available to Python-2-
compatible code?  Pervasively monkey-patch the Python 2 os module?
Even if so, is it our responsibility to worry about that?

BTW, I came to this conclusion thinking about the poster boy for PEP
461, Mercurial.



Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Chris Angelico
On Wed, Apr 20, 2016 at 1:16 PM, Stephen J. Turnbull wrote:
> Brett Cannon writes:
>
>  > Now if you can convince me that the use of bytes paths is very
>  > minimal
>
> I doubt that I can do that, because all that Python 2 code is
> effectively bytes.  To the extent that people are just passing it into
> their bytes-domain code and it works for them, they probably "port" to
> Python 3 by using bytes for paths.  I just don't think bytes usage per
> se matters to the issue of polymorphism of __fspath__.
>

I would prefer to see this kind of code ported to Python 3 by using
native strings.

Python 2 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print scene["name"]

Python 3 code:

import json
with open(".config/obs-studio/basic/scenes/Standard.json") as f:
    data = json.load(f)
for scene in data["scene_order"]:
    print(scene["name"])

The bulk of path string literals in Python programs will be all-ASCII.
Porting to Py3 won't fundamentally change this code, yet suddenly now
it's using Unicode strings. In reality, both versions of this example
are using *text* strings. The Py3 version has text in the source code,
a stream of Unicode codepoints in the runtime, and then (since I ran
this on Linux) encodes that to bytes for the file system. The Py2
version just does that conversion a little earlier: text in the source
code, a stream of eight-bit "texty bytes" in the runtime, and those
same bytes get given to the fs.

There's no reason to slap a b"..." prefix on every path for Py3. There
might be specific situations where you want that, but for the most
part, those paths came from human-readable text anyway, so they should
stay that way.
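
A small sketch of that last step, using the stdlib's os.fsencode() and
os.fsdecode().  It assumes the file system encoding is ASCII-compatible,
which is the common case but not something Python guarantees:

```python
import os

path = ".config/obs-studio/basic/scenes/Standard.json"

# What the file system is given: the text path encoded with the
# file system encoding.
encoded = os.fsencode(path)

# For an all-ASCII path these are the same bytes a Py2 str would hold
# (assuming an ASCII-compatible file system encoding).
same_as_py2_bytes = (encoded == path.encode("ascii"))

# And the conversion round-trips back to the original text.
round_trip = os.fsdecode(encoded)
```

So the text literal in the source loses nothing by staying text until
the very last moment.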

ChrisA


Re: [Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

2016-04-19 Thread Stephen J. Turnbull
Eric Snow writes:
 > On Tue, Apr 19, 2016 at 10:50 AM, Brett Cannon wrote:
 > > Ah, but you see that doesn't make porting easy.

 > Perhaps I missed previous discussion on the point, but why not support
 > both __fspath__() -> str and __fssyspath__() -> bytes?

That's fine by me, I can live with that although I don't really like
it.  But the proponents of polymorphic __fspath__ think it's
unnecessary.

Why I don't like it: what's going to end up happening is that a
__fspath__- or __fssyspath__-bearing object of unknown provenance is
going to get passed to polymorphic os functions that won't complain,
and a few million cycles later something is going to access
fileobj.path expecting bytes and getting str, and blooey!
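
A toy illustration of that failure mode, using only os.path.join(),
which is already polymorphic today.  The helper names find_log and
format_report are made up for the example:

```python
import os.path


def find_log(directory, name):
    # os.path.join() is polymorphic: str in, str out; bytes in, bytes out.
    # It never complains about which of the two it was handed.
    return os.path.join(directory, name)


def format_report(path):
    # Hypothetical consumer written long ago for bytes paths only.
    return b"opened: " + path


# bytes all the way through: works.
ok = format_report(find_log(b"logs", b"app.txt"))

# str slips through the polymorphic function without complaint...
late = find_log("logs", "app.txt")

# ...and only blows up later, far from where the str came in.
try:
    format_report(late)
    failed_late = False
except TypeError:
    failed_late = True
```

The TypeError is raised in format_report, not at the point where the
wrong type entered the pipeline.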

Also I just don't see a need for bytes when the original purpose of
this was to support passing pathlib.Path objects to open.  It's also
nice to pass DirEntry objects to open, but it's not obvious to me that
we need to support bytes since only new code can use this feature, and
there's a way to not-support them that doesn't cause any new problems.

It's not that I want bytes to go away[1], it's just that the playing
field will tilt a little more against them in new code.

Footnotes: 
[1]  I wouldn't weep, but I wouldn't laugh, either.
