date:20140713

[Python-Dev] Another case for frozendict

2014-07-13 Thread Jason R. Coombs

I repeatedly run into situations where a frozendict would be useful, and every 
time I do, I go searching and find the (unfortunately rejected) PEP-416. I'd 
just like to share another case where having a frozendict in the stdlib would 
be useful to me.

I was interacting with a database and had a list of results from 206 queries:

>>> res = [db.cases.remove({'_id': doc['_id']}) for doc in fives]
>>> len(res)
206

I can see that the results are the same for the first two queries.

>>> res[0]
{'n': 1, 'err': None, 'ok': 1.0}
>>> res[1]
{'n': 1, 'err': None, 'ok': 1.0}

So I'd like to test to see if that's the case, so I try to construct a 'set' on 
the results, which in theory would give me a list of unique results:

>>> set(res)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'

I can't do that because dict is unhashable. That's reasonable, and if I had a 
frozen dict, I could easily work around this limitation and accomplish what I 
need.

>>> set(map(frozendict, res))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'frozendict' is not defined

PEP-416 mentions a MappingProxyType, but that's no help.

>>> res_ex = list(map(types.MappingProxyType, res))
>>> set(res_ex)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'mappingproxy'

I can achieve what I need by constructing a set on the 'items' of the dict.

>>> set(tuple(doc.items()) for doc in res)
{(('n', 1), ('err', None), ('ok', 1.0))}

But that syntax would be nicer if the result had the same representation as the 
input (mapping instead of tuple of pairs). A frozendict would have readily 
enabled the desirable behavior.

Although hashability is mentioned in the PEP under constraints, there are many 
use-cases that fall out of the ability to hash a dict, such as the one 
described above, which are not mentioned at all in use-cases for the PEP.

If there's ever any interest in reviving that PEP, I'm in favor of its 
implementation.
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread Victor Stinner

The PEP has been rejected, but the MappingProxyType is now public:

$ ./python
Python 3.5.0a0 (default:5af54ed3af02, Jul 12 2014, 03:13:04)
>>> d={1:2}
>>> import types
>>> d = types.MappingProxyType(d)
>>> d
mappingproxy({1: 2})
>>> d[1]
2
>>> d[1] = 3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment

Victor

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread Chris Angelico

On Mon, Jul 14, 2014 at 12:04 AM, Jason R. Coombs  wrote:
> I can achieve what I need by constructing a set on the ‘items’ of the dict.
>
> >>> set(tuple(doc.items()) for doc in res)
>
> {(('n', 1), ('err', None), ('ok', 1.0))}

This is flawed; the tuple-of-tuples depends on iteration order, which
may vary. It should be a frozenset of those tuples, not a tuple. Which
strengthens your case; it's that easy to get it wrong in the absence
of an actual frozendict.
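
A minimal sketch of the difference, using made-up result dicts shaped like the ones quoted above. It assumes Python 3.7 or later, where dicts keep insertion order, so two equal dicts built in a different key order iterate differently:

```python
# Two equal mappings whose keys were inserted in a different order.
res = [
    {'n': 1, 'err': None, 'ok': 1.0},
    {'ok': 1.0, 'n': 1, 'err': None},
]

# Order-sensitive: each tuple records the iteration order of its dict.
by_tuple = {tuple(doc.items()) for doc in res}

# Order-insensitive: a frozenset of (key, value) pairs ignores ordering.
by_frozenset = {frozenset(doc.items()) for doc in res}

print(len(by_tuple))      # 2 on Python 3.7+
print(len(by_frozenset))  # 1
```

Both forms require every value to be hashable; a result dict holding a list or a nested dict would still raise TypeError.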

ChrisA

[Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Andreas Maier


On 11.07.2014 22:54, Ethan Furman wrote:
> On 07/11/2014 07:04 AM, Andreas Maier wrote:
>> On 09.07.2014 03:48, Raymond Hettinger wrote:
>>> Personally, I see no need to make the same mistake by removing
>>> the identity-implies-equality rule from the built-in containers.
>>> There's no need to upset the apple cart for nearly zero benefit.
>>
>> Containers delegate the equal comparison on the container to their
>> elements; they do not apply identity-based comparison
>> to their elements. At least that is the externally visible behavior.
>
> If that were true, then [NaN] == [NaN] would be False, and it is not.
>
> Here is the externally visible behavior:
>
> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
> [GCC 4.7.3] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> --> NaN = float('nan')
> --> NaN == NaN
> False
> --> [NaN] == [NaN]
> True


Ouch, that hurts ;-)

First, the delegation of sequence equality to element equality is not 
something I have come up with during my doc patch. It has always been in 
section 5.9, "Comparisons", of the Language Reference (copied from Python 3.4):

"Tuples and lists are compared lexicographically using comparison of 
corresponding elements. This means that to compare equal, each element 
must compare equal and the two sequences must be of the same type and 
have the same length."


Second, if not by delegation to equality of its elements, how would the 
equality of sequences be defined otherwise?


But your test is definitely worth having a closer look at. I have 
broadened the test somewhat and that brings up further questions. Here 
is the test output, and a discussion of the results (test program 
try_eq.py and its output test_eq.out are attached to issue #12067):


Test #1: Different equal int objects:

  obj1: type=<class 'int'>, str=257, id=39305936
  obj2: type=<class 'int'>, str=257, id=39306160

  a) obj1 is obj2: False
  b) obj1 == obj2: True
  c) [obj1] == [obj2]: True
  d) {obj1:'v'} == {obj2:'v'}: True
  e) {'k':obj1} == {'k':obj2}: True
  f) obj1 == obj2: True

Discussion:

Case 1.c) can be interpreted that the list delegates its == to the == on 
its elements. It cannot be interpreted to delegate to identity 
comparison. That is consistent with how everyone (I hope ;-) would 
expect int objects to behave, or lists or dicts of them.


The motivation for case f) is explained further down, it has to do with 
caching.


Test #2: Same int object:

  obj1: type=<class 'int'>, str=257, id=39305936
  obj2: type=<class 'int'>, str=257, id=39305936

  a) obj1 is obj2: True
  b) obj1 == obj2: True
  c) [obj1] == [obj2]: True
  d) {obj1:'v'} == {obj2:'v'}: True
  e) {'k':obj1} == {'k':obj2}: True
  f) obj1 == obj2: True

-> No surprises (I hope).

Test #3: Different equal float objects:

  obj1: type=<class 'float'>, str=257.0, id=5734664
  obj2: type=<class 'float'>, str=257.0, id=5734640

  a) obj1 is obj2: False
  b) obj1 == obj2: True
  c) [obj1] == [obj2]: True
  d) {obj1:'v'} == {obj2:'v'}: True
  e) {'k':obj1} == {'k':obj2}: True
  f) obj1 == obj2: True

Discussion:

I added this test only to show that float NaN is a special case, and 
that this test for float objects - that are not NaN - behaves like test 
#1 for int objects.


Test #4: Same float object:

  obj1: type=<class 'float'>, str=257.0, id=5734664
  obj2: type=<class 'float'>, str=257.0, id=5734664

  a) obj1 is obj2: True
  b) obj1 == obj2: True
  c) [obj1] == [obj2]: True
  d) {obj1:'v'} == {obj2:'v'}: True
  e) {'k':obj1} == {'k':obj2}: True
  f) obj1 == obj2: True

-> Same as test #2, hopefully no surprises.

Test #5: Different float NaN objects:

  obj1: type=<class 'float'>, str=nan, id=5734784
  obj2: type=<class 'float'>, str=nan, id=5734976

  a) obj1 is obj2: False
  b) obj1 == obj2: False
  c) [obj1] == [obj2]: False
  d) {obj1:'v'} == {obj2:'v'}: False
  e) {'k':obj1} == {'k':obj2}: False
  f) obj1 == obj2: False

Discussion:

Here, the list behaves as I would expect under the rule that it 
delegates equality to its elements. Case c) allows that interpretation. 
However, an interpretation based on identity would also be possible.


Test #6: Same float NaN object:

  obj1: type=<class 'float'>, str=nan, id=5734784
  obj2: type=<class 'float'>, str=nan, id=5734784

  a) obj1 is obj2: True
  b) obj1 == obj2: False
  c) [obj1] == [obj2]: True
  d) {obj1:'v'} == {obj2:'v'}: True
  e) {'k':obj1} == {'k':obj2}: True
  f) obj1 == obj2: False

Discussion (this is Ethan's example):

Case 6.b) shows the special behavior of float NaN that is documented: a 
float NaN object is the same as itself but unequal to itself.


Case 6.c) is the surprising case. It could be interpreted in two ways 
(at least that's what I found):


1) The comparison is based on identity of the float objects. But that is 
inconsistent with test #4. And why would the list special-case NaN 
comparison in such a way that it ends up being inconsistent with the 
special definition of NaN (outside of the list)?


2) The list does not always delegate to element equality, but attempts 
to optimize if the objects are the same (same identity). We will see 
later that that happens. Further, when comparing float

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Steven D'Aprano

On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote:

> Second, if not by delegation to equality of its elements, how would the 
> equality of sequences be defined otherwise?

Wow. I'm impressed by the amount of detailed effort you've put into 
investigating this. (Too much detail to absorb, I'm afraid.) But perhaps 
you might have just asked on the [email protected] mailing list, or 
here, where we would have told you the answer:

list __eq__ first checks element identity before going on
to check element equality.

If you can read C, you might like to check the list source code:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

but if I'm reading it correctly, list.__eq__ conceptually looks 
something like this:

def __eq__(self, other):
    if not isinstance(other, list):
        return NotImplemented
    if len(other) != len(self):
        return False
    for a, b in zip(self, other):
        if not (a is b or a == b):
            return False
    return True

(The actual code is a bit more complex than that, since there is a 
single function, list_richcompare, which handles all the rich 
comparisons.)

The critical test is PyObject_RichCompareBool here:

http://hg.python.org/cpython/file/22e5a85ba840/Objects/object.c

which explicitly says:

/* Quick result when objects are the same.
   Guarantees that identity implies equality. */

[...]
> I added this test only to show that float NaN is a special case,

NANs are not a special case. List __eq__ treats all object types 
identically (pun intended):

py> class X:
...     def __eq__(self, other): return False
...
py> x = X()
py> x == x
False
py> [x] == [X()]
False
py> [x] == [x]
True

[...]
> Case 6.c) is the surprising case. It could be interpreted in two ways 
> (at least that's what I found):
> 
> 1) The comparison is based on identity of the float objects. But that is 
> inconsistent with test #4. And why would the list special-case NaN 
> comparison in such a way that it ends up being inconsistent with the 
> special definition of NaN (outside of the list)?

It doesn't. NANs are not special cased in any way.

This was discussed to death some time ago, both on python-dev and 
python-ideas. If you're interested, you can start here:

https://mail.python.org/pipermail/python-list/2012-October/633992.html

which is in the middle of one of the threads, but at least it gets you 
to the right time period.

> 2) The list does not always delegate to element equality, but attempts 
> to optimize if the objects are the same (same identity).

Right! It's not just lists -- I believe that tuples, dicts and sets 
behave the same way.

> We will see 
> later that that happens. Further, when comparing float NaNs of the same 
> identity, the list implementation forgot to special-case NaNs. Which 
> would be a bug, IMHO.

"Forgot"? I don't think the behaviour of list comparisons is an 
accident.

NAN equality is non-reflexive. Very few other things are the same. It 
would be seriously weird if alist == alist could return False. You'll 
note that the IEEE-754 standard has nothing to say about the behaviour 
of Python lists containing NANs, so we're free to pick whatever 
behaviour makes the most sense for Python, and that is to minimise the 
"Gotcha!" factor.

NANs are a gotcha to anyone who doesn't know IEEE-754, and possibly even 
some who do. I will go to the barricades to fight to keep the 
non-reflexivity of NANs *in isolation*, but I believe that Python has 
made the right decision to treat lists containing NANs the same as 
everything else.

NAN == NAN  # obeys IEEE-754 semantics and returns False

[NAN] == [NAN]  # obeys standard expectation that equality is reflexive

This behaviour is not a bug, it is a feature. As far as I am concerned, 
this only needs documenting. If anyone needs list equality to honour the 
special behaviour of NANs, write a subclass or an equal() function.
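
A minimal sketch of such a helper. The name `strict_equal` is hypothetical, and the function only handles flat sequences; it compares elements with `==` alone, so the identity shortcut never applies:

```python
def strict_equal(a, b):
    """Compare two sequences element by element using == only.

    Unlike list.__eq__, identical objects get no shortcut, so a NaN
    never compares equal to itself here.
    """
    if len(a) != len(b):
        return False
    return all(x == y for x, y in zip(a, b))


nan = float("nan")

builtin_result = [nan] == [nan]             # identity shortcut applies
strict_result = strict_equal([nan], [nan])  # delegates to float.__eq__

print(builtin_result)  # True
print(strict_result)   # False
```

For ordinary values the two comparisons agree; they differ only for objects whose `__eq__` is not reflexive.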

-- 
Steven

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Chris Angelico

On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano  wrote:
>> We will see
>> later that that happens. Further, when comparing float NaNs of the same
>> identity, the list implementation forgot to special-case NaNs. Which
>> would be a bug, IMHO.
>
> "Forgot"? I don't think the behaviour of list comparisons is an
> accident.

Well, "forgot" is on the basis that the identity check is intended to
be a mere optimization. If that were the case ("don't actually call
__eq__ when you reckon it'll return True"), then yes, failing to
special-case NaN would be a bug. But since it's intended behaviour, as
explained further down, it's not a bug and not the result of
forgetfulness.

ChrisA

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread Mark Roberts

I find it handy to use named tuple as my database mapping type.  It allows you 
to perform this behavior seamlessly.
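
A rough sketch of that approach, assuming every result carries the same fixed set of keys. The type name `Result` is made up for illustration, and the sample data mirrors the results quoted in the original message:

```python
from collections import namedtuple

# Hypothetical record type; the field names must match the dict keys.
Result = namedtuple("Result", ["n", "err", "ok"])

res = [
    {'n': 1, 'err': None, 'ok': 1.0},
    {'ok': 1.0, 'n': 1, 'err': None},
]

# Keyword unpacking makes the conversion independent of key order.
unique = {Result(**doc) for doc in res}

print(unique)
```

This fails with TypeError as soon as a result has a missing or extra key, which is the limitation a frozendict would not have.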

-Mark


Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Nick Coghlan

On 13 July 2014 11:34, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 2:23 AM, Steven D'Aprano  wrote:
>>> We will see
>>> later that that happens. Further, when comparing float NaNs of the same
>>> identity, the list implementation forgot to special-case NaNs. Which
>>> would be a bug, IMHO.
>>
>> "Forgot"? I don't think the behaviour of list comparisons is an
>> accident.
>
> Well, "forgot" is on the basis that the identity check is intended to
> be a mere optimization. If that were the case ("don't actually call
> __eq__ when you reckon it'll return True"), then yes, failing to
> special-case NaN would be a bug. But since it's intended behaviour, as
> explained further down, it's not a bug and not the result of
> forgetfulness.

Right, it's not a mere optimisation - it's the only way to get
containers to behave sensibly. Otherwise we'd end up with nonsense
like:

>>> x = float("nan")
>>> x in [x]
False

That currently returns True because of the identity check - it would
return False if we delegated the check to float.__eq__ because the
defined IEEE754 behaviour for NaN's breaks the mathematical definition
of an equivalence relation as a transitive, reflexive and symmetric
relation. (It breaks it for *good reasons*, but we still need to
figure out a way of dealing with the impedance mismatch between the
definition of floats and the definition of container invariants like
"assert x in [x]")

The current approach means that the lack of reflexivity of NaN's stays
confined to floats and similar types - it doesn't leak out and infect
the behaviour of the container types.
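
A short sketch of what that confinement looks like in practice. The variable names are illustrative; the behaviour shown is that of the built-in list and dict types in CPython 3:

```python
x = float("nan")
y = float("nan")  # a second, distinct NaN object

# The float itself is not reflexive under ==.
self_equal = (x == x)

# Containers check identity first, so the usual invariants hold for x...
in_list = x in [x]
count = [x].count(x)
index = [x].index(x)
lookup = {x: "v"}[x]

# ...but a different NaN object is still not found.
other_in_list = y in [x]
other_in_dict = y in {x: "v"}

print(self_equal, in_list, count, index, lookup, other_in_list, other_in_dict)
```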

What we've never figured out is a good place to *document* it. I
thought there was an open bug for that, but I can't find it right now.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Chris Angelico

On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan  wrote:
> What we've never figured out is a good place to *document* it. I
> thought there was an open bug for that, but I can't find it right now.

Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
a parallel explanation of sequence equality.

ChrisA

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Nick Coghlan

On 13 July 2014 13:16, Chris Angelico  wrote:
> On Mon, Jul 14, 2014 at 4:11 AM, Nick Coghlan  wrote:
>> What we've never figured out is a good place to *document* it. I
>> thought there was an open bug for that, but I can't find it right now.
>
> Yeah. The Py3 docs explain why "x in [x]" is True, but I haven't found
> a parallel explanation of sequence equality.

We might need to expand the tables of sequence operations to cover
equality and inequality checks - those are currently missing.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread dw+python-dev

On Sun, Jul 13, 2014 at 02:04:17PM +, Jason R. Coombs wrote:

> PEP-416 mentions a MappingProxyType, but that’s no help.

Well, it kind of is. By combining MappingProxyType and UserDict the
desired effect can be achieved concisely:

import collections
import types

class frozendict(collections.UserDict):
    def __init__(self, d, **kw):
        if d:
            d = d.copy()
            d.update(kw)
        else:
            d = kw
        self.data = types.MappingProxyType(d)

    _h = None
    def __hash__(self):
        if self._h is None:
            self._h = sum(map(hash, self.data.items()))
        return self._h

    def __repr__(self):
        return repr(dict(self))


> Although hashability is mentioned in the PEP under constraints, there are many
> use-cases that fall out of the ability to hash a dict, such as the one
> described above, which are not mentioned at all in use-cases for the PEP.

> If there’s ever any interest in reviving that PEP, I’m in favor of its
> implementation.

In its previous form, the PEP seemed more focused on some false
optimization capabilities of a read-only type, rather than as here, the
far more interesting hashability properties. It might warrant a fresh
PEP to more thoroughly investigate this angle.


David

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread dw+python-dev

On Sun, Jul 13, 2014 at 06:43:28PM +, [email protected] wrote:

>         if d:
>             d = d.copy()

To cope with iterables, "d = d.copy()" should have read "d = dict(d)".
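
For comparison, here is a sketch of a different variant, not the class posted above: it builds on `collections.abc.Mapping`, copies its input with `dict(...)` so that iterables of pairs work, and hashes a frozenset of the items. The name `FrozenDict` is hypothetical; nothing like it exists in the stdlib:

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Hypothetical read-only, hashable mapping (illustration only)."""

    def __init__(self, *args, **kwargs):
        # dict() accepts mappings, iterables of pairs, and keyword arguments.
        self._data = dict(*args, **kwargs)
        self._hash = None

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __hash__(self):
        # Order-independent; raises TypeError if any value is unhashable.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self):
        return "FrozenDict({!r})".format(self._data)


res = [{'n': 1, 'err': None, 'ok': 1.0}] * 3
unique = set(map(FrozenDict, res))
print(unique)
```

`Mapping` supplies `__eq__`, `keys()`, `items()` and `get()`, and because no `__setitem__` is defined, item assignment raises TypeError. The private `_data` dict is still reachable, so this is immutability by convention rather than enforcement.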


David

Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread Nick Coghlan

On 13 July 2014 13:43,   wrote:
> In its previous form, the PEP seemed more focused on some false
> optimization capabilities of a read-only type, rather than as here, the
> far more interesting hashability properties. It might warrant a fresh
> PEP to more thoroughly investigate this angle.

Right, the use case would be "frozendict as a simple alternative to a
full class definition", but even less structured than namedtuple in
that the keys may vary as well. That difference means that frozendict
applies more cleanly to semi-structured data manipulated as
dictionaries (think stuff deserialised from JSON) than namedtuple
does.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Marko Rauhamaa

Nick Coghlan :

> Right, it's not a mere optimisation - it's the only way to get
> containers to behave sensibly. Otherwise we'd end up with nonsense
> like:
>
> >>> x = float("nan")
> >>> x in [x]
> False

Why is that nonsense? I mean, why is it any more nonsense than

   >>> x == x
   False

Anyway, personally, I'm perfectly "happy" to live with the choices of
past generations, regardless of whether they were good or not. What you
absolutely don't want to do is "correct" the choices of past generations.


Marko

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Akira Li

Nick Coghlan  writes:
...
> definition of floats and the definition of container invariants like
> "assert x in [x]")
>
> The current approach means that the lack of reflexivity of NaN's stays
> confined to floats and similar types - it doesn't leak out and infect
> the behaviour of the container types.
>
> What we've never figured out is a good place to *document* it. I
> thought there was an open bug for that, but I can't find it right now.

There was a related issue, "Tuple comparisons with NaNs are broken"
http://bugs.python.org/issue21873 
but it was closed as "not a bug" even though the corresponding behavior is
*not documented* anywhere.


--
Akira


Re: [Python-Dev] Updates to PEP 471, the os.scandir() proposal

2014-07-13 Thread Ben Hoyt

>> Very much agreed that this isn't necessary for just readdir/FindNext
>> errors. We've never had this level of detail before -- if listdir()
>> fails half way through (very unlikely) it just bombs with OSError and
>> you get no entries at all.
>>
>> If you really really want this (again very unlikely), you can always
>> call next() directly and catch OSError around that call.
>
> Agreed - I think the PEP should point this out explicitly, and show that the
> approach it takes offers a lot of flexibility in error handling from "just
> let it fail", to a single try/catch around the whole loop, to try/catch just
> around the operations that might call lstat(), to try/catch around the
> individual iteration steps.

Good point. It'd be good to mention this explicitly in the PEP and
have another example or two of the different levels of error
handling.
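
A sketch of three of those levels, written against `os.scandir()` as it later shipped in the standard library (the iterator's `close()` needs Python 3.6+), which may differ in detail from the PEP draft under discussion. The helper names are made up:

```python
import os


def names_or_fail(path):
    """Level 1: just let any OSError propagate to the caller."""
    return [entry.name for entry in os.scandir(path)]


def names_or_empty(path):
    """Level 2: a single try/except around the whole loop."""
    try:
        return [entry.name for entry in os.scandir(path)]
    except OSError:
        return []


def names_until_error(path):
    """Level 3: try/except around each individual iteration step.

    Keeps the entries read so far and stops at the first failing step.
    An OSError from opening the directory itself is not caught here.
    """
    names = []
    it = os.scandir(path)
    try:
        while True:
            try:
                entry = next(it)
            except StopIteration:
                break
            except OSError:
                break  # give up, but keep what was already collected
            names.append(entry.name)
    finally:
        it.close()
    return names
```

The third form is the only one that can return partial results, at the cost of a noticeably clumsier loop.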

> os.walk remains the higher level API that most code should be using, and
> that has to retain the current listdir based behaviour (any error = ignore
> all entries in that directory) for backwards compatibility reasons.

Yes, definitely.

-Ben

[Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Ben Hoyt

Hi folks,

Thanks Victor, Nick, Ethan, and others for continued discussion on the
scandir PEP 471 (most recent thread starts at
https://mail.python.org/pipermail/python-dev/2014-July/135377.html).

Just an aside ... I was reminded again recently why scandir() matters:
a scandir user emailed me the other day, saying "I used scandir to
dump the contents of a network dir in under 15 seconds. 13 root dirs,
60,000 files in the structure. This will replace some old VBA code
embedded in a spreadsheet that was taking 15-20 minutes to do the
exact same thing." I asked if he could run scandir's benchmark.py on
his directory tree, and here's what it printed out:

C:\Python34\scandir-master>benchmark.py "\\my\network\directory"
Using fast C version of scandir
Priming the system's cache...
Benchmarking walks on \\my\network\directory, repeat 1/3...
Benchmarking walks on \\my\network\directory, repeat 2/3...
Benchmarking walks on \\my\network\directory, repeat 3/3...
os.walk took 8739.851s, scandir.walk took 129.500s -- 67.5x as fast

That's right -- os.walk() with scandir was almost 70x as fast as the
current version! Admittedly this is a network file system, but that's
still a real and important use case. It really pays not to throw away
information the OS gives you for free. :-)

On the recent python-dev thread, Victor especially made some well
thought out suggestions. It seems to me there's general agreement that
the basic API in PEP 471 is good (with Ethan not a fan at first, but
it seems he's on board after further discussion :-).

That said, I think there's basically one thing remaining to decide:
whether or not to have DirEntry.is_dir() and .is_file() follow
symlinks by default. I think Victor made a pretty good case that:

(a) following links is usually what you want
(b) that's the precedent set by the similar functions os.path.isdir()
and pathlib.Path.is_dir(), so to do otherwise would be confusing
(c) with the non-link-following version, if you wanted to follow links
you'd have to say something like "if (entry.is_symlink() and
os.path.isdir(entry.full_name)) or entry.is_dir()" instead of just "if
entry.is_dir()"
(d) it's error prone to have to do (c), as I found out recently when I
had a bug in my implementation of os.walk() with scandir -- I had a
bug due to getting this exact test wrong

If we go with Victor's link-following .is_dir() and .is_file(), then
we probably need to add his suggestion of a follow_symlinks=False
parameter (defaults to True). Either that or you have to say
"stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
less nice.
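
To make the two behaviours concrete, here is a sketch using `os.scandir()` as it was eventually released, where `is_dir()` takes a keyword-only `follow_symlinks` argument and the full path attribute is called `path` (not `full_name` as in the draft above). The helper names are hypothetical:

```python
import os


def is_dir_following(entry):
    """Point (c): emulate link-following on top of a non-following check."""
    return (entry.is_symlink() and os.path.isdir(entry.path)) or \
        entry.is_dir(follow_symlinks=False)


def describe(path):
    """Map each entry name to a tuple of three is_dir answers:
    (following links, not following links, manual emulation)."""
    report = {}
    with os.scandir(path) as it:
        for entry in it:
            report[entry.name] = (
                entry.is_dir(),
                entry.is_dir(follow_symlinks=False),
                is_dir_following(entry),
            )
    return report
```

For a symlink that points at a directory, the first and third answers are True while the second is False; for real directories and regular files all three agree.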

As a KISS enthusiast, I admit I'm still somewhat partial to the
DirEntry methods just returning (non-link following) info about the
*directory entry* itself. However, I can definitely see the
error-proneness of that, and the advantages given the points above. So
I guess I'm on the fence.

Given the above arguments for symlink-following is_dir()/is_file()
methods (have I missed any, Victor?), what do others think?

I'd be very keen to come to a consensus on this, so that I can make
some final updates to the PEP and see about getting it accepted and/or
implemented. :-)

-Ben

Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Tim Delaney

On 14 July 2014 10:33, Ben Hoyt  wrote:

> If we go with Victor's link-following .is_dir() and .is_file(), then
> we probably need to add his suggestion of a follow_symlinks=False
> parameter (defaults to True). Either that or you have to say
> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
> less nice.
>

Absolutely agreed that follow_symlinks is the way to go, disagree on the
default value.

> Given the above arguments for symlink-following is_dir()/is_file()
> methods (have I missed any, Victor?), what do others think?
>

I would say whichever way you go, someone will assume the opposite. IMO not
following symlinks by default is safer. If you follow symlinks by default
then everyone has the following issues:

1. Crossing filesystems (including onto network filesystems);

2. Recursive directory structures (symlink to a parent directory);

3. Symlinks to non-existent files/directories;

4. Symlink to an absolutely huge directory somewhere else (very annoying if
you just wanted to do a directory sizer ...).

If follow_symlinks=False by default, only those who opt-in have to deal
with the above.
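
As an illustration of the directory-sizer case, here is a sketch that never follows symlinks, so issues 1 to 4 cannot arise. It uses `os.scandir()` as later released (context-manager support needs Python 3.6+), and `tree_size` is a made-up name:

```python
import os


def tree_size(path):
    """Sum the sizes of regular files under path without following symlinks.

    Symlinks are skipped entirely: with follow_symlinks=False they are
    neither directories nor regular files, so loops, dangling links and
    links into other filesystems are never traversed.
    """
    total = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                total += tree_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Whatever default is chosen, passing the argument explicitly, as done here, keeps the intent visible at the call site.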

Tim Delaney

Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Nick Coghlan

On 13 Jul 2014 20:54, "Tim Delaney"  wrote:
> On 14 July 2014 10:33, Ben Hoyt  wrote:
>> If we go with Victor's link-following .is_dir() and .is_file(), then
>> we probably need to add his suggestion of a follow_symlinks=False
>> parameter (defaults to True). Either that or you have to say
>> "stat.S_ISDIR(entry.lstat().st_mode)" instead, which is a little bit
>> less nice.
>
> Absolutely agreed that follow_symlinks is the way to go, disagree on the
> default value.
>
>> Given the above arguments for symlink-following is_dir()/is_file()
>> methods (have I missed any, Victor?), what do others think?
>
> I would say whichever way you go, someone will assume the opposite. IMO
> not following symlinks by default is safer. If you follow symlinks by
> default then everyone has the following issues:
>
> 1. Crossing filesystems (including onto network filesystems);
>
> 2. Recursive directory structures (symlink to a parent directory);
>
> 3. Symlinks to non-existent files/directories;
>
> 4. Symlink to an absolutely huge directory somewhere else (very annoying
> if you just wanted to do a directory sizer ...).
>
> If follow_symlinks=False by default, only those who opt-in have to deal
> with the above.

Or the ever popular symlink to "." (or a directory higher in the tree).

I think os.walk() is a good source of inspiration here: call the flag
"followlink" and default it to False.

Cheers,
Nick.


Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Tim Delaney

On 14 July 2014 12:17, Nick Coghlan  wrote:
>
> I think os.walk() is a good source of inspiration here: call the flag
> "followlink" and default it to False.
>
Actually, that's "followlinks", and I'd forgotten that os.walk() defaulted
to not follow - definitely behaviour to match IMO :)
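
For reference, a small sketch of the os.walk() behaviour being matched. The helper walked_dirs and the directory names are made up for illustration, and the example assumes a platform where os.symlink() is permitted:

```python
import os
import tempfile


def walked_dirs(top, followlinks=False):
    """Return the sorted relative paths of every directory os.walk() visits."""
    return sorted(
        os.path.relpath(dirpath, top)
        for dirpath, dirnames, filenames in os.walk(top, followlinks=followlinks)
    )


with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "real"))
    os.mkdir(os.path.join(top, "real", "inner"))
    # "link" is a symlink to the "real" directory.
    os.symlink(os.path.join(top, "real"), os.path.join(top, "link"))

    # Default (followlinks=False): "link" is listed but never descended into.
    default_walk = walked_dirs(top)
    # Opt-in: the linked directory is walked as well.
    following_walk = walked_dirs(top, followlinks=True)
```

With the default, only the real directories are visited; the symlinked copy shows up in the walk only when followlinks=True is passed.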

Tim Delaney

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Ethan Furman


On 07/13/2014 08:13 AM, Andreas Maier wrote:
> On 11.07.2014 22:54, Ethan Furman wrote:
>>
>> Here is the externally visible behavior:
>>
>> Python 3.5.0a0 (default:34881ee3eec5, Jun 16 2014, 11:31:20)
>> [GCC 4.7.3] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>> --> NaN = float('nan')
>> --> NaN == NaN
>> False
>> --> [NaN] == [NaN]
>> True
>
> Ouch, that hurts ;-)

Yeah, I've been bitten enough times that now I try to always test code before I
post.  ;)

> Test #8: Same object of class C
> (C.__eq__() implemented with equality of x,
>  C.__ne__() returning NotImplemented):
>
>    obj1: type=, str=C(256), id=39406504
>    obj2: type=, str=C(256), id=39406504
>
>    a) obj1 is obj2: True
> C.__eq__(): self=39406504, other=39406504, returning True

This is interesting/weird/odd -- why is __eq__ being called for an 'is' test?

--- test_eq.py
class TestEqTrue:
    def __eq__(self, other):
        print('Test.__eq__ returning True')
        return True

class TestEqFalse:
    def __eq__(self, other):
        print('Test.__eq__ returning False')
        return False

tet = TestEqTrue()
print(tet is tet)
print(tet in [tet])

tef = TestEqFalse()
print(tef is tef)
print(tef in [tef])
---

When I run this all I get is four Trues, never any messages about being in 
__eq__.

How did you get that result?
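
The silence from __eq__ in that script is expected: 'is' never calls __eq__, and for built-in containers 'x in y' behaves like any(x is e or x == e for e in y), so an identical element is found before __eq__ is consulted. A short sketch that counts the calls instead of printing (the class name AlwaysUnequal is made up for illustration):

```python
class AlwaysUnequal:
    """Hypothetical class: __eq__ counts its calls and always returns False."""

    calls = 0

    def __eq__(self, other):
        AlwaysUnequal.calls += 1
        return False


obj = AlwaysUnequal()

in_list = obj in [obj]           # identity is checked first; __eq__ is skipped
calls_after_in = AlwaysUnequal.calls

lists_equal = [obj] == [obj]     # list comparison takes the same shortcut
calls_after_list_eq = AlwaysUnequal.calls

direct = obj == obj              # only a direct comparison reaches __eq__
calls_after_direct = AlwaysUnequal.calls
```

Only the direct obj == obj comparison reaches __eq__; the membership test and the list comparison both succeed on identity alone, which is the same effect as [NaN] == [NaN] being True.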

--
~Ethan~

Re: [Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

2014-07-13 Thread Ethan Furman


On 07/13/2014 05:33 PM, Ben Hoyt wrote:
> On the recent python-dev thread, Victor especially made some well
> thought out suggestions. It seems to me there's general agreement that
> the basic API in PEP 471 is good (with Ethan not a fan at first, but
> it seems he's on board after further discussion :-).

I would still like to have 'info' and 'onerror' added to the basic API, but I
agree that having methods and caching on first lookup is good.

> That said, I think there's basically one thing remaining to decide:
> whether or not to have DirEntry.is_dir() and .is_file() follow
> symlinks by default.


We should have a flag for that, and default it to False:

  scandir(path, *, followlinks=False, info=None, onerror=None)
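
To make the suggestion concrete, here is a rough sketch of how such a flag could behave, written as a wrapper around os.scandir() as it later shipped in Python 3.5. Everything about the wrapper is an assumption for illustration: the name scan_names is hypothetical, onerror is modelled on the os.walk() callback of the same name, and the 'info' parameter is left out because its semantics are not spelled out here.

```python
import os


def scan_names(path, *, followlinks=False, onerror=None):
    """Yield (name, is_dir) pairs for the entries of path.

    Hypothetical wrapper, not the PEP 471 API. followlinks decides
    whether is_dir() follows symlinks; onerror, if given, is called with
    the OSError raised while opening path instead of letting it propagate.
    """
    try:
        entries = os.scandir(path)
    except OSError as exc:
        if onerror is None:
            raise
        onerror(exc)
        return
    with entries:
        for entry in entries:
            yield entry.name, entry.is_dir(follow_symlinks=followlinks)
```

With followlinks left at False, a symlink to a directory is reported as not a directory; passing followlinks=True reports it as one.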

--
~Ethan~

Re: [Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-13 Thread Ethan Furman


On 07/13/2014 10:33 PM, Andreas Maier wrote:
> On 14.07.2014 04:55, Ethan Furman wrote:
>> On 07/13/2014 08:13 AM, Andreas Maier wrote:
>>> Test #8: Same object of class C
>>> (C.__eq__() implemented with equality of x,
>>>  C.__ne__() returning NotImplemented):
>>>
>>>    obj1: type=, str=C(256), id=39406504
>>>    obj2: type=, str=C(256), id=39406504
>>>
>>>    a) obj1 is obj2: True
>>> C.__eq__(): self=39406504, other=39406504, returning True
>>
>> This is interesting/weird/odd -- why is __eq__ being called for an 'is'
>> test?
>
> The debug messages are printed before the result is printed. So this is the
> debug message for the next case, 8.b).

Ah, whew!  That's a relief.

> Sorry for not explaining it.


Had I been reading more closely I would (hopefully) have noticed that, but I 
was headed out the door at the time.
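
For anyone else who misreads that output, a small reconstruction of the ordering effect. This is not Andreas's actual test script, which is not shown here; the class C below is a guess at its shape. The arguments to print() are evaluated before print() writes anything, so the debug line from __eq__ lands above the result line it belongs to:

```python
import io
from contextlib import redirect_stdout


class C:
    """Guessed stand-in for the class in the test output."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        result = self.x == other.x
        print("C.__eq__(): returning", result)  # debug message
        return result


obj1 = obj2 = C(256)

# Capture stdout so the order of the printed lines can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print("a) obj1 is obj2:", obj1 is obj2)
    print("b) obj1 == obj2:", obj1 == obj2)

lines = buffer.getvalue().splitlines()
```

The captured lines come out as the a) result, then the __eq__ debug message, then the b) result, which is why the debug message appears to belong to the 'is' test.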

--
~Ethan~
