[issue15573] Support unknown formats in memoryview comparisons

Nick Coghlan Sat, 11 Aug 2012 11:58:14 -0700

Nick Coghlan added the comment:

OK, I think I finally understand what Martin is getting at from a semantic 
point of view, and I think I can better explain the background of the issue and 
why Stefan's proposed solution is both necessary and correct.


The ideal definition of equivalence for memory view objects would actually be:

memoryview(x) == memoryview(y)

if (and only if)

memoryview(x).tolist() == memoryview(y).tolist()

Now, in practice, this approach cannot be implemented, because there are too 
many format definitions (whether valid or invalid) that memoryview doesn't 
understand (and perhaps will never understand) and because it would be 
completely infeasible on large arrays with complex format definitions.
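
As a rough illustration of that ideal definition (a sketch only, not what 
memoryview actually does), the comparison can be written with a hypothetical 
helper, here called ideal_equal(). It assumes Python 3.3 or later, where 
tolist() handles native single-character formats, and it only works for 
formats that memoryview can unpack:

```python
from array import array


def ideal_equal(x, y):
    """Hypothetical helper: compare two buffer exporters by unpacked values."""
    # tolist() converts every item to a Python object, so item size and
    # memory layout no longer matter. It also copies the whole buffer,
    # which is why this does not scale to large arrays.
    return memoryview(x).tolist() == memoryview(y).tolist()


ab = array('b', [1, 2, 3])
ai = array('i', [1, 2, 3])
aL = array('L', [1, 2, 3])

print(ideal_equal(ab, ai))   # same values, different item sizes
print(ideal_equal(ai, aL))   # same values, different formats
print(ideal_equal(ab, array('b', [1, 2, 4])))   # different values
```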

Thus, we are forced to accept a *constraint* on memoryview's definition of 
equality: individual values are always compared via raw memory comparison, thus 
values stored using different *sizes* or *layouts* in memory will always 
compare as unequal, even if they would compare as equal in Python.

This is an *acceptable* constraint as, in practice, you don't perform mixed 
format arithmetic and it's not a problem if there's no automatic coercion 
between sizes and layouts.

The Python 3.2 memoryview effectively uses memcmp() directly, treating 
everything as a 1D array of bytes and completely ignoring both shape *and* 
format data. Thus:

>>> from array import array
>>> ab = array('b', [1, 2, 3])
>>> ai = array('i', [1, 2, 3])
>>> aL = array('L', [1, 2, 3])
>>> ab == ai
True
>>> ab == ai == aL
True
>>> memoryview(ab) == memoryview(ai)
False
>>> memoryview(ab) == memoryview(aL)
False
>>> memoryview(ai) == memoryview(aL)
False

This approach leads to some major false positives, such as a floating point 
value comparing equal to an integer that happens to share the same binary 
representation:

>>> af = array('f', [1.1])
>>> ai = array('i', [1066192077])
>>> af == ai
False
>>> memoryview(af) == memoryview(ai)
True
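
The raw-bytes behaviour described above can be emulated on any Python 3 
version by comparing the exported bytes directly. This is a sketch, not the 
actual 3.2 code path: raw_equal() is a hypothetical helper, and the false 
positive assumes a platform where 'i' and 'f' are both 4 bytes wide and 
floats are IEEE 754:

```python
from array import array


def raw_equal(x, y):
    """Hypothetical helper: byte-for-byte comparison, ignoring shape and format."""
    # tobytes() exposes the underlying memory as a flat byte string, so this
    # is roughly what a memcmp()-style comparison sees.
    return memoryview(x).tobytes() == memoryview(y).tobytes()


ab = array('b', [1, 2, 3])
ai = array('i', [1, 2, 3])
af = array('f', [1.1])
ai_bits = array('i', [1066192077])   # same bit pattern as the float 1.1

# Equal as Python values, unequal as raw memory (different item sizes).
print(ab == ai, raw_equal(ab, ai))

# Unequal as Python values, equal as raw memory (the false positive).
print(af == ai_bits, raw_equal(af, ai_bits))
```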

The changes in 3.3 are aimed primarily at *eliminating those false positives* 
by taking into account the shape of the array and the format of the contained 
values. It is *not* about changing the fundamental constraint that memoryview 
operates at the level of raw memory, rather than Python objects, and thus cares 
about memory layout details that are irrelevant after passing through the 
Python abstraction layer.

This contrasts with the more limited scope of the array.array module, which 
*does* take into account the Python level abstractions. Thus, there will always 
be a discrepancy between the two definitions of equality, as memoryview cares 
about memory layout details, where array.array does not.

The problem is that Python 3.3 currently has *spurious* false negatives that 
aren't caused by the fundamental constraint that comparisons must be based 
directly on memory contents. Instead, they're caused by memoryview returning 
False for any equality comparison involving a format it doesn't understand. 
That's unacceptable, and is what Stefan's patch is intended to fix.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15573>
_______________________________________