New submission from Oren Milman:

------------ the current state ------------
>>> if is32BitCPython:
...     PyLong_SHIFT = 15
... elif is64BitCPython:
...     PyLong_SHIFT = 30
...
>>> ##### case A #####
>>> a = 2 ** PyLong_SHIFT - 1
>>> b = 2 ** PyLong_SHIFT - 2
>>> a - b
1
>>> a - b is 1
True
>>> a + (-b) is 1
True
>>>
>>> ##### case B #####
>>> a = 2 ** PyLong_SHIFT
>>> b = 2 ** PyLong_SHIFT - 1
>>> a - b
1
>>> a - b is 1
False
>>> a + (-b) is 1
False
>>>
>>> ##### case C #####
>>> a = 2 ** PyLong_SHIFT + 1
>>> b = 2 ** PyLong_SHIFT
>>> a - b
1
>>> a - b is 1
False
>>> a + (-b) is 1
False
>>>
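For reference, the small-int caching machinery involved in the cases above looks roughly like this (a simplified sketch of the relevant definitions in Objects/longobject.c; the names are the real ones, but reference-counting debug code is omitted):

/* ints in [-NSMALLNEGINTS, NSMALLPOSINTS) are statically allocated once */
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

/* return a new reference to the cached int with value ival */
static PyObject *
get_small_int(sdigit ival)
{
    PyObject *v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
    Py_INCREF(v);
    return v;
}

/* used by PyLong_FromLong to short-circuit to the cache */
#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

/* swap a freshly allocated single-digit int for its cached twin */
static PyLongObject *
maybe_small_long(PyLongObject *v)
{
    if (v && Py_ABS(Py_SIZE(v)) <= 1) {
        sdigit ival = MEDIUM_VALUE(v);
        if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
            Py_DECREF(v);
            return (PyLongObject *)get_small_int(ival);
        }
    }
    return v;
}

Whenever an int operation goes through PyLong_FromLong (or through maybe_small_long), a value in the cached range comes back as a reference to the corresponding element of small_ints, which is what makes 'a - b is 1' True in case A.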
This behavior is caused by the implementation of long_add and long_sub.

Both long_add and long_sub check whether a and b are both single-digit ints, and if so, do (respectively):
return PyLong_FromLong(MEDIUM_VALUE(a) + MEDIUM_VALUE(b));
or
return PyLong_FromLong(MEDIUM_VALUE(a) - MEDIUM_VALUE(b));
Otherwise, long_add and long_sub call x_add or x_sub to do the calculation on the absolute values of a and b. Finally, long_add and long_sub negate the result of the calculation, if needed, and return the final result.

Where both a and b are single-digit ints (e.g. case A), the result of the calculation is passed to PyLong_FromLong, which uses CHECK_SMALL_INT, and so a reference to an element of small_ints is returned where appropriate.
Where a and/or b are not single-digit ints (e.g. cases B and C), x_add or x_sub is called. Neither x_add nor x_sub checks whether the result is a small int (except for a case in x_sub where the result is zero), and so long_add and long_sub might return a new int even where an element of small_ints could be reused.

Due to the way CPython uses them, the issue is relevant to x_sub but not to x_add, as the calculation the former performs might result in a small int, while that of the latter always results in a multiple-digit int. (Besides being called by long_add and long_sub, x_add might also be called by k_mul, but in that case, too, the calculation would certainly result in a multiple-digit int.)

------------ Note ------------
I am not sure whether this is actually an issue that we want to fix. It seems to me that my proposed changes introduce a slight performance gain (in terms of memory, and probably also speed), and more consistent behavior of CPython. The performance gain would probably become much more relevant if and when greater default values are chosen for NSMALLNEGINTS and NSMALLPOSINTS. Anyway, I guess documenting the issue here, along with a proposal for a fix, is better than nothing.
(As far as I know, since the unification of int and long in revision 40626, this issue has never come up.)

------------ the proposed changes ------------
All of the proposed changes are in Objects/longobject.c (rough sketches of the proposed hunks follow the list):

1. in x_sub:
To make sure x_sub returns a small int where appropriate, I simply wrapped the return value of x_sub with the function maybe_small_long.

2. in long_sub:
The previous patch alone would create a nasty bug: in case both a and b are negative, long_sub calls x_sub, and then negates the result in-place by doing 'Py_SIZE(z) = -(Py_SIZE(z));'. If x_sub returned a reference to a statically allocated small int (which is not zero), long_sub would actually change that statically allocated small int. To prevent that, I replaced the in-place negation with a call to _PyLong_Negate.

3. in _PyLong_Negate:
The previous patches, along with http://bugs.python.org/issue27073 (another issue I opened recently), would cause long_sub to call _PyLong_Negate for a zero int, in case a and b are the same multiple-digit negative int. Moreover, in the default CPython branch, in case long_mul receives a multiple-digit negative int and zero, long_mul would call _PyLong_Negate for a zero int. To prevent doing 'PyLong_FromLong(-MEDIUM_VALUE(x))' where x is a zero int, I added a check before that (along with a small addition to the function comment), so that _PyLong_Negate does nothing if x is a zero int.
It should be noted that MEDIUM_VALUE also checks whether x is a zero int (for its own reasons), so thanks to the wisdom of today's compilers, the check I propose to add shouldn't introduce a performance penalty. (Actually, when comparing the assembly of _PyLong_Negate (for win32) of the default CPython branch and the patched one, the latter looks simpler.)
With regard to similar changes made in the past, _PyLong_Negate hasn't changed since it replaced the macro NEGATE in revision 84698.

4. in x_sub:
The previous patches made it safe for x_sub to return a reference to a statically allocated small int, and thus made the following optimization possible.
In case a and b have the same number of digits, x_sub finds the most significant digit where a and b differ. If there is no such digit, a and b are equal, and so x_sub does 'return (PyLongObject *)PyLong_FromLong(0);'.
I propose to add another check after that, to determine whether the least significant digit is the only digit where a and b differ. In that case, we can do something very similar to what long_add and long_sub do when they realize they are dealing with single-digit ints (as mentioned in 'the current state' section):
return (PyLongObject *)PyLong_FromLong((sdigit)a->ob_digit[0] - (sdigit)b->ob_digit[0]);
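To make changes 1-3 concrete, here is a rough sketch of the patched hunks. The function and macro names are the real ones from Objects/longobject.c, but the surrounding code is paraphrased from the description above rather than copied from the attached diff, so treat it as illustrative only:

/* change 1 -- at the end of x_sub, hand back a cached small int where
   possible (the unpatched code does 'return long_normalize(z);') */
    return maybe_small_long(long_normalize(z));

/* change 2 -- in long_sub, when both a and b are negative, the result of
   x_sub may now be a reference to a statically allocated small int, so the
   in-place 'Py_SIZE(z) = -(Py_SIZE(z));' must become a _PyLong_Negate call */
    if (Py_SIZE(a) < 0) {
        if (Py_SIZE(b) < 0)
            z = x_sub(a, b);
        else
            z = x_add(a, b);
        if (z != NULL)
            _PyLong_Negate(&z);
    }

/* change 3 -- in _PyLong_Negate, return early for a zero int instead of
   falling through to PyLong_FromLong(-MEDIUM_VALUE(x)) */
static void
_PyLong_Negate(PyLongObject **x_p)
{
    PyLongObject *x = *x_p;
    if (Py_REFCNT(x) == 1) {
        Py_SIZE(x) = -(Py_SIZE(x));   /* negating a fresh zero is harmless */
        return;
    }
    if (Py_SIZE(x) == 0)
        return;                       /* x is the cached zero; nothing to do */
    *x_p = (PyLongObject *)PyLong_FromLong(-MEDIUM_VALUE(x));
    Py_DECREF(x);
}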
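Similarly, a rough sketch of the 4th change, in the equal-size branch of x_sub (again paraphrased; the exact hunk is in the attached proposedPatches.diff):

    else if (size_a == size_b) {
        /* find the most significant digit where a and b differ: */
        i = size_a;
        while (--i >= 0 && a->ob_digit[i] == b->ob_digit[i])
            ;
        if (i < 0)
            /* a and b are equal */
            return (PyLongObject *)PyLong_FromLong(0);
        if (i == 0)
            /* the proposed check: a and b differ only in the least
               significant digit, so the result fits in a single digit and
               PyLong_FromLong can return a cached small int */
            return (PyLongObject *)PyLong_FromLong(
                (sdigit)a->ob_digit[0] - (sdigit)b->ob_digit[0]);
        if (a->ob_digit[i] < b->ob_digit[i]) {
            sign = -1;
            /* ... swap a and b, as in the unpatched code ... */
        }
        size_a = size_b = i + 1;
    }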
------------ alternative changes ------------
As an alternative to these 4 changes, I also thought of simply wrapping the return values of long_add and long_sub with the function maybe_small_long (i.e. changing the last line of each of them to 'return (PyObject *)maybe_small_long(z);').
This change might be simpler, but I believe it would introduce a performance penalty, as it adds checks also to flows of long_add and long_sub that can never result in a small int. Furthermore, this change wouldn't save any allocations of small ints. For example, in case C (mentioned in 'the current state' section), both in (a - b) and in (a + (-b)):
1. A new int of value 1 would be allocated by x_sub.
2. At the end of long_add or long_sub, maybe_small_long would realize an element of small_ints could be used.
3. maybe_small_long would call Py_DECREF on the new int of value 1, which would cause it to be deallocated.
In contrast, in case C, the 4th change (in the proposed changes section) would cause x_sub to realize in advance that the result is going to be a single-digit int. Consequently, no new int would be futilely allocated in the process.
However, in case B (also mentioned in 'the current state' section), neither the alternative change nor the proposed changes (nor the default CPython branch) would prevent x_sub from allocating a new int.

------------ micro-benchmarking ------------
I did some micro-benchmarking:

- subtraction of multiple-digit ints with the same number of digits, where the result fits in the small_ints array:
python -m timeit "[i - (i + 1) for i in range(2 ** 67, 2 ** 67 + 1000)]"
-> The patched CPython is approximately 8% faster.

- subtraction of multiple-digit ints with the same number of digits, which differ only in the least significant digit (while the result doesn't fit in the small_ints array):
python -m timeit "[i - (i + 6) for i in range(2 ** 67, 2 ** 67 + 1000)]"
-> The patched CPython is approximately 3% faster.

- subtraction of multiple-digit ints with the same number of digits, which differ (among others) in the most significant digit:
python -m timeit "[i * 2 - i for i in range(2 ** 67, 2 ** 67 + 1000)]"
-> The patched CPython is approximately 1% slower.
- subtraction of multiple-digit ints with different numbers of digits:
python -m timeit "[i ** 2 - i * 3 for i in range(2 ** 67, 2 ** 67 + 500)]"
-> The patched CPython is approximately 2% faster.
I expected the patched CPython to be somewhat slower here. Either I have missed something, or some compiler optimization magic is at work here.

------------ diff ------------
The patches diff is attached.

------------ tests ------------
I built the patched CPython for x86 and played with it a little. Everything seemed to work as usual. Also, where cases B and C (mentioned in 'the current state' section) returned 'False', the patched CPython returned 'True', as expected.
In addition, I ran 'python_d.exe -m test' (on my 64-bit Windows 10) with and without the patches, and got essentially the same output. The outputs of both runs are attached.

----------
components: Interpreter Core
files: proposedPatches.diff
keywords: patch
messages: 266557
nosy: Oren Milman
priority: normal
severity: normal
status: open
title: long_add and long_sub might return a new int where &small_ints[x] could be returned
type: behavior
Added file: http://bugs.python.org/file43041/proposedPatches.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27145>
_______________________________________