Bugs item #1770416, was opened at 2007-08-08 20:43
Message generated for change (Comment added) made by marketdickinson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1770416&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request, not just the latest
update.

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jason G (aryx)
Assigned to: Facundo Batista (facundobatista)
Summary: Decimal.__int__ overflows for large values

Initial Comment:
This also affects Decimal.__hash__, since it [indirectly] calls
Decimal.__int__.

>>> from decimal import Decimal as D
>>> e = D("1e1234567890987654321")
>>> int(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/decimal.py", line 1501, in __int__
    s = ''.join(map(str, self._int)) + '0'*self._exp
OverflowError: cannot fit 'long' into an index-sized integer
>>> e = D("1e1234567890")
>>> int(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/decimal.py", line 1501, in __int__
    s = ''.join(map(str, self._int)) + '0'*self._exp
MemoryError

Also, for values that do work, this is incredibly slow if they are still
fairly large.

----------------------------------------------------------------------

Comment By: Mark Dickinson (marketdickinson)
Date: 2007-08-12 19:43

Message:
Logged In: YES
user_id=703403
Originator: NO

Mark Dickinson (marketdickinson) stupidly claimed that:

    hash(n) == hash(n % (2**32-1))

It's not true. Sorry for the noise.

----------------------------------------------------------------------

Comment By: Mark Dickinson (marketdickinson)
Date: 2007-08-12 17:57

Message:
Logged In: YES
user_id=703403
Originator: NO

Doesn't using hash(D.as_tuple()) break the principle that if two objects
compare equal then they should have equal hashes?

An alternative to converting to long before hashing is to use the fact that
for the current hash implementation for long we have

    hash(n) == hash(n % (2**32-1))

(except when n is a multiple of 2**32-1).
For a Decimal d that's integral, one should be able to compute d % (2**32-1)
very quickly: if d = c*10**e then just use

    (c * pow(10, e, 2**32-1)) % (2**32-1)

which should be acceptably fast even when d = 123456789E999999999999999.

The only tricky bit is that on a 64-bit system all those 2**32-1 need to be
replaced by 2**64-1. Though now I come to think of it, since 2**32-1 is a
factor of 2**64-1, it would be enough just to do everything modulo 2**64-1
even on a 32-bit system.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-08-09 17:37

Message:
Logged In: YES
user_id=849994
Originator: NO

Assigning to Facundo, he's actively working on decimal ATM.

----------------------------------------------------------------------

Comment By: ajaksu (ajaksu2)
Date: 2007-08-09 17:21

Message:
Logged In: YES
user_id=1200609
Originator: NO

I see. Inheriting from Decimal and overloading __hash__ is a way to solve
your problem, but IMHO it's a shallow bug and worth reporting. I just tried
hash(D.as_tuple()) and it is blazing fast.

I think that unless the official line is "don't touch decimal.py until X",
this change to hashing would be useful and (AFAICT) harmless enough to fit
into e.g. 2.5.2. To avoid incompatibilities, __hash__ could check for
OverflowError and only use .as_tuple() for values higher than the previous
maximum (keeping __hash__ slow, unfortunately, for values below).

Could the current status of Decimal be made a bit clearer? Are bug
reports/patches welcome? Is bugging Facundo or RHettinger welcome? :)

If making __int__ a bit faster and able to convert without huge strings is
desired, I've updated that old function (see below) and AFAIK it could be
used to replace Decimal.__int__ and Decimal.__long__ in Lib/decimal.py. It
gets about ten times faster in the best cases and is about as slow in the
worst cases (it could be much worse if "long(rint_part + rdec_part)/exponent"
is a STUPID thing to do, but that seems easy to avoid).
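[A minimal sketch of the hash(D.as_tuple()) workaround described above, in
modern Python; the subclass name HashableDecimal is illustrative, not part
of decimal.py. As Mark notes earlier in the thread, this breaks the rule
that objects comparing equal should hash equal.]

```python
from decimal import Decimal

class HashableDecimal(Decimal):
    """Decimal subclass whose hash never materializes the full integer."""

    def __hash__(self):
        # as_tuple() returns (sign, digits, exponent); hashing that small
        # tuple is cheap no matter how large the exponent is.
        return hash(self.as_tuple())

# A value whose int() conversion would need roughly a billion digits:
d = HashableDecimal("1e1234567890")
h = hash(d)  # returns immediately, no huge string or long is built
```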
As the original __int__ optimizes str(Decimal._int) and doesn't split/check
for substrings, using the same path should speed this up further. I can run
the tests and benchmark it (next month...) if there's interest.

def dec2long(number):
    """Convert decimal.Decimal to long (abridged, non-checking version)."""
    decimal_string = str(number)
    if "e" in decimal_string:
        radix, exponent = decimal_string.split("e")
    elif "E" in decimal_string:
        radix, exponent = decimal_string.split("E")
    else:
        radix, exponent = (decimal_string, 0)
    if exponent:
        exponent = int(exponent)
        if "." in radix:
            rint, rdec = radix.split(".")
            radix_decimal_part_len = len(rdec)
            if radix_decimal_part_len <= exponent:
                radix_as_long = long(rint + rdec)
                corrected_exponent = exponent - radix_decimal_part_len
                result = radix_as_long * 10L**corrected_exponent
            else:
                # The originally posted version did
                # long(rint + rdec)/exponent here, which is wrong;
                # truncating the digit string gives int-style truncation
                # (and also yields 0 for large negative exponents).
                result = long((rint + rdec)[:len(rint) + exponent] or "0")
        elif exponent > 0:
            radix_as_long = long(radix)
            result = radix_as_long * 10L**exponent
        else:
            # Negative exponent with no fractional part, e.g. "1E-7".
            result = long(radix[:len(radix) + exponent] or "0")
    else:
        if "." in radix:
            radix_integer_part = long(radix.split(".")[0])
        else:
            radix_integer_part = long(radix)
        result = radix_integer_part
    return result

As a comparison, here's __int__ (abridged) from decimal.py:

def __int__(number):
    """Converts self to an int, truncating if necessary."""
    if number._exp >= 0:
        s = ''.join(map(str, number._int)) + '0'*number._exp
    else:
        s = ''.join(map(str, number._int))[:number._exp]
        if s == '':
            s = '0'
    sign = '-'*number._sign
    return int(sign + s)
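[Mark's modular-reduction idea from earlier in the thread can be sketched in
a few lines of modern Python; the helper name decimal_mod is illustrative,
not part of decimal.py.]

```python
# For an integral Decimal d = c * 10**e, d % M can be computed without ever
# building the full integer, because pow(10, e, M) exponentiates modulo M.

M = 2**64 - 1  # 2**32 - 1 divides 2**64 - 1, so this modulus serves both
               # 32-bit and 64-bit builds, as noted in the thread

def decimal_mod(c, e, m=M):
    """Return (c * 10**e) % m without constructing c * 10**e."""
    return (c * pow(10, e, m)) % m

# Cross-check against the naive computation on a small case:
assert decimal_mod(123456789, 20) == (123456789 * 10**20) % M

# Huge exponent: returns instantly, whereas building 10**999999999 itself
# would exhaust memory.
big = decimal_mod(123456789, 999999999)
```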
Probably something fast and simple, such as hash(self.as_tuple()), self being
a Decimal of course.

Our project is a CAS and we use Decimal for our real number class. I happened
to run into this issue when I was implementing approximation of log(x) for
extremely large/small values of x. I just started keyboard-bashing numbers
and behold, Decimal crashed on me :)

----------------------------------------------------------------------

Comment By: ajaksu (ajaksu2)
Date: 2007-08-09 08:09

Message:
Logged In: YES
user_id=1200609
Originator: NO

Hi Jason,

The OverflowError is related to "index-sized ints" as in "ints that are valid
indexes for sequences"; try:

>>> e = "0" * 1234567890

So it seems that this error is avoiding the creation of a string of length
1234567890, which is a good thing (sorta) :)

Once I tried to implement a dec2long function based on numbers instead of
strings; see if it helps (it's VERY slow and naive, but IIRC it was a bit
faster than the original version, and correct):
http://groups.google.com/group/comp.lang.python/msg/aba7264ab38eb25e

Now, do you really need all that precision for such huge numbers? I know I
didn't ;)

Daniel

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com