Re: Precision Tail-off?

2023-02-17 Thread Stephen Tucker
Thanks, one and all, for your responses.

This is a hugely controversial claim, I know, but I would consider this
behaviour to be a serious deficiency in the IEEE standard.

Consider an integer N consisting of a finitely-long string of digits in
base 10.

Consider the infinitely-precise cube root of N (yes I know that it could
never be computed unless N is the cube of an integer, but this is a
mathematical argument, not a computational one), also in base 10. Let's
call it RootN.

Now consider appending three zeroes to the right-hand end of N (let's call
it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).

The *only *difference between RootN and RootNZZZ is that the decimal point
in RootNZZZ is one place further to the right than the decimal point in
RootN.

None of the digits in RootNZZZ's string should be different from the
corresponding digits in RootN.

I rest my case.

Perhaps this observation should be brought to the attention of the IEEE. I
would like to know their response to it.
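
(For the record, the digit-shift claim can be checked with exact integer arithmetic. The sketch below uses a hand-rolled integer cube root - icbrt is not a standard library function - and scales the argument so the root carries more than 20 digits:)

    def icbrt(n):
        # floor of the cube root of a non-negative int, by Newton's method on integers
        if n == 0:
            return 0
        x = 1 << ((n.bit_length() + 2) // 3)   # initial guess, always >= the true root
        while True:
            y = (2 * x + n // (x * x)) // 3
            if y >= x:
                return x
            x = y

    N = 123456789 * 10**42
    print(icbrt(N * 10**9))           # 49793385921817447440  - the 20 figures quoted below
    print(icbrt(N * 1000 * 10**9))    # 497933859218174474402 - same digits, one appended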

Stephen Tucker.


On Thu, Feb 16, 2023 at 6:49 PM Peter Pearson 
wrote:

> On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:
> > On Tue, 14 Feb 2023 at 07:12, Stephen Tucker 
> wrote:
> [snip]
> >> I have just produced the following log in IDLE (admittedly, in Python
> >> 2.7.10 and, yes I know that it has been superseded).
> >>
> >> It appears to show a precision tail-off as the supplied float gets
> bigger.
> [snip]
> >>
> >> For your information, the first 20 significant figures of the cube root
> in
> >> question are:
> >>49793385921817447440
> >>
> >> Stephen Tucker.
> >> --
> >> >>> 123.456789 ** (1.0 / 3.0)
> >> 4.979338592181744
> >> >>> 1234567890. ** (1.0 / 3.0)
> >> 49793385921817.36
> >
> > You need to be aware that 1.0/3.0 is a float that is not exactly equal
> > to 1/3 ...
> [snip]
> > SymPy again:
> >
> > In [37]: a, x = symbols('a, x')
> >
> > In [38]: print(series(a**x, x, Rational(1, 3), 2))
> > a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))
> >
> > You can see that the leading relative error term from x being not
> > quite equal to 1/3 is proportional to the log of the base. You should
> > expect this difference to grow approximately linearly as you keep
> > adding more zeros in the base.
>
> Marvelous.  Thank you.
>
>
> --
> To email me, substitute nowhere->runbox, invalid->com.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Stephen Tucker
As a follow-up to my previous message, I have just produced the following
log on IDLE, for your information:
--
>>> math.e ** (math.log (12345678900) / 3)
4.979338592181741e+16
>>> 10 ** (math.log10 (12345678900) / 3)
4.979338592181736e+16
>>> 12345678900 ** (1.0 / 3.0)
4.979338592181734e+16
>>> 123456789e42 ** (1.0 / 3.0)
4.979338592181734e+16
--
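
All three routes above go through a rounded exponent or logarithm, so they lose trailing digits in the same way. As a rough sketch, most of them can be clawed back while staying in ordinary floats by polishing the result with a couple of Newton steps on r**3 - x = 0 (this assumes x > 0 and is no substitute for real arbitrary-precision arithmetic):

    def cbrt_polished(x):
        r = x ** (1.0 / 3.0)                     # initial estimate, exponent slightly below 1/3
        for _ in range(2):                       # Newton steps on r**3 - x = 0
            r -= (r * r * r - x) / (3.0 * r * r)
        return r

    print(cbrt_polished(123456789e42))           # typically recovers the last digit or two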

Stephen Tucker.


On Fri, Feb 17, 2023 at 10:27 AM Stephen Tucker 
wrote:

> Thanks, one and all, for your responses.
>
> This is a hugely controversial claim, I know, but I would consider this
> behaviour to be a serious deficiency in the IEEE standard.
>
> Consider an integer N consisting of a finitely-long string of digits in
> base 10.
>
> Consider the infinitely-precise cube root of N (yes I know that it could
> never be computed unless N is the cube of an integer, but this is a
> mathematical argument, not a computational one), also in base 10. Let's
> call it RootN.
>
> Now consider appending three zeroes to the right-hand end of N (let's call
> it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).
>
> The *only *difference between RootN and RootNZZZ is that the decimal
> point in RootNZZZ is one place further to the right than the decimal point
> in RootN.
>
> None of the digits in RootNZZZ's string should be different from the
> corresponding digits in RootN.
>
> I rest my case.
>
> Perhaps this observation should be brought to the attention of the IEEE. I
> would like to know their response to it.
>
> Stephen Tucker.
>
>
> On Thu, Feb 16, 2023 at 6:49 PM Peter Pearson 
> wrote:
>
>> On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:
>> > On Tue, 14 Feb 2023 at 07:12, Stephen Tucker 
>> wrote:
>> [snip]
>> >> I have just produced the following log in IDLE (admittedly, in Python
>> >> 2.7.10 and, yes I know that it has been superseded).
>> >>
>> >> It appears to show a precision tail-off as the supplied float gets
>> bigger.
>> [snip]
>> >>
>> >> For your information, the first 20 significant figures of the cube
>> root in
>> >> question are:
>> >>49793385921817447440
>> >>
>> >> Stephen Tucker.
>> >> --
>> >> >>> 123.456789 ** (1.0 / 3.0)
>> >> 4.979338592181744
>> >> >>> 1234567890. ** (1.0 / 3.0)
>> >> 49793385921817.36
>> >
>> > You need to be aware that 1.0/3.0 is a float that is not exactly equal
>> > to 1/3 ...
>> [snip]
>> > SymPy again:
>> >
>> > In [37]: a, x = symbols('a, x')
>> >
>> > In [38]: print(series(a**x, x, Rational(1, 3), 2))
>> > a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))
>> >
>> > You can see that the leading relative error term from x being not
>> > quite equal to 1/3 is proportional to the log of the base. You should
>> > expect this difference to grow approximately linearly as you keep
>> > adding more zeros in the base.
>>
>> Marvelous.  Thank you.
>>
>>
>> --
>> To email me, substitute nowhere->runbox, invalid->com.
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Thomas Passin

On 2/17/2023 5:27 AM, Stephen Tucker wrote:

Thanks, one and all, for your responses.

This is a hugely controversial claim, I know, but I would consider this
behaviour to be a serious deficiency in the IEEE standard.

Consider an integer N consisting of a finitely-long string of digits in
base 10.


What you are not considering is that the IEEE standard is about trying 
to achieve a balance between resource use (memory and registers), 
precision, speed of computation, reliability (consistent and stable 
results), and compatibility.  So there have to be many tradeoffs.  One 
of them is the use of binary representation.  It has never been about 
achieving ideal mathematical perfection for some set of special cases.


Want a different set of tradeoffs?  Fine, go for it.  Python has Decimal 
and rational libraries among others.  They run more slowly than IEEE, 
but maybe that's a good tradeoff for you.  Use a symbolic math library. 
Trap special cases of interest to you and calculate them differently. 
Roll your own.  Trouble is, you have to know one heck of a lot to roll 
your own, and it may take decades of debugging to get it right.  Even 
then it won't have hardware assistance like IEEE floating point usually has.
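
For example, a minimal sketch with the decimal module (the 50-digit precision is an arbitrary choice; as noted elsewhere in the thread, 1/3 still isn't exact in base 10 either, but at 50 working digits the error sits far beyond anything a float can show):

    from decimal import Decimal, getcontext

    getcontext().prec = 50                       # 50 significant decimal digits
    x = Decimal(123456789) * Decimal(10) ** 42
    print(x ** (Decimal(1) / 3))                 # ~49793385921817447.440..., far more
                                                 # correct digits than the float result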



Consider the infinitely-precise cube root of N (yes I know that it could
never be computed unless N is the cube of an integer, but this is a
mathematical argument, not a computational one), also in base 10. Let's
call it RootN.

Now consider appending three zeroes to the right-hand end of N (let's call
it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).

The *only *difference between RootN and RootNZZZ is that the decimal point
in RootNZZZ is one place further to the right than the decimal point in
RootN.

None of the digits in RootNZZZ's string should be different from the
corresponding digits in RootN.

I rest my case.

Perhaps this observation should be brought to the attention of the IEEE. I
would like to know their response to it.

Stephen Tucker.


On Thu, Feb 16, 2023 at 6:49 PM Peter Pearson 
wrote:


On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:

On Tue, 14 Feb 2023 at 07:12, Stephen Tucker 

wrote:
[snip]

I have just produced the following log in IDLE (admittedly, in Python
2.7.10 and, yes I know that it has been superseded).

It appears to show a precision tail-off as the supplied float gets

bigger.
[snip]


For your information, the first 20 significant figures of the cube root

in

question are:
49793385921817447440

Stephen Tucker.
--

123.456789 ** (1.0 / 3.0)

4.979338592181744

1234567890. ** (1.0 / 3.0)

49793385921817.36


You need to be aware that 1.0/3.0 is a float that is not exactly equal
to 1/3 ...

[snip]

SymPy again:

In [37]: a, x = symbols('a, x')

In [38]: print(series(a**x, x, Rational(1, 3), 2))
a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))

You can see that the leading relative error term from x being not
quite equal to 1/3 is proportional to the log of the base. You should
expect this difference to grow approximately linearly as you keep
adding more zeros in the base.


Marvelous.  Thank you.


--
To email me, substitute nowhere->runbox, invalid->com.
--
https://mail.python.org/mailman/listinfo/python-list



--
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Weatherby,Gerard
IEEE did not define a single standard for floating point arithmetic. They designed 
multiple standards, including a decimal floating point one.  Although decimal 
floating point (DFP) hardware used to be manufactured, I couldn’t find any 
current manufacturers. There was a company that seemed to be active until a few 
years ago, but they seem to have gone dark: https://twitter.com/SilMinds



From: Python-list  on 
behalf of Thomas Passin 
Date: Friday, February 17, 2023 at 9:02 AM
To: python-list@python.org 
Subject: Re: Precision Tail-off?

On 2/17/2023 5:27 AM, Stephen Tucker wrote:
> Thanks, one and all, for your responses.
>
> This is a hugely controversial claim, I know, but I would consider this
> behaviour to be a serious deficiency in the IEEE standard.
>
> Consider an integer N consisting of a finitely-long string of digits in
> base 10.

What you are not considering is that the IEEE standard is about trying
to achieve a balance between resource use (memory and registers),
precision, speed of computation, reliability (consistent and stable
results), and compatibility.  So there have to be many tradeoffs.  One
of them is the use of binary representation.  It has never been about
achieving ideal mathematical perfection for some set of special cases.

Want a different set of tradeoffs?  Fine, go for it.  Python has Decimal
and rational libraries among others.  They run more slowly than IEEE,
but maybe that's a good tradeoff for you.  Use a symbolic math library.
Trap special cases of interest to you and calculate them differently.
Roll your own.  Trouble is, you have to know one heck of a lot to roll
your own, and it may take decades of debugging to get it right.  Even
then it won't have hardware assistance like IEEE floating point usually has.

> Consider the infinitely-precise cube root of N (yes I know that it could
> never be computed unless N is the cube of an integer, but this is a
> mathematical argument, not a computational one), also in base 10. Let's
> call it RootN.
>
> Now consider appending three zeroes to the right-hand end of N (let's call
> it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).
>
> The *only *difference between RootN and RootNZZZ is that the decimal point
> in RootNZZZ is one place further to the right than the decimal point in
> RootN.
>
> None of the digits in RootNZZZ's string should be different from the
> corresponding digits in RootN.
>
> I rest my case.
>
> Perhaps this observation should be brought to the attention of the IEEE. I
> would like to know their response to it.
>
> Stephen Tucker.
>
>
> On Thu, Feb 16, 2023 at 6:49 PM Peter Pearson 
> wrote:
>
>> On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:
>>> On Tue, 14 Feb 2023 at 07:12, Stephen Tucker 
>> wrote:
>> [snip]
 I have just produced the following log in IDLE (admittedly, in Python
 2.7.10 and, yes I know that it has been superseded).

 It appears to show a precision tail-off as the supplied float gets
>> bigger.
>> [snip]

 For your information, the first 20 significant figures of the cube root
>> in
 question are:
 49793385921817447440

 Stephen Tucker.
 --
>>> 123.456789 ** (1.0 / 3.0)
 4.979338592181744
>>> 1234567890. ** (1.0 / 3.0)
 49793385921817.36
>>>
>>> You need to be aware that 1.0/3.0 is a float that is not exactly equal
>>> to 1/3 ...
>> [snip]
>>> SymPy again:
>>>
>>> In [37]: a, x = symbols('a, x')
>>>
>>> In [38]: print(series(a**x, x, Rational(1, 3), 2))
>>> a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))
>>>
>>> You can see that the leading relative error term from x being not
>>> quite equal to 1/3 is proportional to the log of the base. You should
>>> expect this difference to grow approximately linearly as you keep
>>> adding more zeros in the base.
>>
>> Marvelous.  Thank you.
>>
>>
>> --
>> To email me, substitute nowhere->runbox, invalid->com.
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>

--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Precision Tail-off?

2023-02-17 Thread avi.e.gross
Stephen,

What response do you expect from whatever people in the IEEE you want?

The specific IEEE standards were designed and agreed upon by groups working
in caveman times when the memory and CPU time were not so plentiful. The
design of many types, including floating point, had to work decently if not
perfectly so you stored data in ways the hardware of the time could process
it efficiently.

Note all kinds of related issues about what happens if you want an integer
larger than fits into 16 bits or 32 bits or even 64 bits. A Python integer
was designed to be effectively unlimited and uses as much storage as needed.
It can also get ever slower when doing things like looking for gigantic
primes. But it does not have overflow problems.

So could you design a floating point data type with similar features? It
would be some complex data structure that keeps track of the number of
bit/bytes/megabytes currently being used to store the mantissa or exponent
parts and then have some data structure that holds all the bits needed. When
doing any arithmetic like addition or division or more complex things, it
would need to compare the two objects being combined and calculate how to
perhaps expand/convert one to match the other and then do umpteen steps to
generate the result in as many pieces/steps as needed and create a data
structure that holds the result, optionally trimming off terminal parts not
needed or wanted. Then you would need all relevant functions that accept
regular floating point to handle these numbers and generate these numbers.
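
Python's fractions module is roughly this idea for rational arithmetic: numerator and denominator are unlimited integers that grow as needed and nothing is ever rounded. A tiny sketch (it cannot help with irrational results such as cube roots, of course):

    from fractions import Fraction

    x = Fraction(1, 3)            # exactly one third, stored as numerator/denominator
    print(x * 3 == 1)             # True - no rounding anywhere
    print(x + Fraction(1, 7))     # 10/21, still exact; storage grows as needed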

Can that be done well? Well, sure, but not necessarily WELL. Some would
point you to the Decimal type. It might take a somewhat different tack on
how to do this. But everything comes with a cost.

Perhaps the response from the IEEE would be that what they published was
meant for some purposes but not yours. It may be that a group needs to
formulate a new standard but leave the old ones in place for people willing
to use them as their needs are more modest. 

As an analogy, consider the lowly char that stored a single character in a
byte. I mean good old ASCII but also EBCDIC and the ISO family like ISO
8859-1 and so on. Those standards focused in on the needs of just a few
languages and if you wanted to write something in a mix of languages, it
could be a headache, as there were times I had to shift within one document to,
say, ISO 8859-8 to include some Hebrew, and ISO 8859-3 for Esperanto and so
on, while ISO 8859-1 was fine for English, French, German, Spanish and many
others. For some purposes, I had to use encodings like shift JIS to do
Japanese as many Asian languages were outside what ISO was doing.

The solutions since then vary but tend to allow or require multiple bytes
per character. But they retain limits and if we ever enter a Star Trek
Universe with infinite diversity and more languages and encodings, we might
need to again enlarge our viewpoint and perhaps be even more wasteful of our
computing resources to accommodate them all!

Standards are often not made to solve ALL possible problems but to make
clear what is supported and what is not required. Mathematical arguments can
be helpful but practical considerations and the limited time available (as
these darn things can take YEARS to be agreed on) are often dominant.
Frankly, by the time many standards, such as for a programming language, are
finalized, the reality in the field has often changed. The language may
already have been supplanted largely by others for new work, or souped up
with not-yet-standard features.

I am not against striving for ever better standards and realities. But I do
think a better way to approach this is not to reproach what was done but ask
if we can focus on the near-future and make it better.

Arguably, there are now multiple features out there such as Decimal and they
may be quite different. That often happens without a standard.  But if you
now want everyone to get together and define a new standard that may break
some implementations, ...

As I see it, many computer courses teach the realities as well as the
mathematical fantasies that break down in the real world. One of those that
tend to be stressed is that floating point is not exact and that comparison
operators need to be used with caution. Often the suggestion is to subtract
one number from another and check if the result is fairly close to zero as
in the absolute value is less than an IEEE standard number where the last
few bits are ones. For more complex calculations where the errors can
accumulate, you may need to choose a small number with more such bits near
the end.
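
In Python that advice usually ends up as math.isclose or an explicit tolerance. A minimal sketch, using the identity cos(x)**2 + sin(x)**2 == 1:

    import math

    a = math.cos(3) ** 2 + math.sin(3) ** 2       # mathematically exactly 1
    print(a == 1.0)                               # False - off by about 1e-16
    print(math.isclose(a, 1.0, rel_tol=1e-9))     # True - tolerance-based comparison
    print(abs(a - 1.0) < 1e-12)                   # the "close to zero" check described above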

Extended precision arithmetic is perhaps cheaper now and can be done for a
reasonable number of digits. It probably is not realistic to do most such
calculations for billions of digits, albeit some of the calculations for the
first googolplex digits of pi might indeed need such methods, as soon as we
find a way to keep that many digits in memory given the t

Re: Precision Tail-off?

2023-02-17 Thread Peter Pearson
On Fri, 17 Feb 2023 10:27:08, Stephen Tucker wrote: [Head-posting undone.]
> On Thu, Feb 16, 2023 at 6:49 PM Peter Pearson 
> wrote:
>> On Tue, 14 Feb 2023 11:17:20 +, Oscar Benjamin wrote:
>> > On Tue, 14 Feb 2023 at 07:12, Stephen Tucker 
>> wrote:
>> [snip]
>> >> I have just produced the following log in IDLE (admittedly, in Python
>> >> 2.7.10 and, yes I know that it has been superseded).
>> >>
>> >> It appears to show a precision tail-off as the supplied float gets
>> bigger.
>> [snip]
>> >>
>> >> For your information, the first 20 significant figures of the cube root
>> in
>> >> question are:
>> >>49793385921817447440
>> >>
>> >> Stephen Tucker.
>> >> --
>> >> >>> 123.456789 ** (1.0 / 3.0)
>> >> 4.979338592181744
>> >> >>> 1234567890. ** (1.0 / 3.0)
>> >> 49793385921817.36
>> >
>> > You need to be aware that 1.0/3.0 is a float that is not exactly equal
>> > to 1/3 ...
>> [snip]
>> > SymPy again:
>> >
>> > In [37]: a, x = symbols('a, x')
>> >
>> > In [38]: print(series(a**x, x, Rational(1, 3), 2))
>> > a**(1/3) + a**(1/3)*(x - 1/3)*log(a) + O((x - 1/3)**2, (x, 1/3))
>> >
>> > You can see that the leading relative error term from x being not
>> > quite equal to 1/3 is proportional to the log of the base. You should
>> > expect this difference to grow approximately linearly as you keep
>> > adding more zeros in the base.
>>
>> Marvelous.  Thank you.
[snip]
> Now consider appending three zeroes to the right-hand end of N (let's call
> it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).
>
> The *only *difference between RootN and RootNZZZ is that the decimal point
> in RootNZZZ is one place further to the right than the decimal point in
> RootN.
>
> None of the digits in RootNZZZ's string should be different from the
> corresponding digits in RootN.
>
> I rest my case.
[snip]


I believe the pivotal point of Oscar Benjamin's explanation is
that within the constraints of limited-precision binary floating-point
numbers, the exponent of 1/3 cannot be represented precisely, and
is in practice represented by something slightly smaller than 1/3;
and accordingly, when you multiply your argument by 1000, its
not-quite-cube-root gets multiplied by something slightly smaller
than 10, which is why the number of figures matching the "right"
answer gets steadily smaller.

Put slightly differently, the crux of the problem lies not in the
complicated process of exponentiation, but simply in the failure
to represent 1/3 exactly.  The fact that the exponent is slightly
less than 1/3 means that you would observe the steady loss of
agreement that you report, even if the exponentiation process
were perfect.
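
That gap is easy to see directly. A small sketch using Fraction to recover the exact value the float exponent actually holds:

    from fractions import Fraction

    e = Fraction(1.0 / 3.0)             # the exact rational value of the float exponent
    print(e < Fraction(1, 3))           # True: it sits slightly below 1/3
    print(float(Fraction(1, 3) - e))    # a gap on the order of 1e-17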

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Michael Torrie
On 2/17/23 03:27, Stephen Tucker wrote:
> Thanks, one and all, for your responses.
> 
> This is a hugely controversial claim, I know, but I would consider this
> behaviour to be a serious deficiency in the IEEE standard.

No matter how you do it, there are always tradeoffs and inaccuracies
moving from real numbers in base 10 to base 2.  That's just the nature
of the math.  Any binary floating point representation is going to have
problems.  There are techniques for mitigating this:
https://en.wikipedia.org/wiki/Floating-point_error_mitigation
It's interesting to note that the article points out that floating point
error was first talked about in the 1930s.  So no matter what binary
scheme you choose there will be error. That's just the nature of
converting a real from one base to another.

Also we weren't clear on this, but the IEEE standard is not just
implemented in software. It's the way your CPU represents floating point
numbers in silicon.  And in your GPUs (where speed is preferred to
precision).  So it's not like Python could just arbitrarily do something
different unless you were willing to pay a huge penalty for speed.  For
example the decimal module which is arbitrary precision, but quite slow.

Have you tried the numpy cbrt() function?  It is probably going to be
more accurate than raising to the power 1.0/3.0.
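
For what it's worth, a quick sketch of that comparison (np.cbrt computes the cube root as a single operation rather than going through the rounded exponent 1.0/3.0):

    import numpy as np

    x = 123456789e42
    print(np.cbrt(x))           # cube root as one operation
    print(x ** (1.0 / 3.0))     # routes through the inexact exponent and loses trailing digits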

> Perhaps this observation should be brought to the attention of the IEEE. I
> would like to know their response to it.
Rest assured the IEEE committee that formalized the format decades ago
knew all about the limitations and trade-offs.  Over the years CPUs have
increased in capacity and now we can use 128-bit floating point numbers
which mitigate some of the accuracy problems by simply having more
binary digits. But the fact remains that some rational numbers in
decimal are irrational in binary, so arbitrary decimal precision using
floating point is not possible.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Richard Damon

On 2/17/23 5:27 AM, Stephen Tucker wrote:

Thanks, one and all, for your responses.

This is a hugely controversial claim, I know, but I would consider this
behaviour to be a serious deficiency in the IEEE standard.

Consider an integer N consisting of a finitely-long string of digits in
base 10.

Consider the infinitely-precise cube root of N (yes I know that it could
never be computed unless N is the cube of an integer, but this is a
mathematical argument, not a computational one), also in base 10. Let's
call it RootN.

Now consider appending three zeroes to the right-hand end of N (let's call
it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).


The key factor here is IEEE floating point is storing numbers in BINARY, 
not DECIMAL, so a multiply by 1000 will change the representation of the 
number, and thus the possible resolution errors.


Store your numbers in IEEE DECIMAL floating point, and the variations by 
multiplying by powers of 10 go away.
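
Python's decimal module follows the same general decimal arithmetic specification, so the effect is easy to sketch there. Decimal has no cube root, so a correctly rounded square root stands in; scaling the argument by 10**6 just scales the root by 10**3:

    from decimal import Decimal, getcontext

    getcontext().prec = 28
    a = Decimal(2).sqrt()                         # rounded to 28 decimal digits
    b = (Decimal(2) * Decimal(10) ** 6).sqrt()    # the same value scaled by 10**3
    print(a)    # 1.414213562373095048801688724
    print(b)    # 1414.213562373095048801688724 - same digit string, point moved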




The *only *difference between RootN and RootNZZZ is that the decimal point
in RootNZZZ is one place further to the right than the decimal point in
RootN.


No, since the floating point number is stored as a fraction times a 
power of 2, the fraction has changed as well as the power of 2.




None of the digits in RootNZZZ's string should be different from the
corresponding digits in RootN.


Only if the storage format was DECIMAL.



I rest my case.

Perhaps this observation should be brought to the attention of the IEEE. I
would like to know their response to it.


That is why they have developed the Decimal Floating point format, to 
handle people with those sorts of problems.


They just aren't common enough for many things to have adopted the use 
of it.




Stephen Tucker.


--
Richard Damon

--
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-17 Thread Peter J. Holzer
On 2023-02-17 00:07:12 -0500, avi.e.gr...@gmail.com wrote:
> Roaring bitmaps claim to be an improvement not only over uncompressed
> structures but some other compressed versions but my reading shows it
> may be limited to some uses. Bitsets in general seem to be useful only
> for a largely contiguous set of integers where each sequential bit
> represents whether the nth integer above the lowest is in the set or
> not.

They don't really have to be that contiguous. As long as your integers
fit into 32 bits you're fine.

> Of course, properly set up, this makes Unions and Intersections
> and some other operations fairly efficient. But sets are not the same
> as dictionaries and often you are storing other data types than
> smaller integers.

Of course. Different data types are useful for different applications.

> Many normal compression techniques can require lots of time to
> uncompress to find anything. My impression is that Roaring Bitmaps is
> a tad like the pkzip software that tries various compression
> techniques on each file and chooses whatever seems to work better on
> each one. That takes extra time when zipping, but when unzipping a
> file, it goes directly to the method used to compress it as the type
> is in a header and just expands it one way.

While not completely wrong, I don't think this comparison is very
helpful.


> My impression is that Roaring bitmaps breaks up the range of integers
> into smaller chunks and depending on what is being stored in that
> chunk, may leave it as an uncompressed bitmap, or a list of the sparse
> contents, or other storage methods and can search each version fairly
> quickly. 

It's been a few years since I looked at the implementation, but that's
the gist of it.


> So, I have no doubt it works great for some applications such as
> treating social security numbers as integers.

Not sure what you mean here. You mean storing sets of social security
numbers as roaring bitmaps? You might be able to do that (Last time I
looked RB's were limited to 32 bits, which may not be enough to
represent SSNs unmodified), but are set operations on SSNs something you
routinely do?

> It likely would be overkill to store something like the components of
> an IP address between 0 and 255 inclusive.
> 
> But having said that, there may well be non-integer data that can be
> mapped into and out of integers. As an example, airports or radio
> stations have names like LAX or WPIX. If you limit yourself to ASCII
> letters then every one of them can be stored as a 32-bit integer,
> perhaps with some padding.

Again: What would be the application?

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Oscar Benjamin
On Fri, 17 Feb 2023 at 10:29, Stephen Tucker  wrote:
>
> Thanks, one and all, for your responses.
>
> This is a hugely controversial claim, I know, but I would consider this
> behaviour to be a serious deficiency in the IEEE standard.
[snip]
>
> Perhaps this observation should be brought to the attention of the IEEE. I
> would like to know their response to it.

Their response would be that they are well aware of what you are
saying and knew about this already since before writing any standards.
The basic limitation of the IEEE standard in this respect is that it
describes individual operations rather than composite operations. Your
calculation involves composing operations, specifically:

result = x ** (n / d)

The problem is that there is more than one operation so we have to
evaluate this in two steps:

e = n / d
result = x ** e

Now the problem is that although n / d is correctly rounded e has a
small error because the exact value of n / d cannot be represented. In
the second operation taking this slightly off value of e as the
intended input means that the correctly rounded result for x ** e is
not the closest float to the true value of the *compound* operation.
The exponentiation operator in particular is very sensitive to changes
in the exponent when the base is large so the tiny error in e leads to
a more noticeable relative error in x ** e.
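
The size of that effect is easy to estimate: to first order, the relative error in x ** e is log(x) times the error in e. A small sketch of that estimate for the number used earlier in the thread:

    import math
    from fractions import Fraction

    x = 123456789e42
    delta = Fraction(1, 3) - Fraction(1.0 / 3.0)   # exact gap between 1/3 and the float exponent
    print(math.log(x) * float(delta))              # predicted relative error, roughly 2e-15,
                                                   # an order of magnitude above a double's
                                                   # ~2e-16 epsilon, hence the lost digits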

The only way to prevent this in full generality is to have a system
in which no intermediate inexact operations are computed eagerly which
means representing expressions symbolically in some way. That is what
the SymPy code I showed does:

In [6]: from sympy import cbrt

In [7]: e = cbrt(1234567890)

In [8]: print(e)
1000*123456789**(1/3)

In [9]: e.evalf(50)
Out[9]: 49793385921817.447440261250171604380899353243631762

Because the *entire* expression is represented here *exactly* as e it
is then possible to evaluate different parts of the expression
repeatedly with different levels of precision and it is necessary to
do that for full accuracy in this case. Here evalf will use more than
50 digits of precision internally so that at the end you have a result
specified to 50 digits but where the error for the entire expression
is smaller than the final digit. If you give it a more complicated
expression then it will use even more digits internally for deeper
parts of the expression tree because that is what is needed to get a
correctly rounded result for the expression as a whole.

This kind of symbolic evaluation is completely outside the scope of
what the IEEE floating point standards are for. Any system based on
fixed precision and eager evaluation will show the same problem that
you have identified. It is very useful though to have a system with
fixed precision and eager evaluation despite these limitations. The
context for which the IEEE standards are mainly intended (e.g. FPU
instructions) is one in which fixed precision and eager evaluation are
the only option.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-17 Thread Dino



Thank you, Gerard. I really appreciate your help

Dino

On 2/16/2023 9:40 PM, Weatherby,Gerard wrote:

I think this does the trick:

https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


#!/usr/bin/env python3
import collections
import random
from typing import Hashable, Any, Optional, Dict, Tuple


class LruCache:
    """Dictionary like storage of most recently inserted values"""

    def __init__(self, size: int = 1000):
        """:param size number of cached entries"""
        assert isinstance(size, int)
        self.size = size
        self.insert_counter = 0
        self.oldest = 0
        self._data: Dict[Hashable, Tuple[Any, int]] = {}  # store values and age index
        self._lru: Dict[int, Hashable] = {}  # age counter dictionary

    def insert(self, key: Hashable, value: Any) -> None:
        """Insert into dictionary"""
        existing = self._data.get(key, None)
        self._data[key] = (value, self.insert_counter)
        self._lru[self.insert_counter] = key
        if existing is not None:
            self._lru.pop(existing[1], None)  # remove old counter value, if it exists
        self.insert_counter += 1
        if (sz := len(self._data)) > self.size:  # is cache full?
            assert sz == self.size + 1
            while (key := self._lru.get(self.oldest, None)) is None:
                # index may not be present, if value was reinserted
                self.oldest += 1
            del self._data[key]  # remove oldest key / value from dictionary
            del self._lru[self.oldest]
            self.oldest += 1  # next oldest index
        assert len(self._lru) == len(self._data)

    def get(self, key: Hashable) -> Optional[Any]:
        """Get value or return None if not in cache"""
        if (tpl := self._data.get(key, None)) is not None:
            return tpl[0]
        return None


if __name__ == "__main__":
    CACHE_SIZE = 1000
    TEST_SIZE = 1_000_000
    cache = LruCache(size=CACHE_SIZE)

    all = []
    for i in range(TEST_SIZE):
        all.append(random.randint(-5000, 5000))

    summary = collections.defaultdict(int)
    for value in all:
        cache.insert(value, value * value)
        summary[value] += 1
    smallest = TEST_SIZE
    largest = -TEST_SIZE
    total = 0
    for value, count in summary.items():
        smallest = min(smallest, count)
        largest = max(largest, count)
        total += count
    avg = total / len(summary)
    print(f"{len(summary)} values occurrences range from {smallest} to {largest}, average {avg:.1f}")

    recent = set()  # recent most recent entries
    for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most recent entries
        value = all[i]
        if len(recent) < CACHE_SIZE:
            recent.add(value)
        if value in recent:
            if (r := cache.get(value)) != value * value:
                raise ValueError(f"Cache missing recent {value} {r}")
        else:
            if cache.get(value) != None:
                raise ValueError(f"Cache includes old {value}")

From: Python-list  on behalf of 
Dino 
Date: Wednesday, February 15, 2023 at 3:07 PM
To: python-list@python.org 
Subject: Re: LRU cache

Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but
maybe I am missing something. I'll look closer.

On 2/14/2023 7:36 PM, Mats Wichmann wrote:

On 2/14/23 15:07, Dino wrote:




--
https://mail.python.org/mailman/listinfo/python-list


--
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Peter J. Holzer
On 2023-02-17 08:38:58 -0700, Michael Torrie wrote:
> On 2/17/23 03:27, Stephen Tucker wrote:
> > Thanks, one and all, for your responses.
> > 
> > This is a hugely controversial claim, I know, but I would consider this
> > behaviour to be a serious deficiency in the IEEE standard.
> 
> No matter how you do it, there are always tradeoffs and inaccuracies
> moving from real numbers in base 10 to base 2.

This is phrased ambiguously. So just to clarify:

Real numbers are not in base 10. Or base 2 or base 37 or base e. A
positional system (which uses a base) is just a convenient way to write
a small subset of real numbers. By using any base you limit yourself to
rational numbers (no e or π or √2) and in fact only those rational
numbers where the denominator is a power of the base.

Converting numbers from one base to another with any finite precision
will generally involve rounding - so do that as little as possible.
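
A small illustration, using Fraction to recover the exact rational value a float actually stores:

    from fractions import Fraction

    print(Fraction(0.375))    # 3/8 - the denominator is a power of 2, so exact in binary
    print(Fraction(0.1))      # 3602879701896397/36028797018963968 - the nearest binary
                              # float to 1/10, whose denominator is not a power of 2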


> That's just the nature of the math.  Any binary floating point
> representation is going to have problems.

Any decimal floating point representation is also going to have
problems.

There is nothing magical about base 10. It's just what we are used to
(which also means that we are used to the rounding errors and aren't
surprised by them as much).

> Also we weren't clear on this, but the IEEE standard is not just
> implemented in software. It's the way your CPU represents floating point
> numbers in silicon.  And in your GPUs (where speed is preferred to
> precision).  So it's not like Python could just arbitrarily do something
> different unless you were willing to pay a huge penalty for speed.

I'm pretty sure that compared to the interpreter overhead of CPython the
overhead of a software FP implementation (whether binary or decimal)
would be rather small, maybe negligible.


> > Perhaps this observation should be brought to the attention of the IEEE. I
> > would like to know their response to it.
> Rest assured the IEEE committee that formalized the format decades ago
> knew all about the limitations and trade-offs.  Over the years CPUs have
> increased in capacity and now we can use 128-bit floating point numbers

The very first IEEE compliant processor (the Intel 8087) had an 80 bit
extended type (in fact it did all computations in 80 bit and only
rounded down to 64 or 32 bits when storing the result). By the 1990s, 96
and 128 bit was quite common.

> which mitigate some of the accuracy problems by simply having more
> binary digits. But the fact remains that some rational numbers in
> decimal are irrational in binary,

Be careful: "Rational" and "irrational" have a standard meaning in
mathematics and it's independent of base.

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Peter J. Holzer
On 2023-02-17 10:27:08 +, Stephen Tucker wrote:
> This is a hugely controversial claim, I know, but I would consider this
> behaviour to be a serious deficiency in the IEEE standard.
> 
> Consider an integer N consisting of a finitely-long string of digits in
> base 10.
> 
> Consider the infinitely-precise cube root of N (yes I know that it could
> never be computed

However, computers exist to compute. Something which can never be
computed is outside of the realm of computing.

> unless N is the cube of an integer, but this is a mathematical
> argument, not a computational one), also in base 10. Let's call it
> RootN.
> 
> Now consider appending three zeroes to the right-hand end of N (let's call
> it NZZZ) and NZZZ's infinitely-precise cube root (RootNZZZ).
> 
> The *only *difference between RootN and RootNZZZ is that the decimal point
> in RootNZZZ is one place further to the right than the decimal point in
> RootN.

No. In mathematics there is no such thing as a decimal point. The only
difference is that RootNZZZ is RootN*10. But there is nothing special
about 10. You could multiply your original number by 512 and then the
new cube root would differ by a factor of 8 (which would show up as
shifted "binary point"[1] in binary but completely different digits in
decimal) or you could multiply by 1728 and then you would need base 12
to get the same digits with a shifted "duodecimal point".

hp

[1] It's really unfortunate that the point which separates the integer
and the fractional part of a number is called a "decimal point" in
English. Makes it hard to talk about non-integer numbers in other
bases.

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Peter J. Holzer
On 2023-02-17 14:39:42 +, Weatherby,Gerard wrote:
> IEEE did not define a standard for floating point arithmetics. They
> designed multiple standards, including a decimal float point one.
> Although decimal floating point (DFP) hardware used to be
> manufactured, I couldn’t find any current manufacturers.

Doesn't IBM any more? Their POWER processors used to implement decimal
FP (starting with POWER8, if I remember correctly).

hp

-- 
   _  | Peter J. Holzer| Story must make more sense than reality.
|_|_) ||
| |   | h...@hjp.at |-- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |   challenge!"


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Grant Edwards
On 2023-02-17, Richard Damon  wrote:
> [...]
>
>> Perhaps this observation should be brought to the attention of the IEEE. I
>> would like to know their response to it.
>
> That is why they have developed the Decimal Floating point format, to 
> handle people with those sorts of problems.
>
> They just aren't common enough for many things to have adopted the
> use of it.

Back before hardware floating point was common, support for decimal
floating point was very common.  All of the popular C, Pascal, and
BASIC compilers (for microcomputers) I remember let you choose (at
compile time) whether you wanted to use binary floating point or
decimal (BCD) floating point. People doing scientific stuff usually
chose binary because it was a little faster and you got more
resolution for the same amount of storage. If you were doing
accounting, you chose BCD (or used fixed-point).

Once hardware (binary) floating point became common, support for
software BCD floating point just sort of went away...

--
Grant




-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Mats Wichmann

On 2/17/23 11:42, Richard Damon wrote:

On 2/17/23 5:27 AM, Stephen Tucker wrote:


The key factor here is IEEE floating point is storing numbers in BINARY, 
not DECIMAL, so a multiply by 1000 will change the representation of the 
number, and thus the possible resolution errors.


Store your numbers in IEEE DECIMAL floating point, and the variations by 
multiplying by powers of 10 go away.


The development of the original IEEE standard led eventually to 
consistent implementation in hardware (when they implement floating 
point at all, which embedded/IoT class chips in particular often don't) 
that aligned with how languages/compilers treated floating point, so 
that's been a really successful standard, whatever one might feel about 
the tradeoffs. Standards are all about finding a mutually acceptable way 
forward, once people admit there is no One Perfect Answer.


Newer editions of 754 (since 2008) have added this decimal floating 
point representation, which is supported by some software such as IBM 
and Intel floating-point libraries.  Hardware support has been slower to 
arrive.  The only ones I've heard of have been the IBM z series 
(mainframes) and somebody else mentioned Power though I'd never seen 
that. It's possible some of the GPU lines may be going this direction.


As far as Python goes... the decimal module has this comment:

> It is a complete implementation of Mike Cowlishaw/IBM's General Decimal Arithmetic Specification.


Cowlishaw was the editor of the 2008 and 2019 editions of IEEE 754, fwiw.

And... this topic as a whole comes up over and over again, like 
everywhere.  See Stack Overflow for some amusement.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Grant Edwards
On 2023-02-17, Mats Wichmann  wrote:

> And... this topic as a whole comes up over and over again, like
> everywhere.

That's an understatement.

I remember it getting rehashed over and over again in various USENET
groups 35 years ago when when the VAX 11/780 BSD machine on which I
read news exchanged postings with peers using a half-dozen dial-up
modems and UUCP.

One would have thought it would be a time-saver when David Goldberg
wrote "the paper" in 1991, and you could tell people to go away and
read this:

  https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
  https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf

It didn't help.

Every fall, the groups were again full of a new crop of people who had
just discovered all sorts of bugs in the way 
implemented floating point, and pointing them to a nicely written
document that explained it never did any good.

--
Grant


-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Comparing caching strategies

2023-02-17 Thread avi.e.gross
Peter,

Analogies I am sharing are mainly for me to wrap my head around an idea by
seeing if it matches any existing ideas or templates; they are not meant to be
exact. Fair enough?

But in this case, from my reading, the analogy is rather reasonable. The
implementation of Roaring Bitmaps seems to logically break up the space of
all possible values it looks up into multiple "zones" that are somewhat
analogous to individual files, albeit probably in memory as the program
runs. In the pkzip analogy, each file is processed and stored independently
alongside a header that provides enough detail to uniquely open up the
contents when needed. The collection of such compressed files is then one
bigger file that is saved that uses up less space. Roaring bitmaps seems to
determine how best to store each zone and only processes that zone when
requested, hence some of the speedup as each zone is stored in a manner that
generally allows fast access.

I did not raise the issue and thus have no interest in promoting this
technology nor knocking it down. Just wondering what it was under the hood
and whether I might even have a need for it. I am not saying Social Security
numbers are a fit, simply that some form of ID number might fit. If a
company has a unique ID number for each client and uses it consistently,
then an implementation that holds a set stored this way of people using
product A, such as house insurance, and those using product B, such as car
insurance, and perhaps product C is an umbrella policy, might easily handle
some queries such as who uses two or all three (intersections of sets) or
who might merit a letter telling them how much they can save if they
subscribed to two or all three as a way to get more business. Again, just  a
made-up example I can think about. A company which has a million customers
over the years will have fairly large sets as described. 

What is helpful to me in thinking about something will naturally often not
be helpful to you or others but nothing you wrote makes me feel my first
take was in any serious way wrong. It still makes sense to me.

And FYI, the largest integer in signed 32 bits is 2_147_483_647 which is 10
digits. A Social Security number looks like xxx-xx-xxxx at this time, which is
only 9 digits.  Not that it matters, but it seems it still qualifies to be
used as I describe, as long as Roaring bitmaps allows it, minus any spaces
or hyphens and converted to an integer.

As for my other EXAMPLE, I fail to see why I need to provide a specific need
for an application. I don't care what they need it for. The thought was
about whether something that does not start as an integer can be uniquely
mapped into and out of integers of size 32 bits. So I considered a few
examples of short textual items such as three letter airport abbreviations.
But if you cannot imagine an application, consider one similar enough to the
above.  I think there are currently over 47,000 such airports  in the world
and apparently about 20,000 in the US. That seems to exceed the possible
combinations of 26 letters (cubed) so it seems there are 4-letter codes too
such as ZSPD. It should still fit into 4 bytes, for now.

So assume you have a variety of facts such as which airports have
handicapped accessible bathrooms, or have an extra long/strong runway that
can handle huge planes or anything else that is worth knowing. You might
have bitmaps (as is being discussed) that may be sparse for some such info
and fairly dense for other info like having jet fuel available. As above,
finding an airport that has various mixtures may be doable with these sets
and perhaps faster than say queries on a relational database storing the
same info.
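
To make that concrete, a toy sketch using plain Python sets as a stand-in for the bitmap containers; pack is a made-up helper that squeezes a 3- or 4-letter code into a 32-bit integer as described above:

    def pack(code: str) -> int:
        # made-up helper: 3- or 4-letter ASCII code -> 32-bit integer
        return int.from_bytes(code.ljust(4).encode("ascii"), "big")

    accessible  = {pack(c) for c in ("LAX", "JFK", "ZSPD")}   # accessible bathrooms
    jet_fuel    = {pack(c) for c in ("LAX", "DEN", "ZSPD")}   # jet fuel available
    long_runway = {pack(c) for c in ("DEN", "ZSPD")}          # extra long/strong runway

    hits = accessible & jet_fuel & long_runway                # intersection query
    print([v.to_bytes(4, "big").decode("ascii").strip() for v in hits])   # ['ZSPD']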

I will end by saying this is a hypothetical discussion for me. I am not
doing any projects now where I expect to use Roaring bitmaps but am now
aware of them should any need or opportunity arise. My mind is very full
with such trivia and very little is needed albeit I never know what may come
in as useful. 

Respectfully,

Avi

-Original Message-
From: Python-list  On
Behalf Of Peter J. Holzer
Sent: Friday, February 17, 2023 1:47 PM
To: python-list@python.org
Subject: Re: Comparing caching strategies

On 2023-02-17 00:07:12 -0500, avi.e.gr...@gmail.com wrote:
> Roaring bitmaps claim to be an improvement not only over uncompressed 
> structures but some other compressed versions but my reading shows it 
> may be limited to some uses. Bitsets in general seem to be useful only 
> for a largely contiguous set of integers where each sequential bit 
> represents whether the nth integer above the lowest is in the set or 
> not.

They don't really have to be that contiguous. As long as your integers fit
into 32 bits you're fine.

> Of course, properly set up, this makes Unions and Intersections and 
> some other operations fairly efficient. But sets are not the same as 
> dictionaries and often you are storing other data types than smaller 
> integers.

Of course. Different data

Re: Precision Tail-off?

2023-02-17 Thread Greg Ewing via Python-list

On 18/02/23 7:42 am, Richard Damon wrote:

On 2/17/23 5:27 AM, Stephen Tucker wrote:

None of the digits in RootNZZZ's string should be different from the
corresponding digits in RootN.


Only if the storage format was DECIMAL.


Note that using decimal wouldn't eliminate this particular problem,
since 1/3 isn't exactly representable in decimal either.

To avoid it you would need to use an algorithm that computes nth
roots directly rather than raising to the power 1/n.
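
The third-party mpmath library offers that: its nthroot takes the root as a single operation at whatever working precision you set. A quick sketch:

    from mpmath import mp, mpf, nthroot

    mp.dps = 30                                   # 30 significant decimal digits
    x = mpf(123456789) * mpf(10) ** 42
    print(nthroot(x, 3))                          # the first 20 figures should match the
                                                  # 49793385921817447440 quoted in this thread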

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Chris Angelico
On Sat, 18 Feb 2023 at 12:41, Greg Ewing via Python-list
 wrote:
>
> On 18/02/23 7:42 am, Richard Damon wrote:
> > On 2/17/23 5:27 AM, Stephen Tucker wrote:
> >> None of the digits in RootNZZZ's string should be different from the
> >> corresponding digits in RootN.
> >
> > Only if the storage format was DECIMAL.
>
> Note that using decimal wouldn't eliminate this particular problem,
> since 1/3 isn't exactly representable in decimal either.
>
> To avoid it you would need to use an algorithm that computes nth
> roots directly rather than raising to the power 1/n.
>

It's somewhat curious that we don't really have that. We have many
other inverse operations - addition and subtraction (not just "negate
and add"), multiplication and division, log and exp - but we have
exponentiation without an arbitrary-root operation. For square roots,
that's not a problem, since we can precisely express the concept
"raise to the 0.5th power", but for anything else, we have to raise to
a fractional power that might be imprecise.
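
A quick check of which exponents survive the trip into a float, using Fraction to show the exact stored value:

    from fractions import Fraction

    print(Fraction(0.5) == Fraction(1, 2))         # True: 0.5 is exact, so x ** 0.5 asks
                                                   # for precisely the square root
    print(Fraction(1.0 / 3.0) == Fraction(1, 3))   # False: the cube-root exponent is already
                                                   # slightly off before ** even runs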

But maybe, in practice, this isn't even a problem?

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Python 3.11.2, 3.10.10

2023-02-17 Thread אורי
Hi,

I was surprised that Python 3.11.2 and 3.10.10 have been released without a
notice to this mailing list. What happened?

Thanks,
Uri.
אורי
u...@speedy.net


On Wed, Dec 7, 2022 at 1:03 AM Łukasz Langa  wrote:

> Greetings! We bring you a slew of releases this fine Saint Nicholas /
> Sinterklaas day. Six simultaneous releases has got to be some record.
> There’s one more record we broke this time, you’ll see below.
>
> In any case, updating is recommended due to security content:
>
> 3.7 - 3.12: gh-98739 :
> Updated bundled libexpat to 2.5.0 to fix CVE-2022-43680 <
> https://nvd.nist.gov/vuln/detail/CVE-2022-43680> (heap use-after-free).
> 3.7 - 3.12: gh-98433 :
> The IDNA codec decoder used on DNS hostnames by socket or asyncio related
> name resolution functions no longer involves a quadratic algorithm to fix
> CVE-2022-45061 . This
> prevents a potential CPU denial of service if an out-of-spec excessive
> length hostname involving bidirectional characters were decoded. Some
> protocols such as urllib http 3xx redirects potentially allow for an
> attacker to supply such a name.
> 3.7 - 3.12: gh-11 :
> python -m http.server no longer allows terminal control characters sent
> within a garbage request to be printed to the stderr server log.
> 3.8 - 3.12: gh-87604 :
> Avoid publishing list of active per-interpreter audit hooks via the gc
> module.
> 3.9 - 3.10 (already released in 3.11+ before): gh-97514 <
> https://github.com/python/cpython/issues/97514>: On Linux the
> multiprocessing module returns to using filesystem backed unix domain
> sockets for communication with the forkserver process instead of the Linux
> abstract socket namespace. Only code that chooses to use the “forkserver”
> start method is affected. This prevents Linux CVE-2022-42919 <
> https://nvd.nist.gov/vuln/detail/CVE-2022-42919> (potential privilege
> escalation) as abstract sockets have no permissions and could allow any
> user on the system in the same network namespace (often the whole system)
> to inject code into the multiprocessing forkserver process. This was a
> potential privilege escalation. Filesystem based socket permissions
> restrict this to the forkserver process user as was the default in Python
> 3.8 and earlier.
> 3.7 - 3.10: gh-98517 :
> Port XKCP’s fix for the buffer overflows in SHA-3 to fix CVE-2022-37454 <
> https://nvd.nist.gov/vuln/detail/CVE-2022-37454>.
> 3.7 - 3.9 (already released in 3.10+ before): gh-68966 <
> https://github.com/python/cpython/issues/68966>: The deprecated mailcap
> module now refuses to inject unsafe text (filenames, MIME types,
> parameters) into shell commands to address CVE-2015-20107 <
> https://nvd.nist.gov/vuln/detail/CVE-2015-20107>. Instead of using such
> text, it will warn and act as if a match was not found (or for test
> commands, as if the test failed).
>  <
> https://discuss.python.org/t/python-3-11-1-3-10-9-3-9-16-3-8-16-3-7-16-and-3-12-0-alpha-3-are-now-available/21724#python-3120-alpha-3-1>Python
> 3.12.0 alpha 3
>
> Get it here, read the change log, sing a GPT-3-generated Sinterklaas song:
>
> https://www.python.org/downloads/release/python-3120a3/ <
> https://www.python.org/downloads/release/python-3120a3/>
>
> 216 new commits since 3.12.0 alpha 2 last month.
>
>  <
> https://discuss.python.org/t/python-3-11-1-3-10-9-3-9-16-3-8-16-3-7-16-and-3-12-0-alpha-3-are-now-available/21724#python-3111-2>Python
> 3.11.1
>
> Get it here, see the change log, read the recipe for quark soup:
>
> https://www.python.org/downloads/release/python-3111/ <
> https://www.python.org/downloads/release/python-3111/>
>
> A whopping 495 new commits since 3.11.0. This is a massive increase of
> changes comparing to 3.10 at the same stage in the release cycle: there
> were “only” 339 commits between 3.10.0 and 3.10.1.
>
>  <
> https://discuss.python.org/t/python-3-11-1-3-10-9-3-9-16-3-8-16-3-7-16-and-3-12-0-alpha-3-are-now-available/21724#python-3109-3>Python
> 3.10.9
>
> Get it here, read the change log, see circular patterns:
>
> https://www.python.org/downloads/release/python-3109/ <
> https://www.python.org/downloads/release/python-3109/>
>
> 165 new commits.
>
>  <
> https://discuss.python.org/t/python-3-11-1-3-10-9-3-9-16-3-8-16-3-7-16-and-3-12-0-alpha-3-are-now-available/21724#python-3916-4>Python
> 3.9.16
>
> Get it here, read the change log, consider upgrading to a newer version:
>
> https://www.python.org/downloads/release/python-3916/ <
> https://www.python.org/downloads/release/python-3916/>
>
> Security-only release with no binaries. 10 commits.
>
>  <
> https://discuss.python.org/t/python-3-11-1-3-10-9-3-9-16-3-8-16-3-7-16-and-3-12-0-alpha-3-are-now-available/21724#python-3816-5>Python
> 3.

Re: Precision Tail-off?

2023-02-17 Thread Michael Torrie
On 2/17/23 15:03, Grant Edwards wrote:
> Every fall, the groups were again full of a new crop of people who had
> just discovered all sorts of bugs in the way 
> implemented floating point, and pointing them to a nicely written
> document that explained it never did any good.

But to be fair, Goldberg's article is pretty obtuse and formal for most
people, even programmers.  I don't need lots of formal proofs as he
shows.  Just a summary is sufficient I'd think.  Although I've been
programming for many years, I have no idea what he means with most of
the notation in that paper.

Although I have a vague notion of what's going on, as my last post
shows, I don't know any of the right terminology.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Precision Tail-off?

2023-02-17 Thread Oscar Benjamin
On Sat, 18 Feb 2023 at 01:47, Chris Angelico  wrote:
>
> On Sat, 18 Feb 2023 at 12:41, Greg Ewing via Python-list
>  wrote:
> >
> > On 18/02/23 7:42 am, Richard Damon wrote:
> > > On 2/17/23 5:27 AM, Stephen Tucker wrote:
> > >> None of the digits in RootNZZZ's string should be different from the
> > >> corresponding digits in RootN.
> > >
> > > Only if the storage format was DECIMAL.
> >
> > Note that using decimal wouldn't eliminate this particular problem,
> > since 1/3 isn't exactly representable in decimal either.
> >
> > To avoid it you would need to use an algorithm that computes nth
> > roots directly rather than raising to the power 1/n.
> >
>
> It's somewhat curious that we don't really have that. We have many
> other inverse operations - addition and subtraction (not just "negate
> and add"), multiplication and division, log and exp - but we have
> exponentiation without an arbitrary-root operation. For square roots,
> that's not a problem, since we can precisely express the concept
> "raise to the 0.5th power", but for anything else, we have to raise to
> a fractional power that might be imprecise.

Various libraries can do this. Both SymPy and NumPy have cbrt for cube roots:

  >>> np.cbrt(12345678900.)
  4.979338592181745e+20

SymPy can also evaluate any rational power either exactly or to any
desired accuracy. Under the hood SymPy uses mpmath for the approximate
numerical evaluation part of this and mpmath can also be used directly
with its cbrt and nthroot functions to do this working with any
desired precision.

> But maybe, in practice, this isn't even a problem?

I'd say it's a small problem. Few people would use such a feature but
it would have a little usefulness for those people if it existed.
Libraries like mpmath and SymPy provide this and can offer a big step
up for those who are really concerned about exactness or accuracy
though so there are already options for those who care. These are a
lot slower though than working with plain old floats but on the other
hand offer vastly more than a math.cbrt function could offer to
someone who needs something more accurate than x**(1/3).

For those who are working with floats the compromise is clear: errors
can accumulate in calculations. Taking the OPs example to the extreme,
the largest result that does not overflow is:

  >>> (123456789. * 10**300) ** (1.0 / 3.0)
  4.979338592181679e+102

Only the last 3 digits are incorrect so the error is still small. It
is not hard to find other calculations where *all* the digits are
wrong though:

  >>> math.cos(3)**2 + math.sin(3)**2 - 1
  -1.1102230246251565e-16

So if you want to use floats then you need to learn to deal with this
as appropriate for your use case. IEEE standards do their best to make
results reproducible across machines as well as limiting avoidable
local errors so that global errors in larger operations are *less
likely* to dominate the result. Their guarantees are only local though
so as soon as you have more complicated calculations you need your own
error analysis somehow. IEEE guarantees are in that case also useful
for those who actually want to do a formal error analysis.

--
Oscar
-- 
https://mail.python.org/mailman/listinfo/python-list