[Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
I'd like to propose two minor changes to float and complex
formatting, for 3.1.  I don't think either change should prove
particularly disruptive.

(1) Currently, '%f' formatting automatically changes to '%g' formatting for
numbers larger than 1e50.  For example:

>>> '%f' % 2**166.
'93536104789177786765035829293842113257979682750464.000000'
>>> '%f' % 2**167.
'1.87072e+50'

I propose removing this feature for 3.1

More details: The current behaviour is documented (standard
library->builtin types).  (Until very recently, it was actually
misdocumented as changing at 1e25, not 1e50.)

"""For safety reasons, floating point precisions are clipped to 50; %f
conversions for numbers whose absolute value is over 1e50 are
replaced by %g conversions. [5] All other errors raise exceptions."""

There's even a footnote:

"""[5]  These numbers are fairly arbitrary. They are intended to
avoid printing endless strings of meaningless digits without
hampering correct use and without having to know the exact
precision of floating point values on a particular machine."""

I don't find this particularly convincing, though---I just don't see
a really good reason not to give the user exactly what she/he
asks for here.  I have a suspicion that at least part of the
motivation for the '%f' -> '%g' switch is that it means the
implementation can use a fixed-size buffer.  But Eric has
fixed this (in 3.1, at least) and the buffer is now dynamically
allocated, so this isn't a concern any more.

Other reasons not to switch from '%f' to '%g' in this way:

 - the change isn't gentle:  as you go over the 1e50 boundary,
   the number of significant digits produced suddenly changes
   from 56 to 6;  it would make more sense to me if it
   stayed fixed at 56 sig digits for numbers larger than 1e50.
 - now that we're using David Gay's 'perfect rounding'
   code, we can be sure that the digits aren't entirely
   meaningless, or at least that they're the 'right' meaningless
   digits.  This wasn't true before.
 - C doesn't do this, and the %f, %g, %e formats really
   owe their heritage to C.
 - float formatting is already quite complicated enough; no
   need to add to the mental complexity
 - removal simplifies the implementation :-)
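
The jump in the first bullet can be made concrete with a short sketch. It assumes a present-day CPython 3 interpreter, in which '%f' no longer switches to '%g'; the helper name sig_digits is hypothetical and simply counts the digits in the formatted text.

```python
# Sketch: count the digits '%f' produces on either side of the old 1e50 cutoff.
# Assumes a CPython 3 interpreter in which '%f' never falls back to '%g'.

def sig_digits(text):
    """Count decimal digits in a formatted float (hypothetical helper)."""
    mantissa = text.lower().split('e')[0]   # drop any exponent part
    return sum(ch.isdigit() for ch in mantissa)

below = '%f' % 2.0**166   # just under 1e50
above = '%f' % 2.0**167   # just over 1e50

print(sig_digits(below))            # 50 integer digits + 6 after the point
print(sig_digits(above))            # 51 integer digits + 6 after the point
print(sig_digits('%g' % 2.0**167))  # '%g' keeps 6 significant digits
```

With the old fallback, the second value would have dropped to the '%g' count instead of continuing smoothly.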


On to the second proposed change:

(2) complex str and repr don't behave like float str and repr, in that
the float version always adds a trailing '.0' (unless there's an
exponent), but the complex version doesn't:

>>> 4., 10.
(4.0, 10.0)
>>> 4. + 10.j
(4+10j)

I propose changing the complex str and repr to behave like the
float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
rather than "(4+10j)".

Mostly this is just about consistency, ease of implementation,
and aesthetics.  As far as I can tell, the extra '.0' in the float
repr serves two closely-related purposes:  it makes it clear to
the human reader that the number is a float rather than an
integer, and it makes sure that e.g., eval(repr(x)) recovers a
float rather than an int.  The latter point isn't a concern for
the current complex repr, but the former is:  4+10j looks to
me more like a Gaussian integer than a complex number.
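
The two purposes of the trailing '.0' can be checked directly. This is a small sketch rather than part of the proposal; it relies only on the built-in repr and eval.

```python
# Sketch: why float repr keeps '.0' and why complex repr does not need it
# for round-tripping.

x = 4.0
z = 4.0 + 10.0j

# Without the '.0', eval('4') would give back an int, not a float.
assert type(eval(repr(x))) is float
assert type(eval('4')) is int

# The 'j' suffix already forces a complex result, so eval(repr(z))
# recovers a complex number whether or not '.0' is shown.
assert type(eval(repr(z))) is complex
assert eval(repr(z)) == z

print(repr(x), repr(z))
```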

Any comments?

Mark
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Steven D'Aprano
On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g'
> formatting for numbers larger than 1e50.
...
> I propose removing this feature for 3.1

No objections from me. +1

> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

No objections here either. +0



-- 
Steven D'Aprano


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Michael Foord

Steven D'Aprano wrote:
> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
>> I'd like to propose two minor changes to float and complex
>> formatting, for 3.1.  I don't think either change should prove
>> particularly disruptive.
>>
>> (1) Currently, '%f' formatting automatically changes to '%g'
>> formatting for numbers larger than 1e50.
> ...
>> I propose removing this feature for 3.1
>
> No objections from me. +1
>
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> No objections here either. +0



  
Doing it sooner rather than later means that it is less likely to 
disrupt anyone relying on the representation (i.e. doctests).


Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog




Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Stephen J. Turnbull
Paul Moore writes:
 > 2009/4/24 Stephen J. Turnbull :
 > > Paul Moore writes:
 > >
 > >  > The pros for Martin's proposal are a uniform cross-platform interface,
 > >  > and a user-friendly API for the common case.
 > >
 > > A more accurate phrasing would be "... a user-friendly API for those
 > > who feel very lucky today."  Which is the common case, of course, but
 > > spins a little differently.
 > 
 > Sorry, but I think you're misrepresenting things. I'd have probably
 > let you off if you'd missed out the "very" - but I do think that it's
 > the common case. Consider:

If you need reliability, then you can't get it this way.  The reason
"very" is (somewhat) justified is that this kind of issue is a little
like unemployment.  You hardly ever meet someone who's 7.2%
unemployed, but you probably know several who are 100% unemployed.  If
you see a broken encoding once, you're likely to see it a million times
(spammers have the most broken software) or maybe have it raise an
unhandled Exception a dozen times (in rate of using busted software,
the spammers are closely followed by bosses---which would be very bad,
eh, if 2/3 of the mail from your boss ends up in an undeliverables
queue due to encoding errors that are unhandled by some filter in
your mail pipeline).

 > - Windows systems where broken Unicode (lone surrogates or whatever)
 > isn't involved
 > - Unix systems where the user's stated filesystem encoding is correct

 > Can you honestly say that this isn't the vast majority of real-world
 > environments?

Again, that's not the point.  The point is that six-sigma reliability
world-wide is not going to be very comforting to the poor souls who
happen to have broken software in their environment sending broken
encodings regularly, because they're going to be dealing with one or
two sigmas, and that's just not good enough in a production
environment.

 > > If you didn't start with a valid string in a known encoding, you
 > > shouldn't treat it as characters because it's not.
 > 
 > Again, that's the purist argument. If you have a string (of bytes, I
 > guess) and a 99% certain guess as to the correct encoding, then I'd
 > argue that, as long as (a) it's not mission-critical (lives or backups
 > depend on it)

Assurance that you can even determine (a) is not provided by the PEP.
There is no way to contain a problem if it should occur, because it's
"just a string" and could go anywhere, and get converted back or
otherwise manipulated in a context that doesn't know how to handle it
(which might not even be Python if a C-level extension is involved).
Given that Python has no internal mechanism for saying "in this area
only valid Unicode will be accepted", it seems likely that mission
critical software *will* interact with this feature, if only
indirectly (or perhaps only in software originally intended for use in
the U.S. only, but then it gets exported, etc).
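
To make the concern concrete, here is a minimal sketch of how an undecodable byte can travel as an ordinary str. It assumes the error handler name surrogateescape, which is how PEP 383 was eventually implemented in Python 3.1 and later; the thread itself predates that implementation, and the byte string below is made up for illustration.

```python
# Sketch, assuming the 'surrogateescape' error handler (Python 3.1+).
raw = b'caf\xe9.txt'          # Latin-1 bytes, not valid UTF-8 (made-up name)

name = raw.decode('utf-8', 'surrogateescape')
# The bad byte 0xE9 becomes the lone surrogate U+DCE9 inside a normal str.
assert name == 'caf\udce9.txt'

# Code that knows about the convention can recover the original bytes...
assert name.encode('utf-8', 'surrogateescape') == raw

# ...but code that does not will fail only when it finally tries to encode.
try:
    name.encode('utf-8')
    failed = False
except UnicodeEncodeError:
    failed = True
print(failed)
```

Nothing in the str type itself marks name as special, which is the containment problem described above.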

 > and (b) you have a means of failing relatively
 > gracefully, you have every reason to make the assumption about
 > encoding.

(b) is not provided in the PEP, either.  We have no idea what the
failure mode will be.

 > After all, what's the alternative?

The alternative is to refuse to provide a simple standard way to
decode unreliably, and in that way make the user responsible for an
explicit choice about what level and kinds of unreliability they will
accept.

I realize that's unpalatable to most people who use Python to develop
software, and so I'm unwilling to go even -0 on the PEP.  However, to
give one example, I've been following Mailman development for about 10
years, and it is a dismal story despite a group of developers very
sympathetic to encoding and multicultural issues.  As recently as
Mailman 2.10 (IIRC) there were *still* bugs in encoding handling that
could stop the show (ie, not only did the buggy post not get
processed, but the exception propagated high enough to cause
everything behind it in the queue to fail, too).  I think it would be
sad if ten years from now there was software using this technique and
failing occasionally.



[Python-Dev] Bug tracker down?

2009-04-26 Thread Mark Dickinson
The bugs.python.org site seems to be down.  ping gives me
the following (from Ireland):

Macintosh-4:py3k dickinsm$ ping bugs.python.org
PING bugs.python.org (88.198.142.26): 56 data bytes
36 bytes from et.2.16.rs3k6.rz5.hetzner.de (213.239.244.101):
Destination Host Unreachable
Vr HL TOS  Len   ID Flg  off TTL Pro  cks  Src  Dst
 4  5  00 5400 77e1   0   3a  01 603d 192.168.1.2  88.198.142.26

Various others on #python-dev have confirmed that it's not working for them.
Does anyone know what the problem is?

Mark


Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Aahz
On Sun, Apr 26, 2009, Mark Dickinson wrote:
>
> The bugs.python.org site seems to be down.  

Dunno -- forwarded to the people who can do something about it.  (There's
a migration to a new mailserver going on, but I don't think this is
related.)
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair


Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 4:19 PM, Aahz  wrote:
> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>
>> The bugs.python.org site seems to be down.
>
> Dunno -- forwarded to the people who can do something about it.  (There's
> a migration to a new mailserver going on, but I don't think this is
> related.)

Thanks.  Who should I contact next time, to avoid spamming python-dev?

Mark


Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Aahz
On Sun, Apr 26, 2009, Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 4:19 PM, Aahz  wrote:
>> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>>
>>> The bugs.python.org site seems to be down.
>>
>> Dunno -- forwarded to the people who can do something about it.  (There's
>> a migration to a new mailserver going on, but I don't think this is
>> related.)
> 
> Thanks.  Who should I contact next time, to avoid spamming python-dev?

python-dev isn't a bad place (because it alerts the core developers), but
you can also send a message to [email protected]
-- 
Aahz ([email protected])   <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur."  --Red Adair


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Eric Smith

Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:

...

> I propose removing this feature for 3.1

I'm +1 on this.

> I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.


I agree that this is a big part of the reason it was done. There's still
some work to be done in the fallback code which we use if we can't use
Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to
calculate the maximum buffer size needed given the precision, for
passing on to PyOS_snprintf. (At least I think that sentence is true,
I'll verify with Mark offline).
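
A rough sketch of that calculation, written in Python rather than C for brevity. The function name max_f_buffer is hypothetical, and the bound assumes IEEE 754 doubles and a plain '%.<precision>f' format with no width or flags; it is an illustration, not the code used in CPython.

```python
import sys

def max_f_buffer(precision):
    """Upper bound on the bytes needed for '%.<precision>f' of any finite double.

    Hypothetical helper: sign + integer digits + point + fraction + NUL.
    """
    int_digits = sys.float_info.max_10_exp + 1   # 309 for IEEE 754 doubles
    return 1 + int_digits + 1 + precision + 1

worst = '%.6f' % -sys.float_info.max
print(len(worst) + 1, max_f_buffer(6))   # actual need (with NUL) vs. bound
```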



> Other reasons not to switch from '%f' to '%g' in this way:
>
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.

This is the big reason for me.

>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity

And this, too.

> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:

...

> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".


I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, 
I'm not sure about the spaces around the sign. If we do want the spaces 
there, we can get rid of Py_DTSF_SIGN, since that's the only place it's 
used and we won't be able to use it for complex going forward.


Eric.



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith  wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops.  The spaces were a mistake:  I'm not proposing to add those.
I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".

Mark


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Tres Seaver
Terry Reedy wrote:

> Is NUL \0 allowed in POSIX file names?  If not, could that be used as an 
> escape char.  If it is not legal, then custom translated strings that 
> escape in the wild would raise a red flag as soon as something else 
> tried to use them.

Per David Wheeler's excellent "Fixing Linux/Unix/POSIX Filenames"[1]:

 Traditionally, Unix/Linux/POSIX filenames can be almost any sequence
 of bytes, and their meaning is unassigned. The only real rules are that
 “/” is always the directory separator, and that filenames can’t contain
 byte 0 (because this is the terminator).


[1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html
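
Python itself enforces the byte-0 rule before a name ever reaches the operating system. A small sketch, assuming CPython 3; the exact exception type has varied between versions, so both ValueError and TypeError are caught, and the file name is made up.

```python
# Sketch: a NUL character in a path is rejected by Python before any
# system call is made (file name below is made up for illustration).

def rejects_nul(path):
    try:
        open(path).close()
    except (ValueError, TypeError):
        return True        # embedded NUL detected by the interpreter
    except OSError:
        return False       # name reached the OS and failed there
    return False

print(rejects_nul('no-such-file\0suffix'))
```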


Tres.
--
===
Tres Seaver  +1 540-429-0999  [email protected]
Palladion Software   "Excellence by Design"http://palladion.com



Re: [Python-Dev] Bug tracker down?

2009-04-26 Thread Martin v. Löwis
> Does anyone know what the problem is?

The hardware running it apparently has serious problems.
Upfronthosting, the company providing the hardware, is
working on a solution. Unfortunately, it is difficult to
get support from the datacenter on weekends.

Regards,
Martin


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:
> ... """[5]   These numbers are fairly arbitrary. They are intended to
>    avoid printing endless strings of meaningless digits without
>    hampering correct use and without having to know the exact
>    precision of floating point values on a particular machine."""
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.

As a user of Idle, I would not like to see the change you seek of
having %f stay full-precision.  When a number gets too long to print
on a single line, the wrap depends on the current window width, and
is calculated dynamically.  One section of the display with an
8000-digit (100-line) text makes Idle slow to scroll around in.  It is
too easy for numbers to go massively positive in a bug.

>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;  it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.
>
>  - now that we're using David Gay's 'perfect rounding'
>    code, we can be sure that the digits aren't entirely
>    meaningless, or at least that they're the 'right' meaningless
>    digits.  This wasn't true before.

However, this is, I agree, a problem.  Since all of these numbers
should end in a massive number of zeroes, how about we replace
only the trailing zeroes with the e, so we wind up with:
 1157920892373161954235709850086879078532699846656405640e+23
  or 115792089237316195423570985008687907853269984665640564.0e+24
or some such, rather than
 1.157920892373162e+77
  or 1.15792089237316195423570985008687907853269984665640564e+77

--Scott David Daniels
[email protected]



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels
 wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision.  When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically.  One section of the display with a 8000
> -digit (100-line) text makes Idle slow to scroll around in.  It is
> too easy for numbers to go massively positive in a bug.

I see your point.  Since we're talking about floats, though, there
should never be more than 316 characters in a '%f' % x: the
largest float is around 1.8e308, giving 308 digits before the
point, 6 after, a decimal point, and possibly a minus sign.
(Assuming that your platform uses IEEE 754 doubles.)
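
The bound can be checked on a present-day interpreter, where '%f' always produces the full expansion. This sketch assumes IEEE 754 doubles. Note that the largest double has 309 digits before the point, so by this count a negative value comes to 317 characters, one more than the estimate above.

```python
import sys

largest = sys.float_info.max            # about 1.8e308 on IEEE 754 platforms

positive = '%f' % largest
negative = '%f' % -largest

integer_part = positive.split('.')[0]
print(len(integer_part))   # digits before the point
print(len(positive))       # digits + point + 6 decimals
print(len(negative))       # same, plus the minus sign
```

Either way the output stays in the low hundreds of characters, far from the thousands of digits an integer factorial can produce.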

> However, this is, I agree, a problem.  Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following
the point), because they're stored in binary rather than decimal.  For
example:

>>> int(1e308)
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336

Mark


[Python-Dev] float formatting

2009-04-26 Thread Dennis Allison
Floating point printing is tricky, as I am sure you know.  You might
want to refresh your understanding by consulting the literature--I
know I would.  For example, you might want to look at

http://portal.acm.org/citation.cfm?id=93559

Guy Steele's paper:

Guy L. Steele, Jon L. White, How to print floating-point numbers accurately,
ACM SIGPLAN Notices, v.39 n.4, April 2004

is a classic and worthy of a read.



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Terry Reedy

Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1.  I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1
>
> More details: The current behaviour is documented (standard
> library->builtin types).  (Until very recently, it was actually
> misdocumented as changing at 1e25, not 1e50.)
>
> """For safety reasons, floating point precisions are clipped to 50; %f
> conversions for numbers whose absolute value is over 1e50 are
> replaced by %g conversions. [5] All other errors raise exceptions."""
>
> There's even a footnote:
>
> """[5]   These numbers are fairly arbitrary. They are intended to
> avoid printing endless strings of meaningless digits without
> hampering correct use and without having to know the exact
> precision of floating point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.  I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer.  But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.
>
> Other reasons not to switch from '%f' to '%g' in this way:
>
>  - the change isn't gentle:  as you go over the 1e50 boundary,
>    the number of significant digits produced suddenly changes
>    from 56 to 6;

Looking at your example, that jumped out at me as somewhat startling...

>    it would make more sense to me if it
>    stayed fixed at 56 sig digits for numbers larger than 1e50.

So I agree with this, even if the default # of sig digits were less.
+1

>  - now that we're using David Gay's 'perfect rounding'
>    code, we can be sure that the digits aren't entirely
>    meaningless, or at least that they're the 'right' meaningless
>    digits.  This wasn't true before.
>  - C doesn't do this, and the %f, %g, %e formats really
>    owe their heritage to C.
>  - float formatting is already quite complicated enough; no
>    need to add to the mental complexity
>  - removal simplifies the implementation :-)
>
> On to the second proposed change:
>
> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
>
> >>> 4., 10.
> (4.0, 10.0)
> >>> 4. + 10.j
> (4+10j)
>
> I propose changing the complex str and repr to behave like the
> float version.  That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".
>
> Mostly this is just about consistency, ease of implementation,
> and aesthetics.  As far as I can tell, the extra '.0' in the float
> repr serves two closely-related purposes:  it makes it clear to
> the human reader that the number is a float rather than an
> integer, and it makes sure that e.g., eval(repr(x)) recovers a
> float rather than an int.  The latter point isn't a concern for
> the current complex repr, but the former is:  4+10j looks to
> me more like a Gaussian integer than a complex number.


I agree.  A complex is, alternatively, an ordered pair of floats.  A
different, number-theory oriented implementation of Python might even
want to read 4+10j as a Gaussian integer.


tjr



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels
>  wrote:
>> As a user of Idle, I would not like to see the change you seek of
>> having %f stay full-precision.  When a number gets too long to print
>> on a single line, the wrap depends on the current window width, and
>> is calculated dynamically.  One section of the display with a 8000
>> -digit (100-line) text makes Idle slow to scroll around in.  It is
>> too easy for numbers to go massively positive in a bug.
>
I had also said (without explaining):
> > only the trailing zeroes with the e, so we wind up with:
> >  1157920892373161954235709850086879078532699846656405640e+23
> >  or 115792089237316195423570985008687907853269984665640564.0e+24
> >  or some such, rather than
> >  1.157920892373162e+77
> >  or 1.15792089237316195423570985008687907853269984665640564e+77
These are all possible representations for 2 ** 256.
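
That claim is easy to check: each of the four strings parses back to the same double. A quick sketch using only the built-in float constructor:

```python
# Sketch: all four spellings denote the same double, 2.0 ** 256.
candidates = [
    '1157920892373161954235709850086879078532699846656405640e+23',
    '115792089237316195423570985008687907853269984665640564.0e+24',
    '1.157920892373162e+77',
    '1.15792089237316195423570985008687907853269984665640564e+77',
]

target = 2.0 ** 256
for text in candidates:
    print(float(text) == target, text)
```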

> I see your point.  Since we're talking about floats, thought, there
> should never be more than 316 characters in a '%f' % x: the
> largest float is around 1.8e308, giving 308 digits before the
> point, 6 after, a decimal point, and possibly a minus sign.
> (Assuming that your platform uses IEEE 754 doubles.)
You are correct that I had not thought long and hard about that.
308 is livable, if not desirable.  I was remembering accidentally
displaying the result of a factorial call.

>> However, this is, I agree, a problem.  Since all of these numbers
>> should end in a massive number of zeroes
>
> But they typically don't end in zeros (except the six zeros following
> the point),
> because they're stored in binary rather than decimal
_but_ the printed decimal number I am proposing is within one ULP of
the value of the binary number.  That is, the majority of the digits
in int(1e308) are a fiction -- they could just as well be the digits of
int(1e308) + int(1e100) because 1e308 + 1e100 == 1e308.
That is the sense in which I say those digits in decimal are zeroes.
My proposal was to have the integer part of the expansion be a
representation of the accuracy of the number in a visible form.
I chose the value I chose since a zero lies at the very end, and
tried to indicate I did not really care where trailing actual accuracy
zeros get taken off the representation.  The reason I don't care is
that the code for getting a floating point value is tricky, and I
suspect the printing code might not easily be able to distinguish
between a significant trailing zero and fictitious bits.
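
The absorption described here can be reproduced directly. The sketch below assumes IEEE 754 doubles; ulp_of is a hypothetical helper built on math.frexp, and it ignores subnormals.

```python
import math

def ulp_of(x):
    """Spacing between x and the next larger double (hypothetical helper).

    Assumes a normal, positive IEEE 754 double with a 53-bit significand.
    """
    _, exponent = math.frexp(x)      # x == m * 2**exponent with 0.5 <= m < 1
    return 2.0 ** (exponent - 53)

x = 1e308
print(ulp_of(x))                       # roughly 2e292
print(x + 1e100 == x)                  # 1e100 is far below one ULP, so it is absorbed
print(int(x) + int(1e100) == int(x))   # the exact integers still differ
```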

--Scott David Daniels
[email protected]



Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Mark Dickinson
On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels
 wrote:

> I had also said (without explaining:
>> > only the trailing zeroes with the e, so we wind up with:
>> >      1157920892373161954235709850086879078532699846656405640e+23
>> >  or 115792089237316195423570985008687907853269984665640564.0e+24
>> >  or some such, rather than
>> >      1.157920892373162e+77
>> >  or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary numbery.

But there are plenty of ways to get this if this is what you want: if
you want a displayed result that's within 1 ulp (or 0.5 ulps, which
would be better) of the true value then repr should serve your needs.
If you want more control over the number of significant digits then
'%g' formatting gives that, together with a nice-looking output for
small numbers.

It's only '%f' formatting that I'm proposing changing: I see a
'%.2f' formatting request as a very specific, precise one: give me
exactly 2 digits after the point---no more, no less, and it seems
wrong and arbitrary that this request should be ignored for
numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr.
%e and %f are for when you want fine control.
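
A short comparison of the options listed above, run against a present-day CPython 3 in which '%f' no longer falls back to '%g'. The sample values are arbitrary.

```python
# Sketch: general-purpose formatting (repr, '%g') versus fixed-point '%.2f'.
small = 1234.5678
large = 1e60

for value in (small, large):
    print(repr(value), '%g' % value, '%.2f' % value)

# '%.2f' always yields exactly two digits after the point, however large
# the value is; '%g' switches to exponent notation instead.
fixed = '%.2f' % large
assert len(fixed.split('.')[1]) == 2
assert 'e' in '%g' % large
```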

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value,
and the '%f' formatting is showing exactly that value.  (Yes, I
know that some people advocate viewing a float as a range
of values rather than a specific value;  but I'm pretty sure that
that's not the way that the creators of IEEE 754 were thinking.)
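
One way to see the "very specific value" is to convert the float to an exact integer or Decimal. The sketch below assumes a present-day CPython 3, where '%f' output is correctly rounded; 1e23 is just a convenient example.

```python
from decimal import Decimal

x = 1e23                      # not exactly representable in binary

exact = int(x)                # exact integer value of the stored double
print(exact)                  # differs from 10**23
print(Decimal(x) == exact)    # Decimal(float) is exact as well
print('%.0f' % x == str(exact))  # '%f' shows that same value
```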

> zeros get taken off the representation.  The reason I don't care is
> that the code from getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitous bits.

As of 3.1, the printing code should be fine:  it's using David
Gay's 'perfect rounding' code, so what's displayed should
be correctly rounded to the requested precision.

Mark


Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Raymond Hettinger



>> it would make more sense to me if it
>>    stayed fixed at 56 sig digits for numbers larger than 1e50.
>
> So I agree with this, even if the default # of sig digits were less.


Several reasons to accept Mark's proposal:

* It matches what C does and many languages tend to copy the
  C standards with respect to format codes.  Matching other
  languages helps in porting code, copying algorithms, and mentally
  switching back and forth when working in multiple languages.

* When a programmer has chosen %f, that means that they have
  consciously rejected choosing %e or %g.  It is generally best to
  have the code do what the programmer asked for ;-)

* Code that tested well with 1e47, 1e48, 1e49, and 1e50
  suddenly shifts behavior with 1e51.  Behavior shifts like that
  are bug bait.

* The 56 significant digits may be rooted in the longest
  decimal expansion of a 53 bit float.  For example,
  len(str(Decimal.from_float(.1))) is 57 including the leading
  zero.   But not all machines (now, in the past, or in the future)
  use 53 bits for the significand.

* Use of exponents is common but not universal.  Some converters
  for SQL specs like Decimal(10,80) may not recognize the
  e-notation.  The xmlrpc spec only accepts decimal expansions
  not %e notation.

* The programmer needs to have some way to spell-out a
  decimal expansion when needed.   Currently, %f is the only way.
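
A short sketch of two of the points above, assuming CPython 3.1 or later, where the proposal has been applied. It prints whether any value around the 1e50 boundary falls back to exponent notation, and the length of the exact expansion of 0.1:

```python
from decimal import Decimal

# With the change in place, nothing special should happen around 1e50.
for exponent in (47, 48, 49, 50, 51):
    text = '%.2f' % float('1e%d' % exponent)
    print(exponent, len(text), 'e' in text)

# Length of the exact decimal expansion of the double nearest 0.1,
# counting the leading "0." as two characters.
print(len(str(Decimal.from_float(0.1))))
```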


Raymond






Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Eric Smith

Mark Dickinson wrote:

> (1) Currently, '%f' formatting automatically changes to '%g' formatting for
> numbers larger than 1e50.  For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.00'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1


I don't think we've stated it on this discussion, but I know from 
private email with Mark that his proposal is for both %-formatting and 
for float.__format__ to have this change. I just want to get it on the 
record here.
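
For reference, a small sketch of the three spellings that would be affected, assuming CPython 3.1 or later with the change applied. All three are expected to produce the same fixed-point text:

```python
x = 2.0 ** 167   # just above the old 1e50 cutoff

percent_style = '%f' % x            # %-formatting
format_builtin = format(x, 'f')     # calls float.__format__ directly
str_format = '{0:f}'.format(x)      # str.format also goes through __format__

print(percent_style)
print(percent_style == format_builtin == str_format)
```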


Eric.



Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Adrian
How about another str-like type, a sequence of char-or-bytes? Could be
called strbytes or stringwithinvalidcharacters. It would support
whatever subset of str functionality makes sense / is easy to
implement plus a to_escaped_str() method (that does the escaping the
PEP talks about) for people who want to use regexes or other str-only
stuff.

Here is a description by example:
os.listdir('.') -> [strbytes('normal_file'), strbytes('bad', 128, 'file')]
strbytes('a')[0] -> strbytes('a')
strbytes('bad', 128, 'file')[3] -> strbytes(128)
strbytes('bad', 128, 'file').to_escaped_str() -> 'bad?128file'

Having a separate type is cleaner than a "str that isn't exactly what
it represents". And making the escaping an explicit (but
rarely-needed) step would be less surprising for users. Anyway, I
don't know a whole lot about this issue so there may be an obvious reason
this is a bad idea.

On Wed, Apr 22, 2009 at 6:50 AM, "Martin v. Löwis"  wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
> Author: Martin v. Löwis 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 22-Apr-2009
> Python-Version: 3.1
> Post-History:
>
> Abstract
> ========
>
> File names, environment variables, and command line arguments are
> defined as being character data in POSIX; the C APIs however allow
> passing arbitrary bytes - whether these conform to a certain encoding
> or not. This PEP proposes a means of dealing with such irregularities
> by embedding the bytes in character strings in such a way that allows
> recreation of the original byte string.
>
> Rationale
> =========
>
> The C char type is a data type that is commonly used to represent both
> character data and bytes. Certain POSIX interfaces are specified and
> widely understood as operating on character data, however, the system
> call interfaces make no assumption on the encoding of these data, and
> pass them on as-is. With Python 3, character strings use a
> Unicode-based internal representation, making it difficult to ignore
> the encoding of byte strings in the same way that the C interfaces can
> ignore the encoding.
>
> On the other hand, Microsoft Windows NT has corrected the original
> design limitation of Unix, and made it explicit in its system
> interfaces that these data (file names, environment variables, command
> line arguments) are indeed character data, by providing a
> Unicode-based API (keeping a C-char-based one for backwards
> compatibility).
>
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept a mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.
>
> With this PEP, a uniform treatment of these data as characters becomes
> possible. The uniformity is achieved by using specific encoding
> algorithms, meaning that the data can be converted back to bytes on
> POSIX systems only if the same encoding is used.
>
> Specification
> =============
>
> On Windows, Python uses the wide character APIs to access
> character-oriented APIs, allowing direct conversion of the
> environmental data to Python str objects.
>
> On POSIX systems, Python currently applies the locale's encoding to
> convert the byte data to Unicode. If the locale's encoding is UTF-8,
> it can represent the full set of Unicode characters, otherwise, only a
> subset is representable. In the latter case, using private-use
> characters to represent these bytes would be an option. For UTF-8,
> doing so would create an ambiguity, as the private-use characters may
> regularly occur in the input also.
>
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
>
> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.
>
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>
> Discussion
> ==========
>
> While providing a uniform API to non-decodable bytes, 

Re: [Python-Dev] Two proposed changes to float formatting

2009-04-26 Thread Scott David Daniels

Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
>...
>> I had also said (without explaining):
>>> only the trailing zeroes with the e, so we wind up with:
>>>  1157920892373161954235709850086879078532699846656405640e+23
>>>  or 115792089237316195423570985008687907853269984665640564.0e+24
>>>  or some such, rather than
>>>  1.157920892373162e+77
>>>  or 1.15792089237316195423570985008687907853269984665640564e+77
>> These are all possible representations for 2 ** 256.
>
> Understood.
>
>> _but_ the printed decimal number I am proposing is within one ULP of
>> the value of the binary number.
>
> But there are plenty of ways to get this if this is what you want: if
> you want a displayed result that's within 1 ulp (or 0.5 ulps, which
> would be better) of the true value then repr should serve your needs.

The representation I am suggesting here is a half-way measure between
your proposal and the existing behavior.  This representation addresses
the abrupt transition that you point out (number of significant digits
drops precipitously) without particularly changing the goal of the
transition (displaying faux accuracy), without, in my (possibly naive)
view, seriously complicating either the print-generating code or the
issues for the reader of the output.

To wit, the proposal is (A) for numbers where the printed digits exceed
the accuracy presented, represent the result as an integer with an e+N,
rather than a number between 1 and 2-epsilon with an exponent that makes
you have to count digits to compare the two values, and (B) that the full
precision available in the value be shown in the representation.
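
A rough sketch of what such an "integer with e+N" representation could look like. This is one reading of the proposal, not an existing format code: `integer_mantissa` is a hypothetical helper, 17 significant digits is an assumed choice (enough to round-trip an IEEE 754 double), and only finite floats are handled:

```python
def integer_mantissa(x, sig_digits=17):
    """Format finite float x as '<integer>e+N' with sig_digits digits."""
    # '%.16e' gives d.dddddddddddddddde+XX; split mantissa from exponent.
    mantissa, exponent = ('%.*e' % (sig_digits - 1, x)).split('e')
    digits = mantissa.replace('.', '')           # drop the decimal point
    shifted = int(exponent) - (sig_digits - 1)   # compensate for the shift
    return '%se%+d' % (digits, shifted)

text = integer_mantissa(2.0 ** 256)
print(text)
print(float(text) == 2.0 ** 256)   # the text still identifies the same double
```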

Given that everyone understands that is what I am proposing, I am OK
with the decision going where it will.  I am comforted that we are only
talking about four wrapped lines if we go to the full integer,
which I had not realized.  Further, I agree with you that there is an
abrupt transition in represented accuracy as we cross from %f to %g,
that should be somehow addressed.  You want to address it by continuing
to show digits, and I want to limit the digits shown to a value that
reflects the known accuracy.  I also want text that compares "smoothly"
with numbers near the transition (so that greater-than and less-than
relationships are obvious without thinking, hence the representation
that avoids the "normalized" mantissa).

Having said all this, I think my compromise position should be clear.
I did not mean to argue with you, but rather intended to propose a
possible middle way that some might find appealing.

--Scott David Daniels
[email protected]



Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Martin v. Löwis
> How about another str-like type, a sequence of char-or-bytes?

That would be a different PEP. I personally like my own proposal
more, but feel free to propose something different.

Regards,
Martin


Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-26 Thread Glenn Linderman
On approximately 4/25/2009 5:35 AM, came the following characters from
the keyboard of Martin v. Löwis:
>> Because the encoding is not reliably reversible.
>
> Why do you say that? The encoding is completely reversible
> (unless we disagree on what "reversible" means).
>
>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>> reversible encoding.
>
> Then please provide an example for a setup where it is not reversible.
>
> Regards,
> Martin


It is reversible if you know that it is decoded, and apply the encoding.
But if you don't know that it has been encoded, then applying the reverse
transform can convert an undecoded str that matches the decoded str to
the form that it could have, but never did take.


The problem is that there is no guarantee that the str interface 
provides only strictly conforming Unicode, so decoding bytes to 
non-strictly conforming Unicode, can result in a data pun between 
non-strictly conforming Unicode coming from the str interface vs bytes 
being decoded to non-strictly conforming Unicode coming from the bytes 
interface.


Any particular problem that always consistently uses one or the other 
(bytes vs str) APIs under the covers might never be affected by such a 
data pun, but programs that may use both types of interface could 
potentially see a data pun.
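
A minimal sketch of both the round trip and the data pun. Note the assumption: it uses the 'surrogateescape' error handler that shipped in Python 3.1 and later, which is not named in the draft quoted in this thread (the draft speaks of "python-escape" and "utf-8b"). The file name is made up:

```python
# A byte string that is not valid UTF-8 because of the lone 0x80 byte.
raw = b'bad\x80file'

# Decoding maps the undecodable byte to the half surrogate U+DC80.
decoded = raw.decode('utf-8', 'surrogateescape')
print(ascii(decoded))

# Encoding with the same handler restores the original bytes.
round_trip = decoded.encode('utf-8', 'surrogateescape')
print(round_trip == raw)

# The pun: a str that never came from bytes but holds the same lone
# surrogate is indistinguishable from the decoded name, so it encodes
# to a byte sequence it never actually had.
native = 'bad\udc80file'
print(native == decoded)
print(native.encode('utf-8', 'surrogateescape'))
```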


If your PEP depends on consistent use of one or the other type of
interface, you should say so, and if the platform only provides that
type of interface, maybe all is well.  Both types of interfaces are
available on Windows, perhaps POSIX only provides native bytes
interfaces, and if the PEP is the only way to provide str interfaces,
then perhaps consistent use is required.


There are still issues regarding how Windows and POSIX programs that are
sharing cross-mounted file systems might communicate file names between
each other, which is not at all clear from the PEP.  If this is an
insoluble or un-addressed issue, it should be stated.  (It is probably
insoluble, due to there being multiple ways that the cross-mounted file
systems might translate names; but if there are, can we learn something
from the rules the mounting systems use, to be compatible with (one of)
them, or not?)


Your change to avoid using PUA characters, together with the rule
suggested by MRAB in another branch of this thread, of treating
half-surrogates as invalid byte sequences, may avoid the data puns I'm
concerned about.


It is not clear how half-surrogate characters would be displayed, when 
the user prints or displays such a file name string.  It would seem that 
programs that display file names to users might still have issues with 
such; an escaping mechanism that uses displayable characters would have 
an advantage there.



--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking