[Python-Dev] Two proposed changes to float formatting
I'd like to propose two minor changes to float and complex
formatting, for 3.1. I don't think either change should prove
particularly disruptive.

(1) Currently, '%f' formatting automatically changes to '%g' formatting
for numbers larger than 1e50. For example:

>>> '%f' % 2**166.
'93536104789177786765035829293842113257979682750464.000000'
>>> '%f' % 2**167.
'1.87072e+50'

I propose removing this feature for 3.1.

More details: The current behaviour is documented (standard
library->builtin types). (Until very recently, it was actually
misdocumented as changing at 1e25, not 1e50.)

"""For safety reasons, floating point precisions are clipped to 50; %f
conversions for numbers whose absolute value is over 1e50 are replaced
by %g conversions. [5] All other errors raise exceptions."""

There's even a footnote:

"""[5] These numbers are fairly arbitrary. They are intended to avoid
printing endless strings of meaningless digits without hampering
correct use and without having to know the exact precision of floating
point values on a particular machine."""

I don't find this particularly convincing, though---I just don't see
a really good reason not to give the user exactly what she/he
asks for here. I have a suspicion that at least part of the
motivation for the '%f' -> '%g' switch is that it means the
implementation can use a fixed-size buffer. But Eric has
fixed this (in 3.1, at least) and the buffer is now dynamically
allocated, so this isn't a concern any more.

Other reasons not to switch from '%f' to '%g' in this way:

 - the change isn't gentle: as you go over the 1e50 boundary,
   the number of significant digits produced suddenly changes
   from 56 to 6; it would make more sense to me if it
   stayed fixed at 56 sig digits for numbers larger than 1e50.
 - now that we're using David Gay's 'perfect rounding'
   code, we can be sure that the digits aren't entirely
   meaningless, or at least that they're the 'right' meaningless
   digits. This wasn't true before.
 - C doesn't do this, and the %f, %g, %e formats really
   owe their heritage to C.
 - float formatting is already quite complicated enough; no
   need to add to the mental complexity
 - removal simplifies the implementation :-)

On to the second proposed change:

(2) complex str and repr don't behave like float str and repr, in that
the float version always adds a trailing '.0' (unless there's an
exponent), but the complex version doesn't:

>>> 4., 10.
(4.0, 10.0)
>>> 4. + 10.j
(4+10j)

I propose changing the complex str and repr to behave like the
float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
rather than "(4+10j)".

Mostly this is just about consistency, ease of implementation,
and aesthetics. As far as I can tell, the extra '.0' in the float
repr serves two closely-related purposes: it makes it clear to
the human reader that the number is a float rather than an
integer, and it makes sure that e.g., eval(repr(x)) recovers a
float rather than an int. The latter point isn't a concern for
the current complex repr, but the former is: 4+10j looks to
me more like a Gaussian integer than a complex number.

Any comments?

Mark
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
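
A small sketch of the boundary described in proposal (1), for readers who want to try it. It assumes an interpreter from Python 3.1 onwards, where '%f' is understood to no longer fall back to '%g'; on the older interpreter shown in the transcript above, the second '%f' result would instead be '1.87072e+50'. The helper `significant_digits` is a hypothetical name introduced only for this illustration.

```python
# Sketch: '%f' versus '%g' on either side of the 1e50 boundary.
# Assumption: Python 3.1 or later, where '%f' never switches to '%g'.
small = 2.0 ** 166   # just under 1e50
large = 2.0 ** 167   # just over 1e50

fixed_small = '%f' % small
fixed_large = '%f' % large
general_large = '%g' % large


def significant_digits(text):
    """Count the digits in a plain decimal expansion (no exponent part)."""
    return sum(ch.isdigit() for ch in text)


print(fixed_small)
print(fixed_large)
print(general_large)
# With the fallback removed, the digit count grows by one per decade
# instead of dropping abruptly to 6.
print(significant_digits(fixed_small), significant_digits(fixed_large))
```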
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g'
> formatting for numbers larger than 1e50.
...
> I propose removing this feature for 3.1

No objections from me. +1

> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

No objections here either. +0

--
Steven D'Aprano
Re: [Python-Dev] Two proposed changes to float formatting
Steven D'Aprano wrote:
> On Sun, 26 Apr 2009 08:06:56 pm Mark Dickinson wrote:
>> I'd like to propose two minor changes to float and complex
>> formatting, for 3.1. I don't think either change should prove
>> particularly disruptive.
>>
>> (1) Currently, '%f' formatting automatically changes to '%g'
>> formatting for numbers larger than 1e50.
> ...
>> I propose removing this feature for 3.1
>
> No objections from me. +1
>
>> I propose changing the complex str and repr to behave like the
>> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> No objections here either. +0

Doing it sooner rather than later means that it is less likely to
disrupt anyone relying on the representation (i.e. doctests).

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Paul Moore writes:
> 2009/4/24 Stephen J. Turnbull :
> > Paul Moore writes:
> >
> > > The pros for Martin's proposal are a uniform cross-platform interface,
> > > and a user-friendly API for the common case.
> >
> > A more accurate phrasing would be "... a user-friendly API for those
> > who feel very lucky today." Which is the common case, of course, but
> > spins a little differently.
>
> Sorry, but I think you're misrepresenting things. I'd have probably
> let you off if you'd missed out the "very" - but I do think that it's
> the common case. Consider:

If you need reliability, then you can't get it this way. The reason
"very" is (somewhat) justified is that this kind of issue is a little
like unemployment. You hardly ever meet someone who's 7.2%
unemployed, but you probably know several who are 100% unemployed. If
you see a broken encoding once, you're likely to see it a million
times (spammers have the most broken software) or maybe have it raise
an unhandled Exception a dozen times (in rate of using busted
software, the spammers are closely followed by bosses---which would be
very bad, eh, if 2/3 of the mail from your boss ended up in an
undeliverables queue due to encoding errors that are unhandled by
some filter in your mail pipeline).

> - Windows systems where broken Unicode (lone surrogates or whatever)
> isn't involved
> - Unix systems where the user's stated filesystem encoding is correct
> Can you honestly say that this isn't the vast majority of real-world
> environments?

Again, that's not the point. The point is that six-sigma reliability
world-wide is not going to be very comforting to the poor souls who
happen to have broken software in their environment sending broken
encodings regularly, because they're going to be dealing with one or
two sigmas, and that's just not good enough in a production
environment.

> > If you didn't start with a valid string in a known encoding, you
> > shouldn't treat it as characters because it's not.
>
> Again, that's the purist argument. If you have a string (of bytes, I
> guess) and a 99% certain guess as to the correct encoding, then I'd
> argue that, as long as (a) it's not mission-critical (lives or backups
> depend on it)

Assurance that you can even determine (a) is not provided by the PEP.
There is no way to contain a problem if it should occur, because it's
"just a string" and could go anywhere, and get converted back or
otherwise manipulated in a context that doesn't know how to handle it
(which might not even be Python if a C-level extension is involved).

Given that Python has no internal mechanism for saying "in this area
only valid Unicode will be accepted", it seems likely that mission
critical software *will* interact with this feature, if only
indirectly (or perhaps only in software originally intended for use in
the U.S. only, but then it gets exported, etc).

> and (b) you have a means of failing relatively
> gracefully, you have every reason to make the assumption about
> encoding.

(b) is not provided in the PEP, either. We have no idea what the
failure mode will be.

> After all, what's the alternative?

The alternative is to refuse to provide a simple standard way to
decode unreliably, and in that way make the user responsible for an
explicit choice about what level and kinds of unreliability they will
accept. I realize that's unpalatable to most people who use Python to
develop software, and so I'm unwilling to go even -0 on the PEP.

However, to give one example, I've been following Mailman development
for about 10 years, and it is a dismal story despite a group of
developers very sympathetic to encoding and multicultural issues. As
recently as Mailman 2.10 (IIRC) there were *still* bugs in encoding
handling that could stop the show (i.e., not only did the buggy post not
get processed, but the exception propagated high enough to cause
everything behind it in the queue to fail, too).

I think it would be sad if ten years from now there was software using
this technique and failing occasionally.
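
To make the "just a string that could go anywhere" concern concrete, here is a minimal sketch using the `surrogateescape` error handler as it exists in released Python 3.1 and later. Whether the draft under discussion in this thread behaved identically is an assumption; the byte string and variable names are illustrative only.

```python
# Sketch: an undecodable byte smuggled into a str, then escaping into code
# that does not know about the convention.
raw = b'bad\x80file'                               # 0x80 is not valid UTF-8 here
name = raw.decode('utf-8', 'surrogateescape')      # an ordinary-looking str

# Code aware of the convention can recover the original bytes exactly.
round_trip = name.encode('utf-8', 'surrogateescape')

# Code unaware of it fails only when it finally tries a strict encode,
# possibly far from where the string was created.
try:
    name.encode('utf-8')
except UnicodeEncodeError as exc:
    strict_failure = exc
else:
    strict_failure = None

print(repr(name), round_trip == raw, type(strict_failure).__name__)
```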
[Python-Dev] Bug tracker down?
The bugs.python.org site seems to be down. ping gives me
the following (from Ireland):

Macintosh-4:py3k dickinsm$ ping bugs.python.org
PING bugs.python.org (88.198.142.26): 56 data bytes
36 bytes from et.2.16.rs3k6.rz5.hetzner.de (213.239.244.101): Destination Host Unreachable
Vr HL TOS Len ID Flg off TTL Pro cks Src Dst
4 5 00 5400 77e1 0 3a 01 603d 192.168.1.2 88.198.142.26

Various others on #python-dev have confirmed that it's not working for them.

Does anyone know what the problem is?

Mark
Re: [Python-Dev] Bug tracker down?
On Sun, Apr 26, 2009, Mark Dickinson wrote:
>
> The bugs.python.org site seems to be down.

Dunno -- forwarded to the people who can do something about it. (There's
a migration to a new mailserver going on, but I don't think this is
related.)
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur." --Red Adair
Re: [Python-Dev] Bug tracker down?
On Sun, Apr 26, 2009 at 4:19 PM, Aahz wrote:
> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>
>> The bugs.python.org site seems to be down.
>
> Dunno -- forwarded to the people who can do something about it. (There's
> a migration to a new mailserver going on, but I don't think this is
> related.)

Thanks. Who should I contact next time, to avoid spamming python-dev?

Mark
Re: [Python-Dev] Bug tracker down?
On Sun, Apr 26, 2009, Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 4:19 PM, Aahz wrote:
>> On Sun, Apr 26, 2009, Mark Dickinson wrote:
>>>
>>> The bugs.python.org site seems to be down.
>>
>> Dunno -- forwarded to the people who can do something about it. (There's
>> a migration to a new mailserver going on, but I don't think this is
>> related.)
>
> Thanks. Who should I contact next time, to avoid spamming python-dev?

python-dev isn't a bad place (because it alerts the core developers), but
you can also send a message to [email protected]
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"If you think it's expensive to hire a professional to do the job, wait
until you hire an amateur." --Red Adair
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting
> for numbers larger than 1e50. For example:
...
> I propose removing this feature for 3.1

I'm +1 on this.

> I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer. But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.

I agree that this is a big part of the reason it was done. There's still
some work to be done in the fallback code which we use if we can't use
Gay's implementation of _Py_dg_dtoa. But it's reasonably easy to
calculate the maximum buffer size needed given the precision, for
passing on to PyOS_snprintf. (At least I think that sentence is true;
I'll verify with Mark offline.)

> Other reasons not to switch from '%f' to '%g' in this way:
>
> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6; it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.

This is the big reason for me.

> - float formatting is already quite complicated enough; no
>   need to add to the mental complexity

And this, too.

> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
...
> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".

I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also,
I'm not sure about the spaces around the sign. If we do want the spaces
there, we can get rid of Py_DTSF_SIGN, since that's the only place it's
used and we won't be able to use it for complex going forward.

Eric.
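
The buffer-size remark can be illustrated with a rough sketch in Python rather than C. This is not the calculation used in CPython's fallback code, which the thread does not show; `max_percent_f_length` is a hypothetical helper, and the bound assumes finite floats on a platform described by `sys.float_info`.

```python
import sys


def max_percent_f_length(precision):
    """Upper bound on len('%.<precision>f' % x) for any finite float x.

    A C buffer would need one extra byte for the terminating NUL.
    """
    sign = 1                                          # possible leading '-'
    integer_digits = sys.float_info.max_10_exp + 1    # 309 for IEEE 754 doubles
    point = 1                                         # decimal point
    return sign + integer_digits + point + precision


# Check the bound against the worst case: the most negative finite float.
worst = -sys.float_info.max
for precision in (0, 2, 6, 17, 50):
    text = '%.*f' % (precision, worst)
    assert len(text) <= max_percent_f_length(precision)
    print(precision, len(text), max_percent_f_length(precision))
```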
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 5:59 PM, Eric Smith wrote:
> Mark Dickinson wrote:
>> I propose changing the complex str and repr to behave like the
>> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
>> rather than "(4+10j)".
>
> I'm +0.5 on this. I'd probably be +1 if I were a big complex user. Also, I'm
> not sure about the spaces around the sign. If we do want the spaces there,

Whoops. The spaces were a mistake: I'm not proposing to add those.
I meant "(4.0+10.0j)" rather than "(4.0 + 10.0j)".

Mark
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Terry Reedy wrote:

> Is NUL \0 allowed in POSIX file names? If not, could that be used as an
> escape char. If it is not legal, then custom translated strings that
> escape in the wild would raise a red flag as soon as something else
> tried to use them.

Per David Wheeler's excellent "Fixing Linux/Unix/POSIX Filenames"[1]:

  Traditionally, Unix/Linux/POSIX filenames can be almost any sequence
  of bytes, and their meaning is unassigned. The only real rules are
  that “/” is always the directory separator, and that filenames can’t
  contain byte 0 (because this is the terminator).

[1] http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

Tres.
--
Tres Seaver +1 540-429-0999 [email protected]
Palladion Software "Excellence by Design" http://palladion.com
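
As a quick illustration of the "byte 0 is the terminator" rule, the sketch below probes whether the interpreter will even pass a NUL-containing name to the operating system. It assumes a reasonably recent Python 3, where `os.stat` is understood to reject embedded NULs with `ValueError` (older releases raised `TypeError`, hence both are caught); `accepts_nul` is a hypothetical helper name.

```python
import os


def accepts_nul(path):
    """Return False if the interpreter refuses the path outright."""
    try:
        os.stat(path)
    except (ValueError, TypeError):
        # Rejected before reaching the OS: embedded NUL in the name.
        return False
    except OSError:
        # The name was passed through; the file simply may not exist.
        return True
    return True


print(accepts_nul('name\0with_nul'))   # the NUL never reaches the file system
print(accepts_nul(os.curdir))          # an ordinary path is passed through
```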
Re: [Python-Dev] Bug tracker down?
> Does anyone know what the problem is?

The hardware running it apparently has serious problems.
Upfronthosting, the company providing the hardware, is working
on a solution. Unfortunately, it is difficult to get support
from the datacenter on weekends.

Regards,
Martin
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> ...
> """[5] These numbers are fairly arbitrary. They are intended to avoid
> printing endless strings of meaningless digits without hampering
> correct use and without having to know the exact precision of floating
> point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here.

As a user of Idle, I would not like to see the change you seek of
having %f stay full-precision. When a number gets too long to print
on a single line, the wrap depends on the current window width, and
is calculated dynamically. One section of the display with an
8000-digit (100-line) text makes Idle slow to scroll around in. It is
too easy for numbers to go massively positive in a bug.

> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6; it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.
> - now that we're using David Gay's 'perfect rounding'
>   code, we can be sure that the digits aren't entirely
>   meaningless, or at least that they're the 'right' meaningless
>   digits. This wasn't true before.

However, this is, I agree, a problem. Since all of these numbers
should end in a massive number of zeroes, how about we replace
only the trailing zeroes with the e, so we wind up with:
    1157920892373161954235709850086879078532699846656405640e+23
 or 115792089237316195423570985008687907853269984665640564.0e+24
 or some such, rather than
    1.157920892373162e+77
 or 1.15792089237316195423570985008687907853269984665640564e+77

--Scott David Daniels
[email protected]
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
> As a user of Idle, I would not like to see the change you seek of
> having %f stay full-precision. When a number gets too long to print
> on a single line, the wrap depends on the current window width, and
> is calculated dynamically. One section of the display with a 8000
> -digit (100-line) text makes Idle slow to scroll around in. It is
> too easy for numbers to go massively positive in a bug.

I see your point. Since we're talking about floats, though, there
should never be more than 316 characters in a '%f' % x: the
largest float is around 1.8e308, giving 308 digits before the
point, 6 after, a decimal point, and possibly a minus sign.
(Assuming that your platform uses IEEE 754 doubles.)

> However, this is, I agree, a problem. Since all of these numbers
> should end in a massive number of zeroes

But they typically don't end in zeros (except the six zeros following
the point), because they're stored in binary rather than decimal.
For example:

>>> int(1e308)
100000000000000001097906362944045541740492309677311846336810682903157585404911491537163328978494688899061249669721172515611590283743140088328307009198146046031271664502933027185697489699588559043338384466165001178426897626212945177628091195786707458122783970171784415105291802893207873272974885715430223118336

Mark
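
The length estimate above is easy to measure directly. The sketch below assumes IEEE 754 doubles and a Python 3.1+ interpreter where '%f' never switches to '%g'. On such a platform the measurement appears to come out one character longer than the estimate in the message, because a value near 1.8e308 has 309 digits before the point rather than 308.

```python
import sys

# Worst case for plain '%f': the most negative finite float.
largest = sys.float_info.max
text = '%f' % -largest

integer_part, fraction = text.lstrip('-').split('.')
print(len(integer_part))   # digits before the decimal point
print(len(fraction))       # digits after the decimal point (default precision)
print(len(text))           # total, including the sign and the point
```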
[Python-Dev] float formatting
Floating point printing is tricky, as I am sure you know. You might
want to refresh your understanding by consulting the literature--I
know I would. For example, you might want to look at

http://portal.acm.org/citation.cfm?id=93559

Guy Steele's paper:

Guy L. Steele, Jon L. White, How to print floating-point numbers
accurately, ACM SIGPLAN Notices, v.39 n.4, April 2004

is a classic and worthy of a read.
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> I'd like to propose two minor changes to float and complex
> formatting, for 3.1. I don't think either change should prove
> particularly disruptive.
>
> (1) Currently, '%f' formatting automatically changes to '%g' formatting
> for numbers larger than 1e50. For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1
>
> More details: The current behaviour is documented (standard
> library->builtin types). (Until very recently, it was actually
> misdocumented as changing at 1e25, not 1e50.)
>
> """For safety reasons, floating point precisions are clipped to 50; %f
> conversions for numbers whose absolute value is over 1e50 are replaced
> by %g conversions. [5] All other errors raise exceptions."""
>
> There's even a footnote:
>
> """[5] These numbers are fairly arbitrary. They are intended to avoid
> printing endless strings of meaningless digits without hampering
> correct use and without having to know the exact precision of floating
> point values on a particular machine."""
>
> I don't find this particularly convincing, though---I just don't see
> a really good reason not to give the user exactly what she/he
> asks for here. I have a suspicion that at least part of the
> motivation for the '%f' -> '%g' switch is that it means the
> implementation can use a fixed-size buffer. But Eric has
> fixed this (in 3.1, at least) and the buffer is now dynamically
> allocated, so this isn't a concern any more.
>
> Other reasons not to switch from '%f' to '%g' in this way:
>
> - the change isn't gentle: as you go over the 1e50 boundary,
>   the number of significant digits produced suddenly changes
>   from 56 to 6;

Looking at your example, that jumped out at me as somewhat startling...

>   it would make more sense to me if it
>   stayed fixed at 56 sig digits for numbers larger than 1e50.

So I agree with this, even if the default # of sig digits were less.

+1

> - now that we're using David Gay's 'perfect rounding'
>   code, we can be sure that the digits aren't entirely
>   meaningless, or at least that they're the 'right' meaningless
>   digits. This wasn't true before.
> - C doesn't do this, and the %f, %g, %e formats really
>   owe their heritage to C.
> - float formatting is already quite complicated enough; no
>   need to add to the mental complexity
> - removal simplifies the implementation :-)
>
> On to the second proposed change:
>
> (2) complex str and repr don't behave like float str and repr, in that
> the float version always adds a trailing '.0' (unless there's an
> exponent), but the complex version doesn't:
>
> >>> 4., 10.
> (4.0, 10.0)
> >>> 4. + 10.j
> (4+10j)
>
> I propose changing the complex str and repr to behave like the
> float version. That is, repr(4. + 10.j) should be "(4.0 + 10.0j)"
> rather than "(4+10j)".
>
> Mostly this is just about consistency, ease of implementation,
> and aesthetics. As far as I can tell, the extra '.0' in the float
> repr serves two closely-related purposes: it makes it clear to
> the human reader that the number is a float rather than an
> integer, and it makes sure that e.g., eval(repr(x)) recovers a
> float rather than an int. The latter point isn't a concern for
> the current complex repr, but the former is: 4+10j looks to
> me more like a Gaussian integer than a complex number.

I agree. A complex is alternately an ordered pair of floats. A
different, number-theory oriented implementation of Python might even
want to read 4+10j as a G. i.

tjr
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 8:11 PM, Scott David Daniels wrote:
>> As a user of Idle, I would not like to see the change you seek of
>> having %f stay full-precision. When a number gets too long to print
>> on a single line, the wrap depends on the current window width, and
>> is calculated dynamically. One section of the display with a 8000
>> -digit (100-line) text makes Idle slow to scroll around in. It is
>> too easy for numbers to go massively positive in a bug.

I had also said (without explaining):
> > only the trailing zeroes with the e, so we wind up with:
> > 1157920892373161954235709850086879078532699846656405640e+23
> > or 115792089237316195423570985008687907853269984665640564.0e+24
> > or some such, rather than
> > 1.157920892373162e+77
> > or 1.15792089237316195423570985008687907853269984665640564e+77

These are all possible representations for 2 ** 256.

> I see your point. Since we're talking about floats, thought, there
> should never be more than 316 characters in a '%f' % x: the
> largest float is around 1.8e308, giving 308 digits before the
> point, 6 after, a decimal point, and possibly a minus sign.
> (Assuming that your platform uses IEEE 754 doubles.)

You are correct that I had not thought long and hard about that.
308 is livable, if not desirable. I was remembering accidentally
displaying the result of a factorial call.

>> However, this is, I agree, a problem. Since all of these numbers
>> should end in a massive number of zeroes
>
> But they typically don't end in zeros (except the six zeros following
> the point), because they're stored in binary rather than decimal

_but_ the printed decimal number I am proposing is within one ULP of
the value of the binary number.

That is, the majority of the digits in int(1e308) are a fiction --
they could just as well be the digits of
    int(1e308) + int(1e100)
because
    1e308 + 1e100 == 1e308

That is the sense in which I say those digits in decimal are zeroes.

My proposal was to have the integer part of the expansion be a
representation of the accuracy of the number in a visible form.
I chose the value I chose since a zero lies at the very end, and
tried to indicate I did not really care where trailing actual
accuracy zeros get taken off the representation. The reason I
don't care is that the code for getting a floating point value is
tricky, and I suspect the printing code might not easily be able
to distinguish between a significant trailing zero and fictitious
bits.

--Scott David Daniels
[email protected]
Re: [Python-Dev] Two proposed changes to float formatting
On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
> I had also said (without explaining):
>> > only the trailing zeroes with the e, so we wind up with:
>> > 1157920892373161954235709850086879078532699846656405640e+23
>> > or 115792089237316195423570985008687907853269984665640564.0e+24
>> > or some such, rather than
>> > 1.157920892373162e+77
>> > or 1.15792089237316195423570985008687907853269984665640564e+77
> These are all possible representations for 2 ** 256.

Understood.

> _but_ the printed decimal number I am proposing is within one ULP of
> the value of the binary number.

But there are plenty of ways to get this if this is what you want: if
you want a displayed result that's within 1 ulp (or 0.5 ulps, which
would be better) of the true value then repr should serve your needs.
If you want more control over the number of significant digits then
'%g' formatting gives that, together with a nice-looking output for
small numbers. It's only '%f' formatting that I'm proposing changing:
I see a '%.2f' formatting request as a very specific, precise one:
give me exactly 2 digits after the point---no more, no less, and it
seems wrong and arbitrary that this request should be ignored for
numbers larger than 1e50 in absolute value.

That is, for general float formatting needs, use %g, str and repr.
%e and %f are for when you want fine control.

> That is, the majority of the digits
> in int(1e308) are a fiction

Not really: the float that Python stores has a very specific value,
and the '%f' formatting is showing exactly that value. (Yes, I know
that some people advocate viewing a float as a range of values
rather than a specific value; but I'm pretty sure that that's not the
way that the creators of IEEE 754 were thinking.)

> zeros get taken off the representation. The reason I don't care is
> that the code for getting a floating point value is tricky, and I
> suspect the printing code might not easily be able to distinguish
> between a significant trailing zero and fictitious bits.

As of 3.1, the printing code should be fine: it's using David Gay's
'perfect rounding' code, so what's displayed should be correctly rounded
to the requested precision.

Mark
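
Both sides of this exchange can be checked in a few lines. The sketch below assumes IEEE 754 doubles with a 53-bit significand; the variable names are illustrative. It shows that adding 1e100 to 1e308 is absorbed, as Scott says, and also that the stored float has one exact integer value, as Mark says.

```python
import math
from decimal import Decimal

x = 1e308

# Scott's observation: an addend far below the float spacing is absorbed.
absorbed = (x + 1e100 == x)

# Mark's observation: the stored double still has one exact value.
exact = int(x)                    # exact integer value of the double
same_as_decimal = (Decimal(x) == exact)

# Gap between adjacent doubles near x, assuming a 53-bit significand.
exponent = math.frexp(x)[1]
spacing = 2.0 ** (exponent - 53)

print(absorbed, same_as_decimal)
print(exact == 10 ** 308)         # False: 1e308 is not exactly 10**308
print(x + spacing != x)           # one full spacing step does change the value
```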
Re: [Python-Dev] Two proposed changes to float formatting
>> it would make more sense to me if it
>> stayed fixed at 56 sig digits for numbers larger than 1e50.
>
> So I agree with this, even if the default # of sig digits were less.

Several reasons to accept Mark's proposal:

* It matches what C does and many languages tend to copy the C standards
  with respect to format codes. Matching other languages helps in
  porting code, copying algorithms, and mentally switching back and
  forth when working in multiple languages.

* When a programmer has chosen %f, that means that they have consciously
  rejected choosing %e or %g. It is generally best to have the code do
  what the programmer asked for ;-)

* Code that tested well with 1e47, 1e48, 1e49, and 1e50 suddenly shifts
  behavior with 1e51. Behavior shifts like that are bug bait.

* The 56 significant digits may be rooted in the longest decimal
  expansion of a 53 bit float. For example,
  len(str(Decimal.from_float(.1))) is 57 including the leading zero.
  But not all machines (now, in the past, or in the future) use 53 bits
  for the significand.

* Use of exponents is common but not universal. Some converters for SQL
  specs like Decimal(10,80) may not recognize the e-notation. The xmlrpc
  spec only accepts decimal expansions not %e notation.

* The programmer needs to have some way to spell-out a decimal expansion
  when needed. Currently, %f is the only way.

Raymond
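
The `Decimal.from_float` figure quoted above can be reproduced as follows. This is a small sketch assuming IEEE 754 doubles and Python 3.1 or later (for correctly rounded '%f' output); the variable names are illustrative.

```python
from decimal import Decimal

# Exact decimal expansion of the double nearest to 0.1.
exact_tenth = Decimal.from_float(0.1)
text = str(exact_tenth)

print(text)
print(len(text))                 # includes the leading "0" and the point

# '%f' with enough precision spells out the same expansion.
spelled_out = '%.55f' % 0.1
print(spelled_out == text)
```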
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> (1) Currently, '%f' formatting automatically changes to '%g' formatting
> for numbers larger than 1e50. For example:
>
> >>> '%f' % 2**166.
> '93536104789177786765035829293842113257979682750464.000000'
> >>> '%f' % 2**167.
> '1.87072e+50'
>
> I propose removing this feature for 3.1

I don't think we've stated it on this discussion, but I know from
private email with Mark that his proposal is for both %-formatting and
for float.__format__ to have this change. I just want to get it on the
record here.

Eric.
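
A short sketch of what "both %-formatting and float.__format__" means in practice, assuming Python 3.1 or later, where the two paths are understood to agree for the 'f' presentation type:

```python
value = 2.0 ** 167                        # just above the old 1e50 cutoff

percent_style = '%f' % value              # %-formatting
format_builtin = format(value, 'f')       # calls float.__format__
str_format = '{0:f}'.format(value)        # also routes through float.__format__

print(percent_style)
print(percent_style == format_builtin == str_format)
```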
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
How about another str-like type, a sequence of char-or-bytes? Could be
called strbytes or stringwithinvalidcharacters. It would support
whatever subset of str functionality makes sense / is easy to
implement plus a to_escaped_str() method (that does the escaping the
PEP talks about) for people who want to use regexes or other str-only
stuff.
Here is a description by example:
os.listdir('.') -> [strbytes('normal_file'), strbytes('bad', 128, 'file')]
strbytes('a')[0] -> strbytes('a')
strbytes('bad', 128, 'file')[3] -> strbytes(128)
strbytes('bad', 128, 'file').to_escaped_str() -> 'bad?128file'
Having a separate type is cleaner than a "str that isn't exactly what
it represents". And making the escaping an explicit (but
rarely-needed) step would be less surprising for users. Anyway, I
don't know a whole lot about this issue so there may be an obvious reason
this is a bad idea.
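To make the description by example concrete, here is a rough sketch of such a type. Everything in it is hypothetical: no strbytes class exists in Python, and the storage layout, the equality rules and the '?NNN' escape format are simply one reading of the four examples above.

```python
class strbytes:
    """Hypothetical sequence whose items are 1-character strs or raw byte values."""

    def __init__(self, *parts):
        items = []
        for part in parts:
            if isinstance(part, str):
                items.extend(part)      # one item per decoded character
            elif isinstance(part, int) and 0 <= part <= 255:
                items.append(part)      # one undecodable byte, kept as an int
            else:
                raise TypeError('expected str or int in range(256)')
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        picked = self._items[index]
        if isinstance(index, slice):
            return strbytes(*picked)
        return strbytes(picked)         # indexing yields a length-1 strbytes

    def __eq__(self, other):
        if not isinstance(other, strbytes):
            return NotImplemented
        return self._items == other._items

    def __hash__(self):
        return hash(self._items)

    def __repr__(self):
        return 'strbytes(%s)' % ', '.join(repr(item) for item in self._items)

    def to_escaped_str(self):
        # Assumed escape format: '?' followed by the decimal byte value.
        return ''.join(
            item if isinstance(item, str) else '?%d' % item
            for item in self._items
        )


name = strbytes('bad', 128, 'file')
print(name[3])
print(name.to_escaped_str())
```

Note that the assumed escape format is ambiguous for names that already contain '?', which is one of the questions a real proposal would have to answer.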
On Wed, Apr 22, 2009 at 6:50 AM, "Martin v. Löwis" wrote:
> I'm proposing the following PEP for inclusion into Python 3.1.
> Please comment.
>
> Regards,
> Martin
>
> PEP: 383
> Title: Non-decodable Bytes in System Character Interfaces
> Version: $Revision: 71793 $
> Last-Modified: $Date: 2009-04-22 08:42:06 +0200 (Mi, 22. Apr 2009) $
> Author: Martin v. Löwis
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 22-Apr-2009
> Python-Version: 3.1
> Post-History:
>
> Abstract
> ========
>
> File names, environment variables, and command line arguments are
> defined as being character data in POSIX; the C APIs however allow
> passing arbitrary bytes - whether these conform to a certain encoding
> or not. This PEP proposes a means of dealing with such irregularities
> by embedding the bytes in character strings in such a way that allows
> recreation of the original byte string.
>
> Rationale
> =========
>
> The C char type is a data type that is commonly used to represent both
> character data and bytes. Certain POSIX interfaces are specified and
> widely understood as operating on character data, however, the system
> call interfaces make no assumption on the encoding of these data, and
> pass them on as-is. With Python 3, character strings use a
> Unicode-based internal representation, making it difficult to ignore
> the encoding of byte strings in the same way that the C interfaces can
> ignore the encoding.
>
> On the other hand, Microsoft Windows NT has corrected the original
> design limitation of Unix, and made it explicit in its system
> interfaces that these data (file names, environment variables, command
> line arguments) are indeed character data, by providing a
> Unicode-based API (keeping a C-char-based one for backwards
> compatibility).
>
> For Python 3, one proposed solution is to provide two sets of APIs: a
> byte-oriented one, and a character-oriented one, where the
> character-oriented one would be limited to not being able to represent
> all data accurately. Unfortunately, for Windows, the situation would
> be exactly the opposite: the byte-oriented interface cannot represent
> all data; only the character-oriented API can. As a consequence,
> libraries and applications that want to support all user data in a
> cross-platform manner have to accept a mish-mash of bytes and characters
> exactly in the way that caused endless troubles for Python 2.x.
>
> With this PEP, a uniform treatment of these data as characters becomes
> possible. The uniformity is achieved by using specific encoding
> algorithms, meaning that the data can be converted back to bytes on
> POSIX systems only if the same encoding is used.
>
> Specification
> =============
>
> On Windows, Python uses the wide character APIs to access
> character-oriented APIs, allowing direct conversion of the
> environmental data to Python str objects.
>
> On POSIX systems, Python currently applies the locale's encoding to
> convert the byte data to Unicode. If the locale's encoding is UTF-8,
> it can represent the full set of Unicode characters, otherwise, only a
> subset is representable. In the latter case, using private-use
> characters to represent these bytes would be an option. For UTF-8,
> doing so would create an ambiguity, as the private-use characters may
> regularly occur in the input also.
>
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes each non-decodable byte into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.
>
> The error handler interface is extended to allow the encode error
> handler to return byte strings immediately, in addition to returning
> Unicode strings which then get encoded again.
>
> If the locale's encoding is UTF-8, the file system encoding is set to
> a new encoding "utf-8b". The UTF-8b codec decodes non-decodable bytes
> (which must be >= 0x80) into half surrogate codes U+DC80..U+DCFF.
>
> Discussion
> ==========
>
> While providing a uniform API to non-decodable bytes,
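The half-surrogate mapping described in the Specification can be tried out with a small sketch. It relies on the `surrogateescape` error handler, which is the name released versions of Python (3.1 and later) use for this idea. The draft quoted above calls it "utf-8b" and "python-escape" instead, so this illustrates the technique rather than the draft's exact interface. The helper names are made up.

```python
def decode_filename(raw: bytes) -> str:
    # Undecodable bytes 0x80..0xFF become lone surrogates U+DC80..U+DCFF.
    return raw.decode('utf-8', errors='surrogateescape')


def encode_filename(name: str) -> bytes:
    # Lone surrogates U+DC80..U+DCFF are turned back into the original bytes.
    return name.encode('utf-8', errors='surrogateescape')


raw = b'bad\x80file'            # 0x80 alone is not valid UTF-8
name = decode_filename(raw)
print(ascii(name))              # shows the escaped byte as \udc80
print(encode_filename(name) == raw)

# Well-formed UTF-8 is decoded as usual and is not affected by the handler.
print(decode_filename('café'.encode('utf-8')))
```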
Re: [Python-Dev] Two proposed changes to float formatting
Mark Dickinson wrote:
> On Sun, Apr 26, 2009 at 10:42 PM, Scott David Daniels wrote:
>...
>> I had also said (without explaining: only the trailing zeroes with the e, so we wind up with:
>> 1157920892373161954235709850086879078532699846656405640e+23
>> or 115792089237316195423570985008687907853269984665640564.0e+24
>> or some such, rather than 1.157920892373162e+77
>> or 1.15792089237316195423570985008687907853269984665640564e+77
>> These are all possible representations for 2 ** 256.
>
> Understood.
>
>> _but_ the printed decimal number I am proposing is within one ULP of
>> the value of the binary number.
>
> But there are plenty of ways to get this if this is what you want: if
> you want a displayed result that's within 1 ulp (or 0.5 ulps, which
> would be better) of the true value then repr should serve your needs.

The representation I am suggesting here is a half-way measure between
your proposal and the existing behavior. This representation addresses
the abrupt transition that you point out (the number of significant
digits drops precipitously) without particularly changing the goal of
the transition (avoiding the display of faux accuracy) and without, in
my (possibly naive) view, seriously complicating either the
print-generating code or the issues for the reader of the output.

To wit, the proposal is (A) for numbers where the printed digits exceed
the accuracy presented, represent the result as an integer with an e+N,
rather than a number between 1 and 2-epsilon with an exponent that
makes you have to count digits to compare the two values, and (B) that
the full precision available in the value be shown in the
representation.

Given that everyone understands that is what I am proposing, I am OK
with the decision going where it will. I am comforted that we are only
talking about four wrapped lines if we go to the full integer, which I
had not realized.

Further, I agree with you that there is an abrupt transition in
represented accuracy as we cross from %f to %g, which should somehow be
addressed. You want to address it by continuing to show digits, and I
want to limit the digits shown to a value that reflects the known
accuracy. I also want text that compares "smoothly" with numbers near
the transition (so that greater-than and less-than relationships are
obvious without thinking), hence the representation that avoids the
"normalized" mantissa.

Having said all this, I think my compromise position should be clear.
I did not mean to argue with you, but rather intended to propose a
possible middle way that some might find appealing.

--Scott David Daniels
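One possible reading of points (A) and (B) is sketched below. It is an assumption about what the representation would look like, not code from the thread. The function name and the choice of 17 significant digits (enough to round-trip a 53-bit float) are both hypothetical.

```python
def integer_mantissa_format(value, digits=17):
    """Format a float as '<integer mantissa>e+<N>' with `digits` digits kept."""
    # '%.*e' yields a correctly rounded mantissa of the form d.ddd...e+XX.
    mantissa, exponent = ('%.*e' % (digits - 1, value)).split('e')
    whole = mantissa.replace('.', '')       # drop the point: integer mantissa
    # Removing the point shifts the exponent down by the digits that followed it.
    return '%se%+d' % (whole, int(exponent) - (digits - 1))


text = integer_mantissa_format(2.0 ** 256)
print(text)
print(float(text) == 2.0 ** 256)
```

The integer mantissa keeps the same number of digits for neighbouring values, so two outputs near the transition can be compared by eye once their exponents match.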
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
> How about another str-like type, a sequence of char-or-bytes?

That would be a different PEP. I personally like my own proposal more,
but feel free to propose something different.

Regards,
Martin
Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
On approximately 4/25/2009 5:35 AM, came the following characters from
the keyboard of Martin v. Löwis:
>> Because the encoding is not reliably reversible.
>
> Why do you say that? The encoding is completely reversible
> (unless we disagree on what "reversible" means).
>
>> I'm +1 on the concept, -1 on the PEP, due solely to the lack of a
>> reversible encoding.
>
> Then please provide an example for a setup where it is not reversible.
>
> Regards,
> Martin

It is reversible if you know that it is decoded, and apply the
encoding. But if you don't know that it has been encoded, then applying
the reverse transform can convert an undecoded str that matches the
decoded str to the form that it could have, but never did take.

The problem is that there is no guarantee that the str interface
provides only strictly conforming Unicode, so decoding bytes to
non-strictly conforming Unicode can result in a data pun between
non-strictly conforming Unicode coming from the str interface and
bytes being decoded to non-strictly conforming Unicode coming from the
bytes interface.

Any particular program that consistently uses one or the other (bytes
vs str) APIs under the covers might never be affected by such a data
pun, but programs that may use both types of interface could
potentially see a data pun.

If your PEP depends on consistent use of one or the other type of
interface, you should say so, and if the platform only provides that
type of interface, maybe all is well. Both types of interfaces are
available on Windows; perhaps POSIX only provides native bytes
interfaces, and if the PEP is the only way to provide str interfaces,
then perhaps consistent use is required.

There are still issues regarding how Windows and POSIX programs that
are sharing cross-mounted file systems might communicate file names
between each other, which is not at all clear from the PEP. If this is
an insoluble or un-addressed issue, it should be stated. (It is
probably insoluble, due to there being multiple ways that the
cross-mounted file systems might translate names; but if there are,
can we learn something from the rules the mounting systems use, to be
compatible with (one of) them, or not.)

Your change to avoid using PUA characters, together with the rule
suggested by MRAB in another branch of this thread of treating
half-surrogates as invalid byte sequences, may avoid the data puns I'm
concerned about.

It is not clear how half-surrogate characters would be displayed when
the user prints or displays such a file name string. It would seem
that programs that display file names to users might still have issues
with such; an escaping mechanism that uses displayable characters
would have an advantage there.

--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
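The data pun and the display concern can both be seen in a small sketch. It uses the `surrogateescape` error handler from released Python (3.1 and later) as a stand-in for the half-surrogate scheme under discussion, which is an assumption about how the final design maps onto this thread. The variable names are illustrative only.

```python
# A name that came from a bytes-oriented API, with one undecodable byte.
from_bytes_api = b'bad\x80file'.decode('utf-8', 'surrogateescape')

# A str that already held a lone surrogate, as a str-oriented API might hand over.
from_str_api = 'bad\udc80file'

# The pun: the two strings are indistinguishable once they are str objects.
print(from_bytes_api == from_str_api)

# MRAB's rule: bytes that look like an encoded half-surrogate are themselves
# treated as undecodable, so the bytes -> str -> bytes direction stays reversible.
sneaky = b'\xed\xb2\x80'
round_tripped = sneaky.decode('utf-8', 'surrogateescape').encode(
    'utf-8', 'surrogateescape')
print(round_tripped == sneaky)

# Display: a strict encode for output refuses the lone surrogate...
try:
    from_bytes_api.encode('utf-8')
    displayable = True
except UnicodeEncodeError:
    displayable = False
print(displayable)

# ...so a program showing the name needs a visible escape, for example:
shown = from_bytes_api.encode('utf-8', 'backslashreplace')
print(shown)
```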
