[issue3028] tokenize module: normal lines, not "logical"
New submission from Noam Raphael <[EMAIL PROTECTED]>: Hello, The documentation of the tokenize module says: "The line passed is the *logical* line; continuation lines are included." Some background: The tokenize module splits a python source into tokens, and says for each token where it begins and where it ends, in the format of (row, offset). This note in the documentation made me think that continuation lines are considered as one line, and made me break my head how I should find the offset of the token in the original string. The truth is that the row number is simply the index of the line as returned by the readline function, and it's very simple to reconstruct the string offset. I suggest that this will be changed to something like "The line passed is the index of the string returned by the readline function, plus 1. That is, the first string returned is called line 1, the second is called line 2, and so on." Thanks, Noam -- assignee: georg.brandl components: Documentation messages: 67635 nosy: georg.brandl, noam severity: normal status: open title: tokenize module: normal lines, not "logical" versions: Python 2.5 ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3028> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3028] tokenize module: normal lines, not "logical"
Noam Raphael <[EMAIL PROTECTED]> added the comment: Can I suggest that you also add something like "The row indices in the (row, column) tuples, however, are physical, and don't treat continuation lines specially."? It's just that it took me some time to understand your clarification, since the row indices I thought the documentation talks about are also tuple items, they just happen to be the first in the tuple, not the last.
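The point about physical row numbers can be checked with a short sketch. It assumes a current Python 3 and the standard `tokenize.generate_tokens` API; the sample `source` string and the helper `string_offset` are made up for illustration and are not part of the tokenize module.

```python
import io
import tokenize

# Hypothetical sample: the first statement spans two physical lines.
source = "total = (1 +\n         2)\nname = 'x'\n"
lines = source.splitlines(keepends=True)

# Offset of the start of each physical line within the source string.
line_starts = [0]
for line in lines:
    line_starts.append(line_starts[-1] + len(line))


def string_offset(position):
    """Convert a tokenize (row, col) pair to an index into `source`."""
    row, col = position
    return line_starts[row - 1] + col


tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens:
    if tok.type in (tokenize.NUMBER, tokenize.NAME, tokenize.STRING):
        begin, end = string_offset(tok.start), string_offset(tok.end)
        # The slice recovered from the physical (row, col) pair is the token.
        assert source[begin:end] == tok.string
        print(tok.start, repr(tok.string))
```

The token `2` sits on the continuation line and is reported with row 2, so the row is simply the index (plus 1) of the string returned by readline, as described above.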
[issue8048] doctest assumes sys.displayhook hasn't been touched
New submission from Noam Raphael : Hello, This bug is the cause of a bug reported about DreamPie: https://bugs.launchpad.net/bugs/530969 DreamPie (http://dreampie.sourceforge.net) changes sys.displayhook so that values will be sent to the parent process instead of being printed in stdout. This causes doctest to fail when run from DreamPie, because it implicitly assumes that sys.displayhook writes the values it gets to sys.stdout. This is why doctest replaces sys.stdout with its own file-like object, which is ready to receive the printed values. The solution is simply to replace sys.displayhook with a function that will do the expected thing, just like sys.stdout is replaced. The patch I attach does exactly this. Thanks, Noam -- components: Library (Lib) files: doctest.py.diff keywords: patch messages: 100334 nosy: noam severity: normal status: open title: doctest assumes sys.displayhook hasn't been touched type: behavior Added file: http://bugs.python.org/file16421/doctest.py.diff ___ Python tracker <http://bugs.python.org/issue8048> ___
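A minimal sketch of the idea, not the attached patch: it assumes that temporarily restoring `sys.__displayhook__` (the interpreter's original hook) is an acceptable "function that will do the expected thing". The names `default_displayhook`, `run_interactive` and `sent_elsewhere` are hypothetical, and `sent_elsewhere.append` only stands in for a DreamPie-like hook.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def default_displayhook():
    """Temporarily install the interpreter's original display hook."""
    saved = sys.displayhook
    sys.displayhook = sys.__displayhook__
    try:
        yield
    finally:
        sys.displayhook = saved


def run_interactive(statement):
    """Run one statement in 'single' mode and capture what reaches stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(statement, "<example>", "single"), {})
    return buffer.getvalue()


sent_elsewhere = []                      # stand-in for "sent to the parent process"
original_hook = sys.displayhook
sys.displayhook = sent_elsewhere.append  # hypothetical replaced hook
try:
    without_fix = run_interactive("1 + 1")   # nothing reaches stdout
    with default_displayhook():
        with_fix = run_interactive("1 + 1")  # the value is printed to stdout
finally:
    sys.displayhook = original_hook

print(repr(without_fix), repr(with_fix), sent_elsewhere)
```

With the replaced hook the captured output is empty, which is the situation in which doctest's comparison against the expected output fails.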
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: I don't know, for me it works fine, even after downloading a fresh SVN copy. On what platform does it happen? __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1580> __
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: I also use linux on x86. I think that byte order would cause different results (the repr of a random float shouldn't be "1.0".) Does the test case run ok? Because if it does, it's really strange. -- versions: -Python 2.6
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: Oh, this is sad. Now I know why Tcl has also implemented a decimal to binary routine. Perhaps we can simply use both their routines? If I am not mistaken, their only real dependency is on a library which allows arbitrarily long integers, called tommath, from which they use a few basic functions. We can use the functions from longobject.c instead. It will probably be somewhat slower, since longobject.c wasn't created to allow in-place operations, but I don't think it should be that bad -- we are mostly talking about compile time.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: The Tcl code can be found here: http://tcl.cvs.sourceforge.net/tcl/tcl/generic/tclStrToD.c?view=markup What Tim says gives another reason for using that code - it means that currently, compiling the same source code on two platforms can result in code that does different things. Just to make sure - IEEE does require that operations on doubles will do the same thing on different platforms, right?
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: I think that for str(), the current method is better - using the new repr() method will make str(1.1*3) == '3.3000000000000003', instead of '3.3'. (The repr is right - you can check, and 1.1*3 != 3.3. But for str() purposes it's fine.) But I actually think that we should also use Tcl's decimal to binary conversion - otherwise, a .pyc file created by python compiled with Microsoft will cause a different behaviour from a .pyc file created by python compiled with Gnu, which is quite strange.
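The difference between the two renderings can be reproduced with explicit format strings. This is only an illustration: it assumes that the str() behaviour discussed here corresponds to 12 significant digits and the full repr to 17, and the variable names are made up.

```python
x = 1.1 * 3

seventeen_digits = "%.17g" % x  # enough digits to identify any IEEE-754 double
twelve_digits = "%.12g" % x     # the shorter, str()-style rounding

print(seventeen_digits)
print(twelve_digits)
print(x == 3.3)
# Only the 17-digit string converts back to exactly the same float.
print(float(seventeen_digits) == x, float(twelve_digits) == x)
```

The short form reads better but names a different float (3.3), which is why it is fine for str() and not for a repr() that is meant to round-trip.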
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: If I think about it some more, why not get rid of all the float platform-dependencies and define how +inf, -inf and nan behave? I think that it means:
* inf and -inf are legitimate floats just like any other float. Perhaps there should be a builtin Inf, or at least math.inf.
* nan is an object of type float, which behaves like None, that is: "nan == nan" is true, but "nan < nan" and "nan < 3" will raise an exception. Mathematical operations which used to return nan will raise an exception (division by zero does this already, but "inf + -inf" will do that too, instead of returning nan.) Again, there should be a builtin NaN, or math.nan.
The reason for having a special nan object is compatibility with IEEE floats - I want to be able to pass around IEEE floats easily even if they happen to be nan. This is basically what Tcl did, if I understand correctly - see item 6 in http://www.tcl.tk/cgi-bin/tct/tip/132.html .
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: That's right, but the standard also defines that 0.0/0 -> nan, and 1.0/0 -> inf, but instead we raise an exception. It's just that in Python, every object is expected to be equal to itself. Otherwise, how can I check if a number is nan?
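For reference, a small sketch of how the nan question looks in a current Python 3, where nan keeps the IEEE-754 comparison semantics rather than the None-like behaviour proposed above. `math.isnan` is a standard library function; the helper `is_nan` is a hypothetical name used only here.

```python
import math

nan = float("nan")
inf = float("inf")


def is_nan(value):
    """NaN test that relies only on NaN being unequal to itself."""
    return value != value


print(nan == nan)           # IEEE-754 semantics: nan is not equal to itself
print(math.isnan(nan))      # the standard library check
print(is_nan(inf - inf))    # "inf + -inf" yields nan instead of raising

try:
    1.0 / 0
except ZeroDivisionError as exc:
    # Float division by zero raises instead of returning inf.
    print("raised:", exc)
```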
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: If I understand correctly, there are two main concerns: speed and portability. I think that neither is that terrible. How about this:
* For IEEE-754 hardware, we implement decimal/binary conversions, and define the exact behaviour of floats.
* For non-IEEE-754 hardware, we keep the current method of relying on the system libraries.
About speed, perhaps it's not such a big problem, since decimal/binary conversions are usually related to I/O, and this is relatively slow anyway. I think that a program usually does relatively few decimal/binary conversions. About portability, I think (from a little research I just did) that S90 supports IEEE-754. This leaves VAX and Cray users, who will have to live with a non-perfect floating-point behaviour. If I am correct, it will let 99.9% of the users get a deterministic floating-point behaviour, where eval(repr(f)) == f and repr(1.1)=='1.1', with a speed penalty they won't notice.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: If I were in that situation I would prefer to store the binary representation. But if someone really needs to store decimal floats, we can add a method "fast_repr" which always calculates 17 decimal digits. Decimal to binary conversion, in any case, shouldn't be slower than it is now, since on Gnu it is done anyway, and I don't think that our implementation should be much slower.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: Ok, so if I understand correctly, the ideal thing would be to implement decimal to binary conversion by ourselves. This would make str <-> float conversion do the same thing on all platforms, and would make repr(1.1)=='1.1'. This would also allow us to define exactly how floats operate, with regard to infinities and NaNs. All this is for IEEE-754 platforms -- for the rare platforms which don't support it, the current state remains. However, I don't think I'm going, in the near future, to add a decimal to binary implementation -- the Tcl code looks very nice, but it's quite complicated and I don't want to fiddle with it right now. If nobody is going to implement the correctly rounding decimal to binary conversion, then I see three options:
1. Revert to previous situation
2. Keep the binary to shortest decimal routine and use it only when we know that the system's decimal to binary routine is correctly rounding (we can check - perhaps Microsoft has changed theirs?)
3. Keep the binary to shortest decimal routine and drop eval(repr(f)) == f (I don't like that option).
If options 2 or 3 are chosen, we can check the 1e5 bug.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment:
2007/12/13, Guido van Rossum <[EMAIL PROTECTED]>:
> > Ok, so if I understand correctly, the ideal thing would be to
> > implement decimal to binary conversion by ourselves. This would make
> > str <-> float conversion do the same thing on all platforms, and would
> > make repr(1.1)=='1.1'. This would also allow us to define exactly how
> > floats operate, with regard to infinities and NaNs. All this is for
> > IEEE-754 platforms -- for the rare platforms which don't support it,
> > the current state remains.
>
> Does doubledigits.c not work for non-754 platforms?

No. It may be a kind of an oops, but currently it just won't compile on platforms which it doesn't recognize, and it only recognizes 754 platforms.

> > 2. Keep the binary to shortest decimal routine and use it only when we
> > know that the system's decimal to binary routine is correctly rounding
> > (we can check - perhaps Microsoft has changed theirs?)
>
> Tim says you can't check (test) for this -- you have to prove it from
> source, or trust the vendor's documentation. I would have no idea
> where to find this documented.

The program for testing floating point compatibility is in http://www.cant.ua.ac.be/ieeecc754.html

To run it, on my computer, I used:

    ./configure -target Conversions -platform IntelPentium_cpp
    make
    ./IeeeCC754 -d -r n -n x Conversion/testsets/d2bconvd
    less ieee.log

This tests only doubles, round to nearest, and ignores flags which should be raised to signal inexact conversion. You can use any file in Conversions/testsets/d2b* - I chose this one pretty randomly. It turns out that even on my gcc 4.1.3 it finds a few floats not correctly rounded. :( Anyway, it can be used to test other platforms. If not by the executable itself, we can pretty easily write a python program which uses the test data.

I don't know what exactly the errors with gcc 4.1.3 mean - is there a problem with the algorithm of glibc, or perhaps the testing program didn't set some flag?
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: Ok, I think I have a solution! We don't really always need the shortest decimal representation. We just want that for most floats which have a nice decimal representation, that representation will be used. Why not do something like this:

    def newrepr(f):
        r = str(f)
        if eval(r) == f:
            return r
        else:
            return repr(f)

Or, in more words:
1. Calculate the decimal representation of f with 17 precision digits, s1, using the system's routines.
2. Create a new string, s2, by rounding the resulting string to 12 precision digits.
3. Convert the resulting rounded string to a new double, g, using the system's routines.
4. If f==g, return s2. Otherwise, return s1.
It will take some more time than the current repr(), because of the additional decimal to binary conversion, but we already said that if speed is extremely important one can use "'%.17g' % f". It will obviously preserve the eval(repr(f)) == f property. And it will return a short representation for almost any float that has a short representation. This algorithm I will be glad to implement. What do you think?
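The four steps can be sketched in Python as follows. This is only an approximation of the proposal: the function name `short_repr` and its parameters are hypothetical, and the 12-digit string is formatted directly from the float rather than by rounding the 17-digit string, which can differ in rare double-rounding cases.

```python
def short_repr(f, short_digits=12, full_digits=17):
    """Return a short decimal string for f if it round-trips, else the full one."""
    full = "%.*g" % (full_digits, f)    # step 1: 17 significant digits (s1)
    short = "%.*g" % (short_digits, f)  # step 2: 12 significant digits (s2)
    if float(short) == f:               # steps 3 and 4: convert back and compare
        return short
    return full


for value in (1.1, 1.1 * 3, 0.1 + 0.2, 10.25):
    text = short_repr(value)
    # Whichever string is returned converts back to the same float.
    assert float(text) == value
    print(text)
```

Because the fallback is always the 17-digit string, the round-trip property does not depend on the short form being correct; the short form is used only when the comparison has already confirmed it.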
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: I think that we can give up float(repr(x)) == x across different platforms, since we don't guarantee something more basic: We don't guarantee that the same program doing only floating point operations will produce the same results across different 754 platforms, because in the compilation process we rely on the system's decimal to binary conversion. In other words, using the current repr(), one can pass a value x from platform A to platform B and be sure to get the same value. But if he has a python function f, he can't be sure that f(x) on platform A will result in the same value as f(x) on platform B. So the cross-platform repr() doesn't really matter. I like eval(repr(x)) == x because it means that repr(x) captures all the information about x, not because it lets me pass x from one platform to another. For communication, I use other methods.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment:
2007/12/18, Raymond Hettinger <[EMAIL PROTECTED]>:
> The 17 digit representation is useful in that it suggests where the
> problem lies. In contrast, showing two numbers with reprs of different
> lengths will strongly suggest that the shorter one is exactly
> represented. Currently, that is a useful suggestion, 10.25 shows as
> 10.25 while 10.21 shows as 10.210000000000001 (indicating that the
> latter is not exactly represented). If you start showing 1.1 as 1.1,
> then you've lost both benefits.

Currently, repr(1.3) == '1.3', suggesting that it is exactly represented, which isn't true. I think that unless you use an algorithm that will truncate zeros only if the decimal representation is exact, the suggested algorithm is less confusing than the current one, in that it doesn't suggest that 1.3 is exactly stored and 1.1 isn't.
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: About the educational problem. If someone is puzzled by "1.1*3 != 3.3", you could always use '%.50f' % 1.1 instead of repr(1.1). I don't think that trying to teach people that floating points don't always do what they expect them to do is a good reason to print uninteresting and visually distracting digits when you don't have to. About the compatibility problem: I don't see why it should matter to the NumPy people if the repr() of some floats is made shorter. Anyway, we can ask them, using a PEP or just the mailing list. About the benefit: If I have data which contains floats, I'm usually interested in their (physical) value, not in their last bits. That's why str(f) does what it does. I like repr(x) to be one-to-one, as I explained in the previous message, but if it can be made more readable, why not make it so?
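For the educational case, the stored value can be shown explicitly. The sketch below assumes a current Python 3, where a 50-decimal `%f` format and `decimal.Decimal(float)` both expose the digits of the binary value; the variable names are made up for illustration.

```python
from decimal import Decimal

# Fifty decimal places of the float actually stored for the literal 1.1.
exact_digits = "%.50f" % 1.1
# Decimal(float) converts the binary value exactly, without rounding.
exact_value = Decimal(1.1)

print(exact_digits)
print(exact_value)
# 1.1 * 3 and 3.3 are stored as different binary values.
print(Decimal(1.1 * 3) == Decimal(3.3))
```

Either form makes it visible that 1.1 is not stored exactly, without requiring repr() to print the extra digits by default.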
[issue979658] Improve HTML documentation of a directory
Noam Raphael added the comment: I just wanted to say that I'm not going to bother too much with this right now - personally I will just use epydoc when I want to create HTML documentation. Of course, you can still do whatever you like with the patch. Good luck, Noam -- nosy: +noam Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue979658>
[issue7260] SyntaxError with a not-existing offset for unicode code
New submission from Noam Raphael : Hello, This is from the current svn:

> ./python
Python 3.2a0 (py3k:76104, Nov 4 2009, 08:49:44)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
...     eval("u'שלום'")
... except SyntaxError as e:
...     e
...
SyntaxError('invalid syntax', ('', 1, 11, "u'שלום'"))

As you can see, the offset (11) refers to a non-existing character, as the code contains only 7 characters. Thanks, Noam -- components: Interpreter Core messages: 94879 nosy: noam severity: normal status: open title: SyntaxError with a not-existing offset for unicode code versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue7260> ___
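A quick check of the two counts involved. The report does not say where 11 comes from; the sketch below only shows that it matches the length of the code when encoded as UTF-8, which may suggest (as an assumption, not a confirmed diagnosis) that the offset was counted in bytes rather than characters.

```python
code = "u'שלום'"

char_count = len(code)                  # characters in the source string
byte_count = len(code.encode("utf-8"))  # bytes once encoded as UTF-8

print(char_count, byte_count)
```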
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: I'm sorry, but it seems to me that the conclusion of the discussion in 2008 is that the algorithm should simply use the system's binary-to-decimal routine, and if the result is like 123.456, round it to 15 significant digits, check if the result evaluates to the original value, and if so, return the rounded result. This would satisfy most people, and has no need for complex rounding algorithms. Am I mistaken? If I implement this, will anybody be interested? Noam
[issue1580] Use shorter float repr when possible
Noam Raphael added the comment: Do you mean msg58966? I'm sorry, I still don't understand what's the problem with returning f_15(x) if eval(f_15(x)) == x and otherwise returning f_17(x). You said (msg69232) that you don't care if float(repr(x)) == x isn't cross-platform. Obviously, the simple method will preserve eval(repr(x)) == x, no matter what rounding bugs are present on the platform.