[issue3028] tokenize module: normal lines, not "logical"

2008-06-02 Thread Noam Raphael

New submission from Noam Raphael <[EMAIL PROTECTED]>:

Hello,

The documentation of the tokenize module says: "The line passed is the
*logical* line; continuation lines are included."

Some background: The tokenize module splits a python source into tokens,
and says for each token where it begins and where it ends, in the format
of (row, offset). This note in the documentation made me think that
continuation lines are considered as one line, and I racked my brain
over how to find the offset of the token in the original string. The
truth is that the row number is simply the index of the line as returned
by the readline function, and it's very simple to reconstruct the string
offset.

I suggest changing this to something like "The line passed
is the index of the string returned by the readline function, plus 1.
That is, the first string returned is called line 1, the second is
called line 2, and so on."
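
A minimal sketch of the behaviour described above, assuming a current Python 3 interpreter. The helper names token_positions and string_offset are hypothetical and only serve the illustration. The row in each (row, column) pair counts the strings returned by readline, so a token on a backslash continuation line is reported on row 2, and the string offset follows by summing the lengths of the earlier lines.

```python
import io
import tokenize

# One statement spread over two physical lines with a backslash continuation.
source = "x = 1 + \\\n    2\n"

def token_positions(text):
    """Return (token_string, start, end) for every NAME and NUMBER token."""
    readline = io.StringIO(text).readline
    return [
        (tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NAME, tokenize.NUMBER)
    ]

def string_offset(text, position):
    """Convert a (row, column) pair into an index into the original string."""
    row, col = position
    lines = text.splitlines(keepends=True)
    # Rows are 1-based and physical: add up the lines that come before this row.
    return sum(len(line) for line in lines[:row - 1]) + col

positions = token_positions(source)
# The token "2" starts at (2, 4): row 2 is the second string readline returned.
```

Here string_offset(source, (2, 4)) points at the character "2" in the original string.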

Thanks,
Noam

--
assignee: georg.brandl
components: Documentation
messages: 67635
nosy: georg.brandl, noam
severity: normal
status: open
title: tokenize module: normal lines, not "logical"
versions: Python 2.5

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3028>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3028] tokenize module: normal lines, not "logical"

2008-06-08 Thread Noam Raphael

Noam Raphael <[EMAIL PROTECTED]> added the comment:

Can I suggest that you also add something like "The row indices in the
(row, column) tuples, however, are physical, and don't treat
continuation lines specially."?

It's just that it took me some time to understand your clarification,
since the row indices I thought the documentation talks about are also
tuple items, they just happen to be the first in the tuple, not the last.




[issue8048] doctest assumes sys.displayhook hasn't been touched

2010-03-03 Thread Noam Raphael

New submission from Noam Raphael :

Hello,

This bug is the cause of a bug reported about DreamPie: 
https://bugs.launchpad.net/bugs/530969

DreamPie (http://dreampie.sourceforge.net) changes sys.displayhook so that 
values will be sent to the parent process instead of being printed in stdout. 
This causes doctest to fail when run from DreamPie, because it implicitly 
assumes that sys.displayhook writes the values it gets to sys.stdout. This is 
why doctest replaces sys.stdout with its own file-like object, which is ready 
to receive the printed values.

The solution is simply to replace sys.displayhook with a function that will do 
the expected thing, just like sys.stdout is replaced. The patch I attach does 
exactly this.
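
The idea can be sketched outside doctest itself. This is not the attached doctest.py.diff. The names default_displayhook, custom_hook and show are hypothetical, and custom_hook only imitates a front end that diverts values away from sys.stdout.

```python
import contextlib
import io
import sys

captured_elsewhere = []

def custom_hook(value):
    """Stand-in for a replaced displayhook: it never writes to sys.stdout."""
    captured_elsewhere.append(value)

@contextlib.contextmanager
def default_displayhook():
    """Temporarily put the interpreter's original displayhook back in place."""
    saved = sys.displayhook
    sys.displayhook = sys.__displayhook__
    try:
        yield
    finally:
        sys.displayhook = saved

def show(value):
    """Call the active displayhook and return whatever reached sys.stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        sys.displayhook(value)
    return buffer.getvalue()
```

With custom_hook installed, show(42) captures nothing, which is the situation that makes the doctest comparison fail. Inside default_displayhook() the same call captures the printed value.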

Thanks,
Noam

--
components: Library (Lib)
files: doctest.py.diff
keywords: patch
messages: 100334
nosy: noam
severity: normal
status: open
title: doctest assumes sys.displayhook hasn't been touched
type: behavior
Added file: http://bugs.python.org/file16421/doctest.py.diff

___
Python tracker 
<http://bugs.python.org/issue8048>
___



[issue1580] Use shorter float repr when possible

2007-12-10 Thread Noam Raphael

Noam Raphael added the comment:

I don't know, for me it works fine, even after downloading a fresh SVN
copy. On what platform does it happen?

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1580>
__



[issue1580] Use shorter float repr when possible

2007-12-10 Thread Noam Raphael

Noam Raphael added the comment:

I also use linux on x86. I think that byte order would cause different
results (the repr of a random float shouldn't be "1.0".)
Does the test case run ok? Because if it does, it's really strange.

--
versions:  -Python 2.6




[issue1580] Use shorter float repr when possible

2007-12-10 Thread Noam Raphael

Noam Raphael added the comment:

Oh, this is sad. Now I know why Tcl has also implemented a decimal to
binary routine.

Perhaps we can simply use both their routines? If I am not mistaken,
their only real dependency is on a library which allows arbitrary long
integers, called tommath, from which they use a few basic functions.
We can use instead the functions from longobject.c. It will probably
be somewhat slower, since longobject.c wasn't created to allow
in-place operations, but I don't think it should be that bad -- we are
mostly talking about compile time.




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

The Tcl code can be found here:
http://tcl.cvs.sourceforge.net/tcl/tcl/generic/tclStrToD.c?view=markup

What Tim says gives another reason for using that code - it means that
currently, compiling the same source code on two platforms can
result in code which does different things.

Just to make sure - IEEE does require that operations on doubles will do
the same thing on different platforms, right?




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

I think that for str(), the current method is better - using the new
repr() method will make str(1.1*3) == '3.3000000000000003', instead of
'3.3'. (The repr is right - you can check, and 1.1*3 != 3.3. But for
str() purposes it's fine.)
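
A small sketch of the difference, assuming the 12 and 17 significant digits that str() and repr() used for floats at the time. Current Python 3 prints floats differently, so the two precisions are spelled out with format strings here.

```python
product = 1.1 * 3

# 17 significant digits: enough to tell any two distinct doubles apart.
seventeen = "%.17g" % product

# 12 significant digits: hides the rounding error in the last bits.
twelve = "%.12g" % product

# The short form reads nicely but does not convert back to the same float.
round_trips = float(twelve) == product
```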

But I actually think that we should also use Tcl's decimal to binary
conversion - otherwise, a .pyc file created by python compiled with
Microsoft will cause a different behaviour from a .pyc file created by
python compiled with Gnu, which is quite strange.




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

If I think about it some more, why not get rid of all the float
platform-dependencies and define how +inf, -inf and nan behave?

I think that it means:
* inf and -inf are legitimate floats just like any other float.
Perhaps there should be a builtin Inf, or at least math.inf.
* nan is an object of type float, which behaves like None, that is:
"nan == nan" is true, but "nan < nan" and "nan < 3" will raise an
exception. Mathematical operations which used to return nan will raise
an exception (division by zero does this already, but "inf + -inf"
will do that too, instead of returning nan.) Again, there should be a
builtin NaN, or math.nan. The reason for having a special nan object
is compatibility with IEEE floats - I want to be able to pass around
IEEE floats easily even if they happen to be nan.

This is basically what Tcl did, if I understand correctly - see item 6
in http://www.tcl.tk/cgi-bin/tct/tip/132.html .




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

That's right, but the standard also defines that 0.0/0 -> nan, and
1.0/0 -> inf, but instead we raise an exception. It's just that in
Python, every object is expected to be equal to itself. Otherwise, how
can I check if a number is nan?
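
For reference, a sketch of how the check looks under the IEEE-754 comparison rules, assuming a current Python interpreter. The helper is_nan is a hypothetical name. math.isnan is in the standard library from Python 2.6 on.

```python
import math

nan = float("nan")

def is_nan(x):
    """Under IEEE-754 rules, NaN is the only float that is unequal to itself."""
    return x != x

# math.isnan gives the same answer without relying on the self-comparison trick.
checks = [(is_nan(v), math.isnan(v)) for v in (nan, 1.0, float("inf"))]
```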




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

If I understand correctly, there are two main concerns: speed and
portability. I think that they are both not that terrible.

How about this:
* For IEEE-754 hardware, we implement decimal/binary conversions, and
define the exact behaviour of floats.
* For non-IEEE-754 hardware, we keep the current method of relying on
the system libraries.

About speed, perhaps it's not such a big problem, since decimal/binary
conversions are usually related to I/O, and this is relatively slow
anyway. I think that a program usually does relatively few
decimal/binary conversions.
About portability, I think (from a little research I just did) that
S90 supports IEEE-754. This leaves VAX and Cray users, who will have
to live with a non-perfect floating-point behaviour.

If I am correct, it will let 99.9% of the users get a deterministic
floating-point behaviour, where eval(repr(f)) == f and
repr(1.1)=='1.1', with a speed penalty they won't notice.




[issue1580] Use shorter float repr when possible

2007-12-11 Thread Noam Raphael

Noam Raphael added the comment:

If I were in that situation I would prefer to store the binary
representation. But if someone really needs to store decimal floats,
we can add a method "fast_repr" which always calculates 17 decimal
digits.

Decimal to binary conversion, in any case, shouldn't be slower than it
is now, since on Gnu it is done anyway, and I don't think that our
implementation should be much slower.




[issue1580] Use shorter float repr when possible

2007-12-12 Thread Noam Raphael

Noam Raphael added the comment:

Ok, so if I understand correctly, the ideal thing would be to
implement decimal to binary conversion by ourselves. This would make
str <-> float conversion do the same thing on all platforms, and would
make repr(1.1)=='1.1'. This would also allow us to define exactly how
floats operate, with regard to infinities and NaNs. All this is for
IEEE-754 platforms -- for the rare platforms which don't support it,
the current state remains.

However, I don't think I'm going, in the near future, to add a decimal
to binary implementation -- the Tcl code looks very nice, but it's
quite complicated and I don't want to fiddle with it right now.

If nobody is going to implement the correctly rounding decimal to
binary conversion, then I see three options:
1. Revert to previous situation
2. Keep the binary to shortest decimal routine and use it only when we
know that the system's decimal to binary routine is correctly rounding
(we can check - perhaps Microsoft has changed theirs?)
3. Keep the binary to shortest decimal routine and drop eval(repr(f)) == f
(I don't like that option).

If options 2 or 3 are chosen, we can check the 1e5 bug.




[issue1580] Use shorter float repr when possible

2007-12-13 Thread Noam Raphael

Noam Raphael added the comment:

2007/12/13, Guido van Rossum <[EMAIL PROTECTED]>:
>
> > Ok, so if I understand correctly, the ideal thing would be to
> > implement decimal to binary conversion by ourselves. This would make
> > str <-> float conversion do the same thing on all platforms, and would
> > make repr(1.1)=='1.1'. This would also allow us to define exactly how
> > floats operate, with regard to infinities and NaNs. All this is for
> > IEEE-754 platforms -- for the rare platforms which don't support it,
> > the current state remains.
>
> Does doubledigits.c not work for non-754 platforms?

No. It may be a kind of an oops, but currently it just won't compile
on platforms which it doesn't recognize, and it only recognizes 754
platforms.
>
> > 2. Keep the binary to shortest decimal routine and use it only when we
> > know that the system's decimal to binary routine is correctly rounding
> > (we can check - perhaps Microsoft has changed theirs?)
>
> Tim says you can't check (test) for this -- you have to prove it from
> source, or trust the vendor's documentation. I would have no idea
> where to find this documented.
>
The program for testing floating point compatibility is in
http://www.cant.ua.ac.be/ieeecc754.html

To run it, on my computer, I used:
./configure -target Conversions -platform IntelPentium_cpp
make
./IeeeCC754 -d -r n -n x Conversion/testsets/d2bconvd
less ieee.log

This tests only doubles, round to nearest, and ignores flags which
should be raised to signal inexact conversion. You can use any file in
Conversions/testsets/d2b* - I chose this one pretty randomly.

It turns out that even on my gcc 4.1.3 it finds a few floats not
correctly rounded. :(

Anyway, it can be used to test other platforms. If not by the
executable itself, we can pretty easily write a python program which
uses the test data.

I don't know what exactly the errors with gcc 4.1.3 mean - is there a
problem with the algorithm of glibc, or perhaps the testing program
didn't set some flag?




[issue1580] Use shorter float repr when possible

2007-12-17 Thread Noam Raphael

Noam Raphael added the comment:

Ok, I think I have a solution!

We don't really always need the shortest decimal representation. We just
want most floats which have a nice decimal representation to be shown
using that representation.

Why not do something like that:

def newrepr(f):
    r = str(f)
    if eval(r) == f:
        return r
    else:
        return repr(f)

Or, in more words:

1. Calculate the decimal representation of f with 17 precision digits,
s1, using the system's routines.
2. Create a new string, s2, by rounding the resulting string to 12
precision digits.
3. Convert the resulting rounded string to a new double, g, using the
system's routines.
4. If f==g, return s2. Otherwise, return s1.

It will take some more time than the current repr(), because of the
additional decimal to binary conversion, but we already said that if
speed is extremely important one can use "'%.17g' % f". It will
obviously preserve the eval(repr(f)) == f property. And it will return a
short representation for almost any float that has a short representation.
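
The four steps above can be sketched as follows. This is only an illustration, not a patch from this issue. short_repr and its parameters are hypothetical names, formatting f directly with 12 digits stands in for re-rounding the 17-digit string in step 2, and float() stands in for the system's decimal to binary routine.

```python
def short_repr(f, short_digits=12, full_digits=17):
    """Return the short form of f if it converts back to f, else the long form."""
    # Step 1: the full-precision string, which always round-trips on IEEE-754 doubles.
    s1 = "%.*g" % (full_digits, f)
    # Step 2: a shorter candidate (assumption: formatted from f, not re-rounded from s1).
    s2 = "%.*g" % (short_digits, f)
    # Step 3: convert the candidate back to a double.
    g = float(s2)
    # Step 4: keep the short string only when no information was lost.
    return s2 if g == f else s1
```

For example, short_repr(1.1) gives '1.1', while short_repr(1.1 * 3) falls back to the 17-digit string because '3.3' does not convert back to the same double.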

This algorithm I will be glad to implement.

What do you think?




[issue1580] Use shorter float repr when possible

2007-12-18 Thread Noam Raphael

Noam Raphael added the comment:

I think that we can give up float(repr(x)) == x across different
platforms, since we don't guarantee something more basic: We don't
guarantee that the same program doing only floating point operations
will produce the same results across different 754 platforms, because
in the compilation process we rely on the system's decimal to binary
conversion. In other words, using the current repr(), one can pass a
value x from platform A to platform B and be sure to get the same value.
But if he has a python function f, he can't be sure that f(x) on
platform A will result in the same value as f(x) on platform B. So the
cross-platform repr() doesn't really matter.

I like eval(repr(x)) == x because it means that repr(x) captures all
the information about x, not because it lets me pass x from one
platform to another. For communication, I use other methods.




[issue1580] Use shorter float repr when possible

2007-12-18 Thread Noam Raphael

Noam Raphael added the comment:

2007/12/18, Raymond Hettinger <[EMAIL PROTECTED]>:
> The 17 digit representation is useful in that it suggests where the
> problem lies.  In contrast, showing two numbers with reprs of different
> lengths will strongly suggest that the shorter one is exactly
> represented.  Currently, that is a useful suggestion, 10.25 shows as
> 10.25 while 10.21 shows as 10.210000000000001 (indicating that the
> latter is not exactly represented).  If you start showing 1.1 as 1.1,
> then you've lost both benefits.

Currently, repr(1.3) == '1.3', suggesting that it is exactly
represented, which isn't true. I think that unless you use an
algorithm that will truncate zeros only if the decimal representation
is exact, the suggested algorithm is less confusing than the current
one, in that it doesn't suggest that 1.3 is exactly stored and 1.1
isn't.
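
A sketch of the point about 1.3 and 1.1, assuming a current Python 3 interpreter where Decimal accepts a float and converts it exactly. is_exact and old_repr are hypothetical helper names, and '%.17g' stands in for the 17-digit repr() in use at the time.

```python
from decimal import Decimal

def is_exact(literal):
    """True when the decimal literal converts to a double with no rounding error."""
    return Decimal(float(literal)) == Decimal(literal)

def old_repr(x):
    """Format with 17 significant digits, dropping trailing zeros."""
    return "%.17g" % x

# Map each literal to (17-digit form, whether the double holds it exactly).
examples = {
    lit: (old_repr(float(lit)), is_exact(lit))
    for lit in ("10.25", "1.3", "1.1")
}
```

Both 1.3 and 1.1 are stored inexactly, yet only 1.1 shows extra digits in the 17-digit form, so a short output is not a reliable sign of exact storage.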




[issue1580] Use shorter float repr when possible

2007-12-18 Thread Noam Raphael

Noam Raphael added the comment:

About the educational problem. If someone is puzzled by "1.1*3 !=
3.3", you could always use '%.50f' % 1.1 instead of repr(1.1). I don't
think that trying to teach people that floating points don't always do
what they expect them to do is a good reason to print uninteresting
and visually distracting digits when you don't have to.

About the compatibility problem: I don't see why it should matter to
the NumPy people if the repr() of some floats is made shorter. Anyway,
we can ask them, using a PEP or just the mailing list.

About the benefit: If I have data which contains floats, I'm usually
interested in their (physical) value, not in their last bits.
That's why str(f) does what it does. I like repr(x) to be one-to-one,
as I explained in the previous message, but if it can be made more
readable, why not make it so?




[issue979658] Improve HTML documentation of a directory

2008-01-05 Thread Noam Raphael

Noam Raphael added the comment:

I just wanted to say that I'm not going to bother too much with this
right now - Personally I will just use epydoc when I want to create an
HTML documentation. Of course, you can still do whatever you like with
the patch.

Good luck,
Noam

--
nosy: +noam


Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue979658>




[issue7260] SyntaxError with a not-existing offset for unicode code

2009-11-03 Thread Noam Raphael

New submission from Noam Raphael :

Hello,

This is from the current svn:

> ./python
Python 3.2a0 (py3k:76104, Nov  4 2009, 08:49:44) 
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
...     eval("u'שלום'")
... except SyntaxError as e:
...     e
...
SyntaxError('invalid syntax', ('', 1, 11, "u'שלום'"))

As you can see, the offset (11) refers to a non-existing character, as
the code contains only 7 characters.
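
One observation that may help narrow this down, offered as an assumption and not as a confirmed diagnosis: the reported offset matches the length of the code when encoded as UTF-8. The sketch below only counts characters and bytes. It does not reproduce the SyntaxError, and the Hebrew word is written with escapes to keep the snippet independent of file encoding.

```python
# The evaluated code: u'שלום' (u, quote, four Hebrew letters, quote).
code = "u'\u05e9\u05dc\u05d5\u05dd'"

# Number of characters, which is what a column offset is expected to index.
char_count = len(code)

# Number of bytes in UTF-8: each Hebrew letter takes two bytes.
byte_count = len(code.encode("utf-8"))
```

char_count is 7 and byte_count is 11, which equals the offset in the traceback.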

Thanks,
Noam

--
components: Interpreter Core
messages: 94879
nosy: noam
severity: normal
status: open
title: SyntaxError with a not-existing offset for unicode code
versions: Python 3.2

___
Python tracker 
<http://bugs.python.org/issue7260>
___



[issue1580] Use shorter float repr when possible

2009-03-01 Thread Noam Raphael

Noam Raphael  added the comment:

I'm sorry, but it seems to me that the conclusion of the discussion in
2008 is that the algorithm should simply use the system's
binary-to-decimal routine, and if the result is like 123.456, round it
to 15 significant digits, check if the result evaluates to the original
value, and if so, return the rounded result. This would satisfy most
people, and has no need for complex rounding algorithms. Am I mistaken?

If I implement this, will anybody be interested?

Noam




[issue1580] Use shorter float repr when possible

2009-03-02 Thread Noam Raphael

Noam Raphael  added the comment:

Do you mean msg58966?

I'm sorry, I still don't understand what the problem is with returning
f_15(x) if eval(f_15(x)) == x and otherwise returning f_17(x). You said
(msg69232) that you don't care if float(repr(x)) == x isn't
cross-platform. Obviously, the simple method will preserve eval(repr(x))
== x, no matter what rounding bugs are present on the platform.
