date:20101128

Issue 10500 claims that the 3.1.2 installer has the virus Palevo.DZ.
Can somebody with a virus scanner please confirm or contest that
claim?

Thanks,
Martin

http://bugs.python.org/issue10500
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] constant/enum type in stdlib


On 28/11/2010 03:20, Terry Reedy wrote:

On 11/27/2010 6:26 PM, Raymond Hettinger wrote:


Can I suggest that an enum-maker be offered as a third-party module


Possibly with competing versions for trial and testing ;-)


rather than prematurely adding it into the standard library.


I had same thought.



There are already *several* enum packages for Python available. The 
implementation by Ben Finney, associated with the previous PEP, is on 
PyPI and the most recent release has over 4000 downloads, making it 
reasonably popular:


http://pypi.python.org/pypi/enum/

Other contenders include flufl.enum and lazr.enum. The Twisted guys 
would like a named constant type, and have a ticket for it, and PyQt has 
its own implementation (subclassing int) providing this functionality. 
In terms of assessing *general* usefulness in the wider community that 
step has already been done.


This discussion came out of yet-another-set-of-integer-constants being 
added to the Python standard library (since changed to strings). We have 
integer constants, with the associated inscrutability when used from the 
interactive interpreter or debugging, in *many* standard library 
modules. The particular features and use cases being discussed have use 
*within* the standard library in mind.


Releasing yet-another-enum-library-that-the-standard-library-can't-use 
would be a particularly pointless outcome of this discussion. The 
decision is whether or not to use named constants in the standard 
library; otherwise we can just point people at one of the several 
existing packages.


All the best,

Michael Foord

--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.


Re: [Python-Dev] Question about GDB bindings and 32/64 bits

2010-11-28 Thread Matthias Klose


On 26.11.2010 05:11, Jesus Cea wrote:


I have installed GDB 7.2 32-bit, and the 32-bit buildslaves are green.
Nevertheless, the 64-bit buildslaves are failing test_gdb.

Is there any expectation that a 32-bit GDB can debug a 64-bit
Python? If not, the gdb test should compare "platform.architecture()" (for
Python and gdb on the system) and run only when they are the same.
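Here is one way to express the proposed check. The helper name `gdb_matches_python_arch` is hypothetical, and this is not what test_gdb actually does. It compares the bitness that `platform.architecture()` reports for this interpreter with the bitness it reports for whatever `gdb` is on PATH. As the reply below points out, a 64-bit gdb can also debug 32-bit binaries, so a strict equality test like this would skip more cases than necessary.

```python
import platform
import shutil
import sys


def gdb_matches_python_arch(gdb_path=None):
    """Hypothetical check: do gdb and this interpreter report the same bitness?

    Returns None if no gdb binary is found. platform.architecture() uses
    the system `file` command where available and falls back to
    defaults otherwise, so the answer is best-effort only.
    """
    gdb_path = gdb_path or shutil.which("gdb")
    if gdb_path is None:
        return None
    python_bits, _ = platform.architecture(sys.executable)
    gdb_bits, _ = platform.architecture(gdb_path)
    return python_bits == gdb_bits
```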


That would be too restrictive, as a 64-bit gdb is able to handle 32-bit binaries 
too.


If this should work, I would open a bug and maybe spend some time on it.

But before investing time, I would like to know whether this
mix is actually expected to work or not.

If not, I would consider installing a 64-bit GDB too and doing some tricks
(like using an "/usr/local/bin/gdb" script wrapper to choose the 32/64-bit
"real" gdb version) to actually execute "test_gdb" in both buildslaves
(they are running on the same physical machine).


Yes, and then you should be able to use this gdb for both 32- and 64-bit builds. 
No need for a wrapper (such a gdb is available in the gdb64 package on 
Debian/Ubuntu).


  Matthias

Re: [Python-Dev] constant/enum type in stdlib


On 28/11/2010 02:38, Nick Coghlan wrote:

On Sun, Nov 28, 2010 at 9:26 AM, Raymond Hettinger
  wrote:

On Nov 27, 2010, at 12:56 PM, Glenn Linderman wrote:


On 11/27/2010 2:51 AM, Nick Coghlan wrote:

Not quite. I'm suggesting a factory function that works for any value,
and derives the parent class from the type of the supplied value.
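Here is a minimal sketch of such a factory. It assumes only what the sentence above says: the class of the result is derived from the type of the supplied value. It is not Nick's actual implementation, which the thread does not show, and `named_value` here is a hypothetical stand-in. The sketch also unwraps a value that is already named, which is one way to sidestep the named-value-of-a-named-value case.

```python
def named_value(name, value):
    """Hypothetical sketch: return a copy of `value` that remembers `name`.

    The result's class subclasses the builtin type of `value`, so it still
    compares and computes like the original. Works for subclassable
    builtins such as int, float, str and dict (not bool or NoneType).
    """
    # If `value` is already named, wrap its underlying builtin type instead
    # of building a named class on top of another named class.
    base = getattr(type(value), "_raw_type", type(value))

    class Named(base):
        _raw_type = base

        def __repr__(self):
            return "{}={}".format(self._name, base.__repr__(self))

    Named.__name__ = Named.__qualname__ = "Named" + base.__name__.capitalize()
    obj = Named(value)
    obj._name = name
    return obj
```

Only the repr changes. Comparisons and arithmetic behave as they would on the wrapped value.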

Nick, thanks for the much better implementation than I achieved; you seem to 
have the same goals as my implementation.  I learned a bit making mine, and 
more from understanding yours to some degree.  What I still don't understand 
about your implementation is that when I add one additional line to your file, 
it fails:

w = named_value("ABC", z)

Now I can understand why it might not be a good thing to make a named value of 
a named value (confusing, at least), but I was surprised, and still do not 
understand, that it failed, reporting that __new__() takes exactly 3 arguments 
(2 given).

Can I suggest that an enum-maker be offered as a third-party module rather than 
prematurely adding it into the standard library.

Indeed. Glenn's failing example suggests to me that using a new
metaclass is probably going to be a cleaner option than trying to
dance around type's default behaviour within an ordinary class
definition (if nothing else, a separate metaclass makes it much easier
to detect when you're dealing with an instance of a named type).
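The sketch below illustrates only the detection point. With a dedicated metaclass, a single `isinstance` check on the metaclass identifies every named type, whichever builtin it wraps. `NamedMeta`, `make_named_type` and `is_named` are hypothetical names chosen for this sketch.

```python
class NamedMeta(type):
    """Hypothetical metaclass shared by every named-value type."""


def make_named_type(base):
    # Subclass `base` with NamedMeta as the metaclass. This works for
    # builtins such as int and str, whose own metaclass is plain `type`.
    return NamedMeta("Named" + base.__name__.capitalize(), (base,), {})


def is_named(obj):
    # One check covers every named type, whatever builtin it wraps.
    return isinstance(type(obj), NamedMeta)
```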



Yep, for representing a group of names a single class with a metaclass 
seems like a reasonable approach. See my note below about agreeing on a 
minimal feature set and a minimal API before we discuss implementation 
though.



Regardless, I still see value in approaching this whole discussion as
a two-level design problem, with "named values" as the more
fundamental concept, and then higher level grouping APIs to get
enum-style behaviour.


It seems like using the term "enum" provokes a strong negative reaction 
in some of the core-devs who are basically in favour of named constants and 
not actively against grouping. I'm happy with NamedConstant and 
GroupedNames (or similar) and dropping the use of the term enum.


There are also valid concerns about over-engineering (and not so valid 
concerns...). Simplicity in creating them and no additional burden in 
using them are fundamental, but in the APIs / implementations suggested 
so far I think we are keeping that in mind.



Eventually attaining "One Obvious Way" for the
former seems achievable to me, while the diversity of use cases for
grouping APIs suggests to me that "one-size-fits-all" isn't going to
work unless that "one size" is a Frankenstein API with more options
than anyone could reasonably hope to keep in their head at once.


Well... yes - treating it as a two level design problem is fine.

I don't think there are *many* competing features, in fact as far as 
feature requests on python-dev go I think this is a relatively 
straightforward one with a lot of *agreement* on the basic functionality.


We have had various discussions about what the API should look like, or 
what the implementation should look like, but I don't think there is a 
lot of disagreement about basic features. There are some 'optional 
features'. Many of these can be added later without backwards 
compatibility issues, so those can profitably be omitted from an initial 
implementation.


Features as I see them:

Named constant
--------------

* Nice repr
* Subclass of the type it represents
* Trivially easy to convert either to a string (name) or to the value it 
represents
* If an integer type, can be OR'd with other named constants and retains 
a useful repr
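Here is a sketch of these four features for the integer case. It uses a hypothetical `NamedInt` class, not an agreed API: an `int` subclass with a readable repr, `.name` for the name, `int(x)` for the plain value, and an OR that keeps a combined repr.

```python
class NamedInt(int):
    """Hypothetical named integer constant (sketch, not a proposed API)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name  # conversion to the name
        return self

    def __repr__(self):
        return "{}={}".format(self.name, int(self))

    def __or__(self, other):
        if not isinstance(other, int):
            return NotImplemented
        # Keep a useful repr by joining the names of both operands;
        # a plain int operand contributes its decimal value instead.
        other_name = getattr(other, "name", str(int(other)))
        return NamedInt("{}|{}".format(self.name, other_name), int(self) | int(other))
```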



Grouped constants
-----------------

* Easy to create a group of named constants, accessible as attributes on 
group object

* Capability to go from name or value to corresponding constants


Optional Features
-----------------

* Ability to dynamically add new named values to a group. (Suggested by 
Guido)

* Ability to test if a name or value is in a group
* Ability to list all names in a group
* ANDing as well as ORing
* Constants are unique
* OR'ing with an integer will look up the name (or calculate it if the 
int itself represents flags that have already been OR'd) and return a 
named value (with useful repr) instead of just an integer
* Named constants can be named values that wrap *any* type and not just 
immutable values. (Note that wrapping mutable types makes providing 
"from_value" functionality harder *unless* we guarantee that named 
values are unique. If they aren't unique, named values for a mutable type 
can have different values and there is no single definition of what the 
named value actually is.)
* Requiring that values only have one name - or alternatively that values 
on a group could have multiple names (obviously incompatible features).

* Requiring all names in a group to be of the same type
* Allow names to be set automatically in a namespace, for example in a 
class namespace or on a module

* Allow subclassing and adding of new values only present in subclass

Re: [Python-Dev] constant/enum type in stdlib


On 28/11/2010 16:28, Michael Foord wrote:

[snip...]
I don't think there are *many* competing features, in fact as far as 
feature requests on python-dev go I think this is a relatively 
straightforward one with a lot of *agreement* on the basic functionality.


We have had various discussions about what the API should look like, 
or what the implementation should look like, but I don't think there 
is a lot of disagreement about basic features. There are some 
'optional features'. Many of these can be added later without 
backwards compatibility issues, so those can profitably be omitted 
from an initial implementation.


Features as I see them:

Named constant
--------------

* Nice repr
* Subclass of the type it represents
* Trivially easy to convert either to a string (name) or to the value it 
represents
* If an integer type, can be OR'd with other named constants and 
retains a useful repr


Note that having an OR repr is meaningless *unless* the constants are 
intended to be flags, so whether a constant supports OR'ing should be specified:


name = NamedValue('name', value, flags=True)

Where flags defaults to False. Typically you will use this through the 
grouping API anyway - where it can either be a keyword argument 
(slightly annoying because the suggestion is to create the named values 
through keyword arguments) or we can have two group-factory functions:


Group = make_constants('Group', name1=value1, name2=value2)
Flags = make_flags('Flags', name1=value1, name2=value2)

It is sensible if flag values are only powers of 2; we could enforce 
that or not... (Another one for the optional feature list.)
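Here is a minimal sketch of the optional power-of-2 check. The body of the `make_flags` factory named above is hypothetical. It returns a plain class whose attributes are ordinary ints, so it shows only the validation step and not the named reprs.

```python
def make_flags(group_name, **flags):
    """Hypothetical flag-group factory that rejects non-power-of-2 values."""
    for name, value in flags.items():
        # A positive integer is a power of 2 exactly when it has one bit set,
        # i.e. when clearing its lowest set bit leaves zero.
        if not isinstance(value, int) or value <= 0 or value & (value - 1):
            raise ValueError(
                "{}.{} = {!r} is not a power of 2".format(group_name, name, value)
            )
    return type(group_name, (), dict(flags))
```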


I left auto-enumeration (specifying names only and having values 
autogenerated) out of the optional feature set, by the way. I think Antoine 
strongly disapproves of this feature because it reminds him of C enums.


Mark Dickinson thinks that the flags feature could be an optional 
feature too. If we have ORing it makes sense to have ANDing, so I guess 
they belong together. I think there is value in it though.


I realise that the optional feature list is now not small, and 
implementing all of it would create the "franken-api" Nick is worried 
about. The minimal feature list is nicely small though and provides 
useful functionality.


All the best,

Michael



Grouped constants
-----------------

* Easy to create a group of named constants, accessible as attributes 
on group object

* Capability to go from name or value to corresponding constants


Optional Features
-----------------

* Ability to dynamically add new named values to a group. (Suggested 
by Guido)

* Ability to test if a name or value is in a group
* Ability to list all names in a group
* ANDing as well as ORing
* Constants are unique
* OR'ing with an integer will look up the name (or calculate it if the 
int itself represents flags that have already been OR'd) and return a 
named value (with useful repr) instead of just an integer
* Named constants can be named values that wrap *any* type and not 
just immutable values. (Note that wrapping mutable types makes 
providing "from_value" functionality harder *unless* we guarantee that 
named values are unique. If they aren't unique, named values for a 
mutable type can have different values and there is no single 
definition of what the named value actually is.)
* Requiring that values only have one name - or alternatively that 
values on a group could have multiple names (obviously incompatible 
features).

* Requiring all names in a group to be of the same type
* Allow names to be set automatically in a namespace, for example in a 
class namespace or on a module

* Allow subclassing and adding of new values only present in subclass


I'd rather we agree on a suitable (minimal) API and feature set and go to 
implementation from that.


For wrapping mutable types I'm tempted to say YAGNI. For the standard 
library wrapping integers meets almost all our use-cases except for 
one float. (At work we have a decimal constant as it happens.) Perhaps 
we could require immutable types for groups but allow arbitrary values 
for individual named values?


For the named values api:

name = NamedValue('name', value)

For the grouping (tentatively accepted as reasonable by Antoine):

Group = make_constants('Group', name1=value1, name2=value2)
name1, name2 = Group.name1, Group.name2
flag = name1 | name2

value = int(Group.name1)
name = Group('name1')
# alternatively: name = Group.from_name('name1')
name = Group.from_value(value1)
# Group(value1) could work only if values aren't strings
# perhaps: name = Group(value=value1)

Group.new_name = value3 # create new value on the group
names = Group.all_names()
# further bikeshedding on spelling of all_names required
# correspondingly 'all_values' I guess, returning the constants themselves
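Below is a runnable sketch of the grouping API above, under stated assumptions. Values are integers (the common standard library case mentioned above), `make_constants` is the factory name from the message, and the helper classes `_Constant` and `_ConstantsMeta` are hypothetical. A metaclass is used so that both calling the group (`Group('name1')`) and assigning new attributes (`Group.new_name = ...`) can be intercepted.

```python
class _Constant(int):
    """Hypothetical helper: an int that remembers its group and name."""

    def __new__(cls, group, name, value):
        self = super().__new__(cls, value)
        self._group, self.name = group, name
        return self

    def __repr__(self):
        return "{}.{}={}".format(self._group, self.name, int(self))


class _ConstantsMeta(type):
    """Hypothetical metaclass giving a group class its lookup behaviour."""

    def __call__(cls, name):
        # Group('name1') looks a constant up by name.
        return cls._by_name[name]

    def __setattr__(cls, name, value):
        # Assigning a public attribute registers a new named constant,
        # which covers the optional "dynamically add new values" feature.
        if not name.startswith("_"):
            value = _Constant(cls.__name__, name, value)
            cls._by_name[name] = value
        super().__setattr__(name, value)

    def from_value(cls, value):
        # Returns the first constant with a matching value.
        for constant in cls._by_name.values():
            if constant == value:
                return constant
        raise ValueError("{!r} is not a value of {}".format(value, cls.__name__))

    def all_names(cls):
        return list(cls._by_name)


def make_constants(group_name, **values):
    group = _ConstantsMeta(group_name, (), {"_by_name": {}})
    for name, value in values.items():
        setattr(group, name, value)
    return group
```

Whether values must be unique is left open here, as it is in the feature list. `from_value` simply returns the first match.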


Some of the optional features couldn't later be added without 
backwards compatibility concerns (I think the type checking features 
and requiring unique values for example). We should at le

Re: [Python-Dev] constant/enum type in stdlib

On 28/11/2010 17:05, Michael Foord wrote:

[snip...]
It is sensible if flag values are only powers of 2; we could enforce 
that or not... (Another one for the optional feature list.)

Another 'optional' feature I omitted was Phillip J. Eby's suggestion / 
requirement that named values be pickleable. Email is clunky for 
handling this; is there enough support (there is still some objection, 
that is sure) to revive the PEP or create a new one?
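For the pickling requirement, here is a sketch built on a hypothetical `NamedInt` that takes the name as an extra constructor argument. With the default pickle protocol, unpickling such a subclass would otherwise call the constructor with only the integer value and fail. The `__reduce__` override therefore passes both the name and the value back.

```python
import pickle


class NamedInt(int):
    """Hypothetical named integer that survives a pickle round trip."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return "{}={}".format(self.name, int(self))

    def __reduce__(self):
        # Rebuild via NamedInt(name, value) when unpickling.
        return (NamedInt, (self.name, int(self)))
```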

I also didn't include Nick's suggested API, which is slightly different 
from the one I suggested:

>>> silly = Namegroup.from_names("Silly", "FOO", "BAR", "BAZ")
>>> silly.FOO
Silly.FOO=0
>>> int(silly.FOO)
0
>>> silly(0)
Silly.FOO=0

x = named_value("FOO", 1)
y = named_value("BAR", "Hello World!")
z = named_value("BAZ", dict(a=1, b=2, c=3))

set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())

Where a named value created from an integer is an int subclass, from a 
dict a dict subclass and so on.
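Here is a sketch of the `from_names` part of Nick's session. It assumes auto-enumeration from 0 in the order the names are given, which matches the `Silly.FOO=0` output above. `Namegroup` and its helper `_Named` are hypothetical, and `named_value`, `set_named_values` and `_raw()` are not covered.

```python
class _Named(int):
    """Hypothetical int subclass carrying a 'Group.NAME' label."""

    def __new__(cls, qualname, value):
        self = super().__new__(cls, value)
        self._qualname = qualname
        return self

    def __repr__(self):
        return "{}={}".format(self._qualname, int(self))


class Namegroup:
    """Hypothetical group whose instances map names and values to constants."""

    def __init__(self, group_name, mapping):
        self._by_value = {}
        for name, value in mapping.items():
            constant = _Named("{}.{}".format(group_name, name), value)
            setattr(self, name, constant)
            self._by_value[value] = constant

    @classmethod
    def from_names(cls, group_name, *names):
        # Auto-enumeration: values are assigned 0, 1, 2, ... in order.
        return cls(group_name, {name: i for i, name in enumerate(names)})

    def __call__(self, value):
        # silly(0) looks a constant up by value.
        return self._by_value[value]
```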

Michael

I left auto-enumeration (specifying names only and having values 
autogenerated) out of the optional feature set, by the way. I think 
Antoine strongly disapproves of this feature because it reminds him of 
C enums.

Mark Dickinson thinks that the flags feature could be an optional 
feature too. If we have ORing it makes sense to have ANDing, so I 
guess they belong together. I think there is value in it though.

I realise that the optional feature list is now not small, and 
implementing all of it would create the "franken-api" Nick is worried 
about. The minimal feature list is nicely small though and provides 
useful functionality.

All the best,

Michael

Grouped constants
-----------------

* Easy to create a group of named constants, accessible as attributes 
on group object

* Capability to go from name or value to corresponding constants

Optional Features
-----------------

* Ability to dynamically add new named values to a group. (Suggested 
by Guido)

* Ability to test if a name or value is in a group
* Ability to list all names in a group
* ANDing as well as ORing
* Constants are unique
* OR'ing with an integer will look up the name (or calculate it if 
the int itself represents flags that have already been OR'd) and 
return a named value (with useful repr) instead of just an integer
* Named constants can be named values that wrap *any* type and not 
just immutable values. (Note that wrapping mutable types makes 
providing "from_value" functionality harder *unless* we guarantee 
that named values are unique. If they aren't unique, named values for 
a mutable type can have different values and there is no single 
definition of what the named value actually is.)
* Requiring that values only have one name - or alternatively that 
values on a group could have multiple names (obviously incompatible 
features).

* Requiring all names in a group to be of the same type
* Allow names to be set automatically in a namespace, for example in 
a class namespace or on a module

* Allow subclassing and adding of new values only present in subclass

I'd rather we agree on a suitable (minimal) API and feature set and go 
to implementation from that.

For wrapping mutable types I'm tempted to say YAGNI. For the standard 
library wrapping integers meets almost all our use-cases except for 
one float. (At work we have a decimal constant as it happens.) 
Perhaps we could require immutable types for groups but allow 
arbitrary values for individual named values?

For the named values api:

name = NamedValue('name', value)

For the grouping (tentatively accepted as reasonable by Antoine):

Group = make_constants('Group', name1=value1, name2=value2)
name1, name2 = Group.name1, Group.name2
flag = name1 | name2

value = int(Group.name1)
name = Group('name1')
# alternatively: name = Group.from_name('name1')
name = Group.from_value(value1)
# Group(value1) could work only if values aren't strings
# perhaps: name = Group(value=value1)

Group.new_name = value3 # create new value on the group
names = Group.all_names()
# further bikeshedding on spelling of all_names required
# correspondingly 'all_values' I guess, returning the constants themselves

Some of the optional features couldn't later be added without 
backwards compatibility concerns (I think the type checking features 
and requiring unique values for example). We should at least consider 
these if we are to make adding them later difficult. I would be fine 
with not having these features.

All the best,

Michael

Cheers,
Nick.


Re: [Python-Dev] constant/enum type in stdlib

2010-11-28 Thread Steven D'Aprano


Michael Foord wrote:

Another 'optional' feature I omitted was Phillip J. Eby's suggestion / 
requirement that named values be pickleable. Email is clunky for 
handling this; is there enough support (there is still some objection, 
that is sure) to revive the PEP or create a new one?


I think it definitely needs a PEP. I don't care whether you revive the 
old PEP or write a new one.




--
Steven

Re: [Python-Dev] constant/enum type in stdlib


On 28/11/2010 18:05, Steven D'Aprano wrote:

Michael Foord wrote:

Another 'optional' feature I omitted was Phillip J. Eby's suggestion 
/ requirement that named values be pickleable. Email is clunky for 
handling this; is there enough support (there is still some objection, 
that is sure) to revive the PEP or create a new one?


I think it definitely needs a PEP. I don't care whether you revive the 
old PEP or write a new one.


Well, "if it were to be accepted it would need a PEP" and "the next step 
should be a PEP" are slightly different statements. :-)


As I agree with the former *anyway*, at worst starting a PEP will 
waste time, so I guess I'll get that underway when I get a chance...


Thanks

Michael


[Python-Dev] Python and the Unicode Character Database

Two recently reported issues brought to light the fact that the Python
language definition is closely tied to character properties maintained
by the Unicode Consortium. [1,2]  For example, when Python switches to
Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
additional characters that Python can use in identifiers. [3]

With Python 3.1:

>>> exec('\u0CF1 = 1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    ೱ = 1
    ^
SyntaxError: invalid character in identifier

but with Python 3.2a4:

>>> exec('\u0CF1 = 1')
>>> eval('\u0CF1')
1
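`str.isidentifier()` follows the Unicode database the interpreter was built with. A small report like the one sketched here makes that dependency visible; `identifier_status` is a hypothetical helper. The exact version string and the result for U+0CF1 depend on the Python release, so no output is shown.

```python
import unicodedata


def identifier_status(char):
    """Hypothetical helper: how this interpreter's UCD classifies `char`."""
    return (
        unicodedata.unidata_version,  # UCD version Python was built with
        unicodedata.category(char),   # general category, e.g. 'Lo'
        char.isidentifier(),          # follows the same database
    )


print(identifier_status("\u0CF1"))
```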


Of course, the likelihood is low that this change will affect any
user, but the change in str.isspace() reported in [1] is likely to
cause some trouble:

Python 2.6.5:
>>> u'A\u200bB'.split()
[u'A', u'B']

Python 2.7:
>>> u'A\u200bB'.split()
[u'A\u200bB']
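A program that needs whitespace splitting which does not change with the UCD could split on an explicit set of ASCII whitespace characters. The `ascii_split` helper below is a hypothetical sketch, not a proposed builtin. Unlike `str.split()`, it also leaves U+00A0 NO-BREAK SPACE alone.

```python
import re

# Hypothetical helper: the six conventional ASCII whitespace characters
# (as in C's isspace()), spelled out so the result cannot change with
# the Unicode database.
_ASCII_WS = re.compile(r"[ \t\n\r\x0b\x0c]+")


def ascii_split(text):
    """Split on runs of ASCII whitespace, dropping empty leading/trailing parts."""
    return [part for part in _ASCII_WS.split(text) if part]
```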

While we have little choice but to follow UCD in defining
str.isidentifier(), I think Python can promise users more stability in
what it treats as space or as a digit in its builtins.   For example,
I don't think that supporting

>>> float('١٢٣٤.٥٦')
1234.56

is more important than to assure users that once their program
accepted some text as a number, they can assume that the text is
ASCII.
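The behaviour under discussion can be reproduced with `unicodedata.decimal()`, which exposes the digit values involved. In CPython, `float()` treats characters that have a Unicode decimal digit value as those digits. `parse_ascii_float` is a hypothetical helper showing one way a program could insist on ASCII input. It relies on `str.isascii()`, which is available from Python 3.7.

```python
import unicodedata

ARABIC_INDIC = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # the text '١٢٣٤.٥٦'

# Decimal digit values the UCD assigns (None for the ASCII period).
print([unicodedata.decimal(ch, None) for ch in ARABIC_INDIC])
print(float(ARABIC_INDIC))


def parse_ascii_float(text):
    """Hypothetical stricter parser: refuse non-ASCII text before calling float()."""
    if not text.isascii():
        raise ValueError("non-ASCII numeric text: {!r}".format(text))
    return float(text)
```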

[1] http://bugs.python.org/issue10567
[2] http://bugs.python.org/issue10557
[3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, 28 Nov 2010 15:24:37 -0500
Alexander Belopolsky  wrote:
> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.

Well, if "unicode support" means "support the latest version of the
Unicode standard", I'm not sure we have a choice.
We can make exceptions, but that would only confuse users even more,
wouldn't it?

> For example,
> I don't think that supporting
> 
> >>> float('١٢٣٤.٥٦')
> 1234.56
> 
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Why would they assume the text is ASCII?

Regards

Antoine.



Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou  wrote:
..
>> For example,
>> I don't think that supporting
>>
>> >>> float('١٢٣٤.٥٦')
>> 1234.56
>>
>> is more important than to assure users that once their program
>> accepted some text as a number, they can assume that the text is
>> ASCII.
>
> Why would they assume the text is ASCII?

def deposit(self, amountstr):
  self.balance += float(amountstr)
  audit_log("Deposited: " + amountstr)

Auditor:

$ cat numbered-account.log
Deposited: ?.??
...
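One way to avoid the mismatch in this example is to log the parsed number rather than the raw input string. The `audit_line` helper is a hypothetical sketch. Formatting a float with `'{:.2f}'` always yields ASCII digits, whichever script the input used.

```python
def audit_line(amountstr):
    """Hypothetical helper: log the parsed amount, not the raw input text."""
    amount = float(amountstr)
    # Formatting the parsed float produces ASCII digits regardless of
    # the digits the caller supplied.
    return "Deposited: {:.2f}".format(amount)
```

Real financial code would more likely use `decimal.Decimal`. It also accepts non-ASCII decimal digits and formats its result with ASCII digits in the same way.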

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, 28 Nov 2010 15:58:33 -0500
Alexander Belopolsky  wrote:

> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou  wrote:
> ..
> >> For example,
> >> I don't think that supporting
> >>
> >> >>> float('١٢٣٤.٥٦')
> >> 1234.56
> >>
> >> is more important than to assure users that once their program
> >> accepted some text as a number, they can assume that the text is
> >> ASCII.
> >
> > Why would they assume the text is ASCII?
> 
> def deposit(self, amountstr):
>   self.balance += float(amountstr)
>   audit_log("Deposited: " + amountstr)
> 
> Auditor:
> 
> $ cat numbered-account.log
> Deposited: ?.??


I'm not sure that's how banking applications are written :)

Antoine.

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Joao S. O. Bueno

On Sun, Nov 28, 2010 at 7:04 PM, Antoine Pitrou  wrote:
> On Sun, 28 Nov 2010 15:58:33 -0500
> Alexander Belopolsky  wrote:
>
>> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou  wrote:
>> ..
>> >> For example,
>> >> I don't think that supporting
>> >>
>> >> >>> float('١٢٣٤.٥٦')
>> >> 1234.56
>> >>
>> >> is more important than to assure users that once their program
>> >> accepted some text as a number, they can assume that the text is
>> >> ASCII.
>> >
>> > Why would they assume the text is ASCII?
>>
>> def deposit(self, amountstr):
>>       self.balance += float(amountstr)
>>       audit_log("Deposited: " + amountstr)
>>
>> Auditor:
>>
>> $ cat numbered-account.log
>> Deposited: ?.??
>
>
> I'm not sure that's how banking applications are written :)
>
+1 for this being bogus. I see no reason whatsoever for numbers
inside Unicode text to have to be "ASCII" now that we have overcome the
technical barriers that once required it.  ASCII is an
oversimplification of human communication needed for computing devices
not complex enough to represent it fully.

Let novice C programmers in English-speaking countries deal with the
fact that 1 character is not 1 byte anymore. We are past this point.

  js
  -><-



> Antoine.

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 4:12 PM, Joao S. O. Bueno  wrote:
..
> Let novice C programmers in English speaking countries deal with the
> fact that 1 character is not 1 byte anymore. We are past this point.

If you are, please contribute your expertise here:

http://bugs.python.org/issue2382

Re: [Python-Dev] constant/enum type in stdlib

2010-11-28 Thread Greg Ewing


Rob Cliffe wrote:

But couldn't they be presented to the Python programmer as a single 
type, with the implementation details hidden "under the hood"?


Not in CPython, because tuple items are kept in the same block
of memory as the object header. Because CPython can't move
objects, this means that the size of the tuple must be known
when the object is created.

--
Greg

Re: [Python-Dev] Python and the Unicode Character Database

>> >>> float('١٢٣٤.٥٦')
>> 1234.56

I think it's a bug that this works. The definition of the float builtin says

Convert a string or a number to floating point. If the argument is a
string, it must contain a possibly signed decimal or floating point
number, possibly embedded in whitespace. The argument may also be
'[+|-]nan' or '[+|-]inf'.

Now, one may wonder what precisely a "possibly signed floating point
number" is, but most likely, this refers to

floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [intpart] fraction | intpart "."
exponentfloat ::=  (intpart | pointfloat) exponent
intpart       ::=  digit+
fraction      ::=  "." digit+
exponent      ::=  ("e" | "E") ["+" | "-"] digit+
digit         ::=  "0"..."9"
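To see which inputs the grammar above admits, it can be transcribed into a regular expression. This sketch makes two assumptions: a plain `intpart` is accepted (the docstring mentions "decimal" numbers), and `nan`/`inf` are left out. `strict_float` is a hypothetical helper, and `re.ASCII` restricts `\s` to ASCII whitespace.

```python
import re

# Transcription of the grammar above; [0-9] keeps every digit ASCII.
INTPART = r"[0-9]+"
FRACTION = r"\.[0-9]+"
EXPONENT = r"[eE][+-]?[0-9]+"
POINTFLOAT = r"(?:(?:{i})?{f}|{i}\.)".format(i=INTPART, f=FRACTION)
# Assumption: a bare intpart is also allowed; nan/inf are not handled here.
NUMBER = r"(?:{p}|{i})(?:{e})?".format(p=POINTFLOAT, i=INTPART, e=EXPONENT)
# Optional sign and surrounding ASCII whitespace, per the float() docstring.
STRICT_FLOAT = re.compile(r"\s*[+-]?" + NUMBER + r"\s*", re.ASCII)


def strict_float(text):
    """Hypothetical helper: accept only text matching the grammar, then parse."""
    if STRICT_FLOAT.fullmatch(text) is None:
        raise ValueError("not a float per the grammar: {!r}".format(text))
    return float(text)
```

`float()` itself accepts more than this pattern, for example underscores between digits and spellings such as 'infinity'.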

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. Löwis"  wrote:
>>> >>> float('١٢٣٤.٥٦')
>>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
>

This definition fails long before we get beyond the 127th code point:

>>> float('infinity')
inf

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread M.-A. Lemburg

"Martin v. Löwis" wrote:
>>> >>> float('١٢٣٤.٥٦')
>>> 1234.56
> 
> I think it's a bug that this works. The definition of the float builtin says
> 
> Convert a string or a number to floating point. If the argument is a
> string, it must contain a possibly signed decimal or floating point
> number, possibly embedded in whitespace. The argument may also be
> '[+|-]nan' or '[+|-]inf'.
> 
> Now, one may wonder what precisely a "possibly signed floating point
> number" is, but most likely, this refers to
> 
> floatnumber   ::=  pointfloat | exponentfloat
> pointfloat    ::=  [intpart] fraction | intpart "."
> exponentfloat ::=  (intpart | pointfloat) exponent
> intpart       ::=  digit+
> fraction      ::=  "." digit+
> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
> digit         ::=  "0"..."9"

I don't see why the language spec should limit the wealth of number
formats supported by float().

It is not uncommon for Asians and other non-Latin script users to
use their own native script symbols for numbers. Just because these
digits may look strange to someone doesn't mean that they are
meaningless or should be discarded.

Please also remember that Python3 now allows Unicode names for
identifiers for much the same reasons.

Note that the support in float() (and the other numeric constructors)
to work with Unicode code points was explicitly added when Unicode
support was added to Python and has been available since Python 1.6.

It is not a bug by any definition of "bug", even though the feature
may bug someone occasionally to go read up a bit on what else
the world has to offer other than Arabic numerals :-)

http://en.wikipedia.org/wiki/Numeral_system

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 28 2010)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

::: Try our new mxODBC.Connect Python Database Interface for free ! 

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread M.-A. Lemburg

Alexander Belopolsky wrote:
> Two recently reported issues brought into light the fact that Python
> language definition is closely tied to character properties maintained
> by the Unicode Consortium. [1,2]  For example, when Python switches to
> Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
> additional characters that Python can use in identifiers. [3]
> 
> With Python 3.1:
> 
> >>> exec('\u0CF1 = 1')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<string>", line 1
>     ೱ = 1
>       ^
> SyntaxError: invalid character in identifier
> 
> but with Python 3.2a4:
> 
> >>> exec('\u0CF1 = 1')
> >>> eval('\u0CF1')
> 1

Such changes are not new, but I agree that they should probably
be highlighted in the "What's new in Python x.x".
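To see which UCD version a given interpreter was built with, and whether it accepts U+0CF1 in an identifier, a small check along these lines can be run under each Python version. This is a sketch: the helper name describe_identifier_char is made up for illustration, and what it reports depends on the interpreter it runs under.

```python
import unicodedata

def describe_identifier_char(ch):
    """Report how the running interpreter classifies one code point."""
    return {
        # The thread expects UCD 6.0.0 for Python 3.2.
        "ucd_version": unicodedata.unidata_version,
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "is_identifier": ch.isidentifier(),
    }

print(describe_identifier_char("\u0CF1"))  # KANNADA SIGN JIHVAMULIYA
```

Running it under two interpreters side by side makes a UCD-driven change like the one above visible without having to trigger a SyntaxError.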

> Of course, the likelihood is low that this change will affect any
> user, but the change in str.isspace() reported in [1] is likely to
> cause some trouble:
> 
> Python 2.6.5:
> >>> u'A\u200bB'.split()
> [u'A', u'B']
> 
> Python 2.7:
> >>> u'A\u200bB'.split()
> [u'A\u200bB']

That's a classical bug fix.
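On current CPython 3 releases the 2.7 behaviour is what you get. A minimal check, assuming a Python 3 interpreter, where str.split() with no argument splits on characters for which str.isspace() is true:

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

print(unicodedata.unidata_version)  # UCD version built into this interpreter
print(unicodedata.category(ZWSP))   # general category (per the thread, Cf since Unicode 4.0.1)
print(ZWSP.isspace())               # not whitespace once it is no longer Zs
print("A\u200bB".split())           # so no split happens at U+200B
```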

> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins. 

Why should we deviate from the work done by the Unicode Consortium?
After all, most of their changes are in fact bug fixes as well.

> For example,
> I don't think that supporting
> 
> >>> float('١٢٣٤.٥٦')
> 1234.56
> 
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Sorry, but I don't agree.

If ASCII numerals are an important aspect of an application, the
application should make sure that only those numerals are used
(e.g. by using a regular expression for checking).
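For example, a minimal sketch of such a check, assuming Python 3, where \d in a str pattern matches any Unicode decimal digit (hence the explicit [0-9]). The name parse_ascii_float is hypothetical, the pattern only loosely follows the reference grammar, and inf/nan spellings are deliberately not handled:

```python
import re

# ASCII-only float syntax, loosely following the reference grammar
# (digit ::= "0"..."9"). re.ASCII keeps \s ASCII-only; digits are spelled
# [0-9] because \d would match any Unicode decimal digit in Python 3.
_ASCII_FLOAT = re.compile(
    r"\s*[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?\s*",
    re.ASCII,
)

def parse_ascii_float(text):
    """Convert text with float(), but only if it uses ASCII digits."""
    if _ASCII_FLOAT.fullmatch(text) is None:
        raise ValueError("not an ASCII decimal number: %r" % (text,))
    return float(text)

print(parse_ascii_float(" -1.5e3 "))
```

The validation happens before float() is called, so float() itself keeps its Unicode-aware behaviour for callers that want it.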

In a Unicode world, not accepting non-Arabic numerals would be
a limitation, not a feature. Besides, Python has had this support
since Python 1.6.

> [1] http://bugs.python.org/issue10567
> [2] http://bugs.python.org/issue10557
> [3] http://www.unicode.org/versions/Unicode6.0.0/#Database_Changes

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 5:42 PM, M.-A. Lemburg  wrote:
..
> I don't see why the language spec should limit the wealth of number
> formats supported by float().
>

The Language Spec (whatever it is) should not, but hopefully the
Library Reference should.  If you follow
http://docs.python.org/dev/py3k/library/functions.html#float link and
the references therein, you'll end up with

digit  ::=  "0"..."9"

http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit

Re: [Python-Dev] Python and the Unicode Character Database

On 28.11.2010 23:31, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:17 PM, "Martin v. Löwis"  wrote:
>>> >>> float('١٢٣٤.٥٦')
>>> 1234.56
>>
>> I think it's a bug that this works. The definition of the float builtin says
>>
>> Convert a string or a number to floating point. If the argument is a
>> string, it must contain a possibly signed decimal or floating point
>> number, possibly embedded in whitespace. The argument may also be
>> '[+|-]nan' or '[+|-]inf'.
>>
> 
> This definition fails long before we get beyond code point 127:
> 
> >>> float('infinity')
> inf

What do you infer from that? That the definition is wrong, or the code is wrong?

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Terry Reedy


On 11/28/2010 3:58 PM, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 3:43 PM, Antoine Pitrou  wrote:
> ..
> For example,
> I don't think that supporting
>
> >>> float('١٢٣٤.٥٦')
> 1234.56

Even if this is somehow an accident or something that someone snuck in, 
I think it a good idea that *users* be able to input amounts with their 
native digits. That is different from requiring *programmers* to write 
literals with euro-ascii-digits.

> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.

Why would they assume the text is ASCII?

    def deposit(self, amountstr):
        self.balance += float(amountstr)
        audit_log("Deposited: " + amountstr)

If the programmer wants to ensure ASCII, he can produce a string, 
possibly formatted, from the amount:

    depform = "Deposited: ${:14.2f}".format
    def deposit(self, amountstr):
        amount = float(amountstr)
        self.balance += amount
        # audit_log("Deposited: " + str(amount))  # simple version
        audit_log(depform(amount))

Given that amountstr could be something like '182.33', I 
think the programmer should plan to format it.
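A self-contained version of the second deposit() above, with a hypothetical Account class and a plain list standing in for audit_log, shows that the formatted log line is ASCII even when the input used Arabic-Indic digits:

```python
# Hypothetical stand-ins for the names in the example above; the audit
# log is just a list here so its contents can be inspected.
audit_entries = []

def audit_log(message):
    audit_entries.append(message)

depform = "Deposited: ${:14.2f}".format

class Account:
    def __init__(self):
        self.balance = 0.0

    def deposit(self, amountstr):
        amount = float(amountstr)   # accepts any Unicode decimal digits
        self.balance += amount
        audit_log(depform(amount))  # str.format emits ASCII digits

acct = Account()
acct.deposit("\u0661\u0662\u0663\u0664.\u0665\u0666")  # Arabic-Indic digits
print(audit_entries[-1])
```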


--
Terry Jan Reedy



Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. Löwis"  wrote:
..
>> This definition fails long before we get beyond code point 127:
>>
>> >>> float('infinity')
>> inf
>
> What do you infer from that? That the definition is wrong, or the code is wrong?

The development version of the reference manual is more detailed, but
as far as I can tell, it still defines digit as 0-9.

http://docs.python.org/dev/py3k/library/functions.html#float

Re: [Python-Dev] Python and the Unicode Character Database

>> Now, one may wonder what precisely a "possibly signed floating point
>> number" is, but most likely, this refers to
>>
>> floatnumber   ::=  pointfloat | exponentfloat
>> pointfloat    ::=  [intpart] fraction | intpart "."
>> exponentfloat ::=  (intpart | pointfloat) exponent
>> intpart       ::=  digit+
>> fraction      ::=  "." digit+
>> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>> digit         ::=  "0"..."9"
> 
> I don't see why the language spec should limit the wealth of number
> formats supported by float().

If it doesn't, there should be some other specification of what
is correct and what is not. It must not be unspecified.

> It is not uncommon for Asians and other non-Latin script users to
> use their own native script symbols for numbers. Just because these
> digits may look strange to someone doesn't mean that they are
> meaningless or should be discarded.

Then these users should speak up and indicate their need, or somebody
should speak up and confirm that there are users who actually want
'١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
system in which '١٢٣٤.٥٦e4' means 12345600.0.
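For what it's worth, CPython 3 maps every Unicode decimal digit to its ASCII counterpart before parsing, so such mixtures are accepted today. A quick check (the digits are written as escapes to keep them unambiguous):

```python
arabic = "\u0661\u0662\u0663\u0664.\u0665\u0666"  # the string from this thread

print(float(arabic))              # Arabic-Indic digits, ASCII full stop
print(float(arabic + "e4"))       # ...plus an ASCII exponent marker
print(float(arabic + "e\u0664"))  # ...exponent digits may be non-ASCII too
```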

> Please also remember that Python3 now allows Unicode names for
> identifiers for much the same reasons.

No no no. Addition of Unicode identifiers has a well-designed,
deliberate specification, with a PEP and all. The support for
non-ASCII digits in float appears to be ad-hoc, and not founded
on actual needs of actual users.

> Note that the support in float() (and the other numeric constructors)
> to work with Unicode code points was explicitly added when Unicode
> support was added to Python and has been available since Python 1.6.

That doesn't necessarily make it useful. Alexander's complaint is that
it makes Python unstable (i.e. changing as the UCD changes).

> It is not a bug by any definition of "bug"

Most certainly it is: the documentation is either underspecified,
or deviates from the implementation (when taking the most plausible
interpretation). This is the very definition of "bug".

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Terry Reedy


On 11/28/2010 5:51 PM, Alexander Belopolsky wrote:


The Language Spec (whatever it is) should not, but hopefully the
Library Reference should.  If you follow
http://docs.python.org/dev/py3k/library/functions.html#float link and
the references therein, you'll end up with

digit  ::=  "0"..."9"

http://docs.python.org/dev/py3k/reference/lexical_analysis.html#grammar-token-digit


So fix the doc for builtin float() and perhaps int().

--
Terry Jan Reedy


Re: [Python-Dev] Python and the Unicode Character Database

+1 on all points below.

On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. Löwis"  wrote:
>>> Now, one may wonder what precisely a "possibly signed floating point
>>> number" is, but most likely, this refers to
>>>
>>> floatnumber   ::=  pointfloat | exponentfloat
>>> pointfloat    ::=  [intpart] fraction | intpart "."
>>> exponentfloat ::=  (intpart | pointfloat) exponent
>>> intpart       ::=  digit+
>>> fraction      ::=  "." digit+
>>> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>>> digit         ::=  "0"..."9"
>>
>> I don't see why the language spec should limit the wealth of number
>> formats supported by float().
>
> If it doesn't, there should be some other specification of what
> is correct and what is not. It must not be unspecified.
>
>> It is not uncommon for Asians and other non-Latin script users to
>> use their own native script symbols for numbers. Just because these
>> digits may look strange to someone doesn't mean that they are
>> meaningless or should be discarded.
>
> Then these users should speak up and indicate their need, or somebody
> should speak up and confirm that there are users who actually want
> '١٢٣٤.٥٦' to denote 1234.56. To my knowledge, there is no writing
> system in which '١٢٣٤.٥٦e4' means 12345600.0.
>
>> Please also remember that Python3 now allows Unicode names for
>> identifiers for much the same reasons.
>
> No no no. Addition of Unicode identifiers has a well-designed,
> deliberate specification, with a PEP and all. The support for
> non-ASCII digits in float appears to be ad-hoc, and not founded
> on actual needs of actual users.
>
>> Note that the support in float() (and the other numeric constructors)
>> to work with Unicode code points was explicitly added when Unicode
>> support was added to Python and has been available since Python 1.6.
>
> That doesn't necessarily make it useful. Alexander's complaint is that
> it makes Python unstable (i.e. changing as the UCD changes).
>
>> It is not a bug by any definition of "bug"
>
> Most certainly it is: the documentation is either underspecified,
> or deviates from the implementation (when taking the most plausible
> interpretation). This is the very definition of "bug".
>
> Regards,
> Martin
>

Re: [Python-Dev] Python and the Unicode Character Database

On 29.11.2010 00:01, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. Löwis"  wrote:
> ..
>>> This definition fails long before we get beyond code point 127:
>>>
>>> >>> float('infinity')
>>> inf
>>
>> What do you infer from that? That the definition is wrong, or the code is wrong?
> 
> The development version of the reference manual is more detailed, but
> as far as I can tell, it still defines digit as 0-9.
> 
> http://docs.python.org/dev/py3k/library/functions.html#float
> 

I wasn't asking about 0..9, but about "infinity". According to the
spec, it shouldn't accept that (and neither should it accept
'infinitY'). However, whether that's a spec bug or an implementation
bug - it seems like a minor issue to me (i.e. easily fixed).

Regards,
Martin


Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. Löwis"  wrote:
..
>> Note that the support in float() (and the other numeric constructors)
>> to work with Unicode code points was explicitly added when Unicode
>> support was added to Python and has been available since Python 1.6.
>
> That doesn't necessarily make it useful. Alexander's complaint is that
> it makes Python unstable (i.e. changing as the UCD changes).
>

What makes it worse is that while, superficially, Unicode versions
follow the same X.Y.Z format as Python versions, the stability
promises are completely different.  For example, it appears that the
general category for the ZERO WIDTH SPACE was changed in Unicode
4.0.1.  I don't think a change affecting str.split(), int(), float()
and probably numerous other library functions would be acceptable in a
Python micro release.
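The standard library itself keeps one older database around: unicodedata.ucd_3_2_0 is a frozen Unicode 3.2 snapshot (kept for IDNA). A sketch for comparing how ZERO WIDTH SPACE is classified there and in the interpreter's current UCD; the printed values depend on the build, so none are claimed here:

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Compare the frozen Unicode 3.2 database with the interpreter's own.
for db in (unicodedata.ucd_3_2_0, unicodedata):
    print(db.unidata_version, db.category(ZWSP))
```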

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:08 PM, "Martin v. Löwis"  wrote:
> On 29.11.2010 00:01, Alexander Belopolsky wrote:
>> On Sun, Nov 28, 2010 at 5:56 PM, "Martin v. Löwis"  
>> wrote:
>> ..
>>>> This definition fails long before we get beyond code point 127:
>>>>
>>>> >>> float('infinity')
>>>> inf
>>>
>>> What do you infer from that? That the definition is wrong, or the code is wrong?
>>
>> The development version of the reference manual is more detailed, but
>> as far as I can tell, it still defines digit as 0-9.
>>
>> http://docs.python.org/dev/py3k/library/functions.html#float
>>
>
> I wasn't asking about 0..9, but about "infinity". According to the
> spec, it shouldn't accept that (and neither should it accept
> 'infinitY').

According to the link that I mentioned,

infinity   ::=  "Infinity" | "inf"

and "Case is not significant, so, for example, “inf”, “Inf”,
“INFINITY” and “iNfINity” are all acceptable spellings for positive
infinity."
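A quick check of those spellings against a current CPython 3 float() (the list of test strings is just for illustration):

```python
import math

# The first four spellings are the ones listed in the reference text.
spellings = ["inf", "Inf", "INFINITY", "iNfINity", "infinitY", "  -Infinity "]
for s in spellings:
    value = float(s)
    print(repr(s), value, math.isinf(value))

try:
    float("infin")  # neither "inf" nor "infinity"
except ValueError as exc:
    print("rejected:", exc)
```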

I completely agree with your arguments, and the reference manual has
been improved a lot in recent years.

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:19 PM, "Martin v. Löwis"  wrote:
..
> You can see the Unicode Consortium's stability policy at
>
> http://unicode.org/policies/stability_policy.html
>

From the link above:
"""
As more experience is gathered in implementing the characters,
adjustments in the properties may become necessary. Examples of such
properties include, but are not limited to, the following:

General_Category
...
"""
> In a sense, this is stronger than Python's backwards compatibility
> promises (which allow for certain incompatible changes to occur
> over time, whereas Unicode makes promises about all future versions).

I would say it is *different* and should be taken into account when
tying language features to Unicode specifications. This was done in
PEP 3131.  Note that one of the stated objections was "Unicode is
young; its problems are not yet well understood and solved;"  (It is
still true.)

Re: [Python-Dev] Python and the Unicode Character Database

>> >>> float('١٢٣٤.٥٦')
>> 1234.56
> 
> Even if this is somehow an accident or something that someone snuck in,
> I think it a good idea that *users* be able to input amounts with their
> native digits. That is different from requiring *programmers* to write
> literals with euro-ascii-digits

So one question is what kind of data float() is aimed at. I claim that
it is about "programmer" data, not "user" data. If it supported "user"
data, it probably would have to support "1,000" to denote 1e3 in the
U.S., and denote 1e0 in Germany. Our users are generally confused
on whether they should use the full stop or the comma as the decimal
separator.

As not even the locale-dependent issues are considered in float(),
it is clear to me that entering local numbers cannot possibly be
the objective of the function.

Instead, following a wide-spread Python convention, it is meant to be
the reverse of repr().
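Both points can be checked directly. A minimal sketch, assuming Python 3.1 or later, where repr() of a float gives the shortest string that round-trips:

```python
x = 0.1 + 0.2
# repr() produces a string that float() maps back to the same value.
assert float(repr(x)) == x

try:
    float("1,000")  # float() does no thousands-separator or locale handling
except ValueError:
    print("float() rejects '1,000'")
```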

Can you name a single person who actually wants to write '١٢٣٤.٥٦'
as a number? I'm fairly skeptical that users of Arabic-Indic digits
would. Instead,

http://en.wikipedia.org/wiki/Decimal_separator

suggests that they would rather use U+066B, i.e. '١٢٣٤٫٥٦', which isn't
supported by Python.
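A minimal check of that last claim, assuming a current CPython 3 interpreter:

```python
import unicodedata

SEP = "\u066b"  # the separator suggested by the Wikipedia article
print(unicodedata.name(SEP))

text = "\u0661\u0662\u0663\u0664" + SEP + "\u0665\u0666"  # '١٢٣٤٫٥٦'
try:
    float(text)
except ValueError:
    print("float() rejects U+066B as a decimal point")
```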

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

> What makes it worse, is that while superficially, Unicode versions
> follow the same X.Y.Z format as Python versions, the stability
> promises are completely different.  For example, it appears that the
> general category for the ZERO WIDTH SPACE was changed in Unicode
> 4.0.1.  I don't think a change affecting str.split(), int(), float()
> and probably numerous other library functions would be acceptable in a
> Python micro release.

Well, we managed to completely break Unicode normalization between
2.6.5 and 2.6.6, due to a bug.

You can see the Unicode Consortium's stability policy at

http://unicode.org/policies/stability_policy.html

In a sense, this is stronger than Python's backwards compatibility
promises (which allow for certain incompatible changes to occur
over time, whereas Unicode makes promises about all future versions).

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Benjamin Peterson

2010/11/28 M.-A. Lemburg :
>
>
> "Martin v. Löwis" wrote:
>>> >>> float('١٢٣٤.٥٦')
>>> 1234.56
>>
>> I think it's a bug that this works. The definition of the float builtin says
>>
>> Convert a string or a number to floating point. If the argument is a
>> string, it must contain a possibly signed decimal or floating point
>> number, possibly embedded in whitespace. The argument may also be
>> '[+|-]nan' or '[+|-]inf'.
>>
>> Now, one may wonder what precisely a "possibly signed floating point
>> number" is, but most likely, this refers to
>>
>> floatnumber   ::=  pointfloat | exponentfloat
>> pointfloat    ::=  [intpart] fraction | intpart "."
>> exponentfloat ::=  (intpart | pointfloat) exponent
>> intpart       ::=  digit+
>> fraction      ::=  "." digit+
>> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
>> digit         ::=  "0"..."9"
>
> I don't see why the language spec should limit the wealth of number
> formats supported by float().
>
> It is not uncommon for Asians and other non-Latin script users to
> use their own native script symbols for numbers. Just because these
> digits may look strange to someone doesn't mean that they are
> meaningless or should be discarded.

That's different. Python doesn't assign any semantic meaning to the
characters in identifiers. The non-Latin support for numerals, though,
could change the meaning of a program dramatically and needs to be
well-specified. Whether int() should do this is debatable. I, for one,
think this kind of support belongs in the locale module.



-- 
Regards,
Benjamin

[Python-Dev] PEP 384 final review

I have now completed

http://www.python.org/dev/peps/pep-0384/

Benjamin has volunteered to rule on this PEP.

Please comment with any changes you want to see, or speak in
favor or against this PEP.

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database


On 28/11/2010 23:33, "Martin v. Löwis" wrote:

>>> >>> float('١٢٣٤.٥٦')
>>> 1234.56
>>
>> Even if this is somehow an accident or something that someone snuck in,
>> I think it a good idea that *users* be able to input amounts with their
>> native digits. That is different from requiring *programmers* to write
>> literals with euro-ascii-digits
>
> So one question is what kind of data float() is aimed at. I claim that
> it is about "programmer" data, not "user" data. If it supported "user"
> data, it probably would have to support "1,000" to denote 1e3 in the
> U.S., and denote 1e0 in Germany. Our users are generally confused
> on whether they should use the full stop or the comma as the decimal
> separator.

FWIW the C# equivalent is locale aware *unless* you pass in a specific 
culture.

(System.Double.Parse):

http://msdn.microsoft.com/en-us/library/fd84bdyt.aspx

If you're not aware that your code may be run on non-US computers this 
is a trap for the unwary. If you *are* aware then it is very useful.

An alternative overload allows you to specify the culture used to do the 
conversion:

http://msdn.microsoft.com/en-us/library/t9ebt447.aspx

Michael

> As not even the locale-dependent issues are considered in float(),
> it is clear to me that entering local numbers cannot possibly be
> the objective of the function.
>
> Instead, following a wide-spread Python convention, it is meant to be
> the reverse of repr().
>
> Can you name a single person who actually wants to write '١٢٣٤.٥٦'
> as a number? I'm fairly skeptical that users of Arabic-Indic digits
> would. Instead,
>
> http://en.wikipedia.org/wiki/Decimal_separator
>
> suggests that they would rather use U+066B, i.e. '١٢٣٤٫٥٦', which isn't
> supported by Python.
>
> Regards,
> Martin



--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.


Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. Löwis"  wrote:
..
> No no no. Addition of Unicode identifiers has a well-designed,
> deliberate specification, with a PEP and all. The support for
> non-ASCII digits in float appears to be ad-hoc, and not founded
> on actual needs of actual users.
>

I wonder how carefully right-to-left scripts were considered when PEP
3131 was discussed.
Try the following on the python prompt:

>>> ڦ= int('١٢٣')
>>> ڦ
123

In my OSX Terminal window, entering ڦ flips the >>> prompt and the
session looks like this:

('???')int = ? <<<

Re: [Python-Dev] Python and the Unicode Character Database

> FWIW the C# equivalent is locale aware *unless* you pass in a specific
> culture.
> (System.Double.Parse):

That's not quite the equivalent of float(), I would say: this one
apparently is locale-aware, so it is more the equivalent of locale.atof.
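For comparison, a sketch of the Python side: locale.atof() follows the LC_NUMERIC setting, while float() does not. The helper atof_in_locale is hypothetical, and the de_DE.UTF-8 locale name is an assumption that varies by platform and may not be installed:

```python
import locale

def atof_in_locale(text, locale_name):
    """Parse text with locale.atof() under a temporary LC_NUMERIC setting."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # always restore

print(atof_in_locale("1234.56", "C"))
try:
    # Assumes a German locale is installed under this name.
    print(atof_in_locale("1.234,56", "de_DE.UTF-8"))
except locale.Error:
    print("de_DE.UTF-8 locale not available here")
```

Because setlocale() changes process-wide state, this pattern is not thread-safe.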

The next question then is if it supports indo-arabic digits in any
locale (or more specifically in an arabic locale).

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, 28 Nov 2010 17:23:01 -0600
Benjamin Peterson  wrote:
> 2010/11/28 M.-A. Lemburg :
> >
> >
> > "Martin v. Löwis" wrote:
> >>> >>> float('١٢٣٤.٥٦')
> >>> 1234.56
> >>
> >> I think it's a bug that this works. The definition of the float builtin 
> >> says
> >>
> >> Convert a string or a number to floating point. If the argument is a
> >> string, it must contain a possibly signed decimal or floating point
> >> number, possibly embedded in whitespace. The argument may also be
> >> '[+|-]nan' or '[+|-]inf'.
> >>
> >> Now, one may wonder what precisely a "possibly signed floating point
> >> number" is, but most likely, this refers to
> >>
> >> floatnumber   ::=  pointfloat | exponentfloat
> >> pointfloat    ::=  [intpart] fraction | intpart "."
> >> exponentfloat ::=  (intpart | pointfloat) exponent
> >> intpart       ::=  digit+
> >> fraction      ::=  "." digit+
> >> exponent      ::=  ("e" | "E") ["+" | "-"] digit+
> >> digit         ::=  "0"..."9"
> >
> > I don't see why the language spec should limit the wealth of number
> > formats supported by float().
> >
> > It is not uncommon for Asians and other non-Latin script users to
> > use their own native script symbols for numbers. Just because these
> > digits may look strange to someone doesn't mean that they are
> > meaningless or should be discarded.
> 
> That's different. Python doesn't assign any semantic meaning to the
> characters in identifiers. The non-latin support for numerals, though,
> could change the meaning of a program dramatically and needs to be
> well-specified. Whether int() should do this is debatable.

Perhaps int(), float(), Decimal() and friends could take an optional
parameter indicating whether non-ascii digits are considered. It would
then satisfy all parties.
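A sketch of what such an opt-in could look like as a plain wrapper. The function name parse_float and the allow_non_ascii_digits parameter are hypothetical; no such parameter exists on the builtins:

```python
def parse_float(text, allow_non_ascii_digits=False):
    """Hypothetical wrapper: float() with non-ASCII digits made opt-in."""
    if not allow_non_ascii_digits:
        for ch in text:
            # isdecimal() is true for Unicode category Nd, the digits
            # that float() would otherwise convert to ASCII.
            if ch.isdecimal() and not ("0" <= ch <= "9"):
                raise ValueError("non-ASCII digit %r in %r" % (ch, text))
    return float(text)
```

Keeping ASCII-only as the default mirrors the "programmer data" view argued earlier in the thread, while still letting applications that want native digits ask for them.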

Antoine.



Re: [Python-Dev] Python and the Unicode Character Database

On 29.11.2010 00:56, Alexander Belopolsky wrote:
> On Sun, Nov 28, 2010 at 6:03 PM, "Martin v. Löwis"  wrote:
> ..
>> No no no. Addition of Unicode identifiers has a well-designed,
>> deliberate specification, with a PEP and all. The support for
>> non-ASCII digits in float appears to be ad-hoc, and not founded
>> on actual needs of actual users.
>>
> 
> I wonder how carefully right-to-left scripts were considered when PEP
> 3131 was discussed.

IIRC, some Hebrew users have spoken in favor of the PEP, despite the
obvious difficulties it would create. I may misremember, but I think
someone pointed out that they had these difficulties all the time,
and that it wasn't really a burden.

Unicode specifies that one should always use "logical order" in memory,
and that's what the PEP does. Rendering is then a tool issue.

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. Löwis"  wrote:
..
> The next question then is if it supports indo-arabic digits in any
> locale (or more specifically in an arabic locale).

And once you answered that question, does it support Devanagari or
Bengali digits?  And if so, an arbitrary mix of those and indo-arabic
digits?
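On the Python side, at least, the answer for CPython 3 is yes: each decimal digit is converted independently, so mixed scripts are accepted. A quick check:

```python
import unicodedata

# ARABIC-INDIC DIGIT ONE, DEVANAGARI DIGIT TWO, BENGALI DIGIT THREE
mixed = "\u0661\u0968\u09e9"
for ch in mixed:
    print(unicodedata.name(ch), unicodedata.decimal(ch))

print(int(mixed))           # each digit is converted on its own
print(float(mixed + ".5"))  # the same applies to float()
```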

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou  wrote:
..
>> That's different. Python doesn't assign any semantic meaning to the
>> characters in identifiers. The non-latin support for numerals, though,
>> could change the meaning of a program dramatically and needs to be
>> well-specified. Whether int() should do this is debatable.
>
> Perhaps int(), float(), Decimal() and friends could take an optional
> parameter indicating whether non-ascii digits are considered. It would
> then satisfy all parties.

What parties?  I don't think anyone has claimed to actually have used
non-ASCII digits with float(). Of course it is fun that Python can
process Bengali numerals, but so would be allowing Roman numerals.
There is a reason why after careful consideration, PEP 313 was
ultimately rejected.

BTW, it is common in Russia to specify months using roman numerals.
Maybe we should consider allowing datetime.date() accept '1.IV.2011'.

Re: [Python-Dev] Python and the Unicode Character Database

On 28/11/2010 23:59, "Martin v. Löwis" wrote:

>> FWIW the C# equivalent is locale aware *unless* you pass in a specific
>> culture.
>> (System.Double.Parse):
>
> That's not quite the equivalent of float(), I would say: this one
> apparently is locale-aware, so it is more the equivalent of locale.atof.

Right. It is *the* standard way of getting a float from a string though,
whereas in Python we have two depending on whether or not you want to be
locale aware. The standard way in C# is locale aware. To be non-locale
aware you pass in a specific culture or number format.

> The next question then is if it supports indo-arabic digits in any
> locale (or more specifically in an arabic locale).

I don't think so actually. The float parse formatting rules are defined
like this:

[ws][$][sign][integral-digits[,]]integral-digits[.[fractional-digits]][E[sign]exponential-digits][ws]

(From http://msdn.microsoft.com/en-us/library/7yd1h1be.aspx )

integral-digits, fractional-digits and exponential-digits are all
defined as "A series of digits ranging from 0 to 9". Arguably this is
not conclusive. In fact the NumberFormatInfo class seems to hint that
it may be otherwise:

http://msdn.microsoft.com/en-us/library/system.globalization.numberformatinfo.aspx

See DigitSubstitution on that page. I would have to try it to be sure
and I don't have a Windows VM in convenient reach right now.

All the best,

Michael

Regards,
Martin

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.


Re: [Python-Dev] Python and the Unicode Character Database


On 29/11/2010 00:04, Alexander Belopolsky wrote:

> On Sun, Nov 28, 2010 at 6:59 PM, "Martin v. Löwis"  wrote:
> ..
>
>> The next question then is if it supports indo-arabic digits in any
>> locale (or more specifically in an arabic locale).
>
> And once you answered that question, does it support Devanagari or
> Bengali digits?  And if so, an arbitrary mix of those and indo-arabic
> digits?

Haha. Go and try it yourself. :-)

Michael

--

http://www.voidspace.org.uk/


Re: [Python-Dev] Python and the Unicode Character Database


> > Perhaps int(), float(), Decimal() and friends could take an optional
> > parameter indicating whether non-ascii digits are considered. It would
> > then satisfy all parties.
> 
> What parties?  I don't think anyone has claimed to actually have used
> non-ASCII digits with float().

Have you done a poll of all Python 3 users?

> Of course it is fun that Python can
> process Bengali numerals, but so would allowing Roman numerals.
> There is a reason why after careful consideration, PEP 313 was
> ultimately rejected.

That's mostly irrelevant. This feature exists and someone, somewhere,
may be using it. We normally don't remove stuff without deprecation.

Antoine.
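Antoine's opt-in idea can be sketched as a pure-Python wrapper. This is a minimal illustration under stated assumptions, not an existing or proposed API: the name `parse_float` and the `allow_non_ascii_digits` keyword are hypothetical, and the only library call relied on is `unicodedata.decimal()`, which returns a character's decimal digit value (or the supplied default when it has none).

```python
import unicodedata


def _non_ascii_digits(text):
    """Characters of text that Unicode classifies as decimal digits but that
    are not ASCII 0-9 (for example Arabic-Indic or Bengali digits)."""
    return [ch for ch in text
            if not ("0" <= ch <= "9") and unicodedata.decimal(ch, None) is not None]


def parse_float(text, allow_non_ascii_digits=False):
    """Hypothetical float() wrapper: non-ASCII digits are rejected unless
    the caller explicitly opts in."""
    foreign = _non_ascii_digits(text)
    if foreign and not allow_non_ascii_digits:
        raise ValueError(f"non-ASCII digits {foreign!r} in {text!r}")
    # Map each non-ASCII decimal digit to its ASCII equivalent, then let
    # float() handle sign, decimal point and exponent as usual.
    ascii_text = "".join(str(unicodedata.decimal(ch)) if ch in foreign else ch
                         for ch in text)
    return float(ascii_text)
```

With this shape the default stays ASCII-only for digits, while a caller who really does read Arabic-Indic or Bengali input can pass `allow_non_ascii_digits=True`.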



Re: [Python-Dev] constant/enum type in stdlib

2010-11-28 Thread Nick Coghlan

On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord
 wrote:
> For wrapping mutable types I'm tempted to say YAGNI. For the standard
> library wrapping integers meets almost all our use-cases except for one
> float. (At work we have a decimal constant as it happens.) Perhaps we could
> require immutable types for groups but allow arbitrary values for individual
> named values?

Whereas my opinion is that "immutable vs mutable" is such a blurry
distinction that we shouldn't try to make it at the lowest level.
Would it be possible to name frozenset instances? Tuples? How about
objects that are conceptually immutable, but don't close all the
loopholes allowing you to mutate them? (e.g. Decimal, Fraction)

Better to design a named value API that doesn't care about mutability,
and then leave questions of reverse mappings from values back to names
to the grouping API level. At that level, it would be trivial (and
natural) to limit names to referencing Hashable values so that a
reverse lookup table would be easy to implement. For standard library
purposes, we could even reasonably provide an int-only grouping API,
since the main use case is almost certainly to be in managing
translation of OS-level integer constants to named values.
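The split Nick describes, a named value at the bottom and an int-only grouping layer that owns the reverse lookup table, might look roughly like this. The class names `NamedInt` and `IntConstantGroup` and the `from_value()` method are hypothetical illustrations, not the API under discussion; the `NAME=value` repr simply mirrors the session output quoted later in the thread.

```python
class NamedInt(int):
    """An int that also carries a name. It compares and hashes like the
    plain int, so it can stand in wherever the raw constant was used."""

    def __new__(cls, name, value):
        obj = super().__new__(cls, value)
        obj._name = name
        return obj

    def __repr__(self):
        return f"{self._name}={int.__repr__(self)}"

    def __str__(self):
        return int.__repr__(self)


class IntConstantGroup:
    """Hypothetical int-only grouping layer with a value -> name table."""

    def __init__(self, **values):
        self._by_value = {}
        for name, value in values.items():
            if not isinstance(value, int):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")
            named = NamedInt(name, value)
            setattr(self, name, named)
            self._by_value[value] = named

    def from_value(self, value):
        """Translate a raw OS-level integer back to its named value, or
        return the raw int unchanged if no name is registered for it."""
        return self._by_value.get(value, value)
```

Because a `NamedInt` hashes equal to its plain int value, the reverse table can be an ordinary dict keyed by raw integers. In this sketch member names must not clash with the group's own attributes such as `from_value`.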

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Ben Finney

Alexander Belopolsky  writes:

> On Sun, Nov 28, 2010 at 7:01 PM, Antoine Pitrou  wrote:
> > Perhaps int(), float(), Decimal() and friends could take an optional
> > parameter indicating whether non-ascii digits are considered. It
> > would then satisfy all parties.
>
> What parties? I don't think anyone has claimed to actually have used
> non-ASCII digits with float().

Rather, it has been pointed out that there is an unknown amount of
existing code which does that. You're not going to know how much or how
little from this forum.

> Of course it is fun that Python can process Bengali numerals, but so
> would allowing Roman numerals. There is a reason why after careful
> consideration, PEP 313 was ultimately rejected.

Rejecting a proposed *new* capability is a different matter from
disabling an *existing* capability which works in existing Python
releases.

-- 
 \   “Following fashion and the status quo is easy. Thinking about |
  `\your users' lives and creating something practical is much |
_o__)harder.” —Ryan Singer, 2008-07-09 |
Ben Finney


Re: [Python-Dev] constant/enum type in stdlib


On 29/11/2010 00:48, Nick Coghlan wrote:

> On Mon, Nov 29, 2010 at 2:28 AM, Michael Foord
>  wrote:
>
>> For wrapping mutable types I'm tempted to say YAGNI. For the standard
>> library wrapping integers meets almost all our use-cases except for one
>> float. (At work we have a decimal constant as it happens.) Perhaps we could
>> require immutable types for groups but allow arbitrary values for individual
>> named values?
>
> Whereas my opinion is that "immutable vs mutable" is such a blurry
> distinction that we shouldn't try to make it at the lowest level.
> Would it be possible to name frozenset instances? Tuples? How about
> objects that are conceptually immutable, but don't close all the
> loopholes allowing you to mutate them? (e.g. Decimal, Fraction)
>
> Better to design a named value API that doesn't care about mutability,
> and then leave questions of reverse mappings from values back to names
> to the grouping API level. At that level, it would be trivial (and
> natural) to limit names to referencing Hashable values so that a
> reverse lookup table would be easy to implement. For standard library
> purposes, we could even reasonably provide an int-only grouping API,
> since the main use case is almost certainly to be in managing
> translation of OS-level integer constants to named values.
>
> Cheers,
> Nick.

Sounds reasonable to me.

Michael

--

http://www.voidspace.org.uk/


Re: [Python-Dev] PEP 384 final review

2010-11-28 Thread Terry Reedy


On 11/28/2010 6:40 PM, "Martin v. Löwis" wrote:

> I have now completed
>
> http://www.python.org/dev/peps/pep-0384/


The current text contains several error messages like:

"System Message: WARNING/2 (pep-0384.txt, line 194)
Bullet list ends without a blank line; unexpected unindent."

Terry Jan Reedy



Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Steven D'Aprano


Martin v. Löwis wrote:

>> >>> float('١٢٣٤.٥٦')
>> 1234.56
>
> I think it's a bug that this works. The definition of the float builtin says
>
> [...]

I think that's a documentation bug rather than a coding bug. If Python 
wishes to limit the digits allowed in numeric *literals* to ASCII 0...9, 
that's one thing, but I think that the digits allowed in numeric 
*strings* should allow the full range of digits supported by the Unicode 
standard.


The former ensures that literals in code are always readable; the latter
allows users to enter numbers in their own number system. How could that 
be a bad thing?




--
Steven


Re: [Python-Dev] constant/enum type in stdlib

2010-11-28 Thread Rob Cliffe




On 28/11/2010 21:23, Greg Ewing wrote:

> Rob Cliffe wrote:
>
>> But couldn't they be presented to the Python programmer as a single
>> type, with the implementation details hidden "under the hood"?
>
> Not in CPython, because tuple items are kept in the same block
> of memory as the object header. Because CPython can't move
> objects, this means that the size of the tuple must be known
> when the object is created.

But when a frozen list a.k.a. tuple would be created - either directly, 
or by setting a list's mutable flag to False which would really turn it 
into a tuple - the size *would* be known.  And since the object would 
now be immutable, there would be no requirement for its size to change.  
(My idea doesn't require additional functionality, just a different API.)

Rob Cliffe

Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 7:55 PM, Ben Finney  wrote:
..
>> Of course it is fun that Python can process Bengali numerals, but so
>> would allowing Roman numerals. There is a reason why after careful
>> consideration, PEP 313 was ultimately rejected.
>
> Rejecting a proposed *new* capability is a different matter from
> disabling an *existing* capability which works in existing Python
> releases.

Was this capability ever documented?  It does not feel like a
deliberate feature.  If it was, '\N{ARABIC DECIMAL SEPARATOR}' would
be accepted in arabic-indic notation.   It feels more like a CPython
implementation detail, similar to, say:

>>> int('10') is 10
True
>>> int('1000') is 1000
False

Note that the underlying PyUnicode_EncodeDecimal() function is
described in the unicodeobject.h header file as follows:

/* --- Decimal Encoder  */

/* Takes a Unicode string holding a decimal value and writes it into
   an output buffer using standard ASCII digit codes.
  ..
  The encoder converts whitespace to ' ', decimal characters to their
   corresponding ASCII digit and all other Latin-1 characters except
   \0 as-is. Characters outside this range (Unicode ordinals 1-256)
   are treated as errors. This includes embedded NULL bytes.
 */

So the support for non-ASCII digits is accidental and should be
treated as a bug.
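The conversion that header comment describes can be modelled in a few lines of Python. This sketch is derived only from the comment quoted above, not from the C source; `encode_decimal` is a hypothetical name, and the pass-through range is assumed to be ordinals 1 to 255 (the Latin-1 range the comment gives as "1-256").

```python
import unicodedata


def encode_decimal(text):
    """Pure-Python model of the conversion described in unicodeobject.h."""
    out = bytearray()
    for pos, ch in enumerate(text):
        digit = unicodedata.decimal(ch, None)
        if ch.isspace():
            out += b" "                        # any Unicode whitespace -> ' '
        elif digit is not None:
            out += str(digit).encode("ascii")  # any decimal digit -> ASCII digit
        elif 0 < ord(ch) < 256:
            out.append(ord(ch))                # other Latin-1 characters as-is
        else:
            # Everything else (including NUL) is treated as an error.
            raise UnicodeEncodeError("decimal", text, pos, pos + 1,
                                     "invalid decimal Unicode string")
    return bytes(out)
```

Under this model whitespace and every script's decimal digits are silently rewritten before parsing, so whatever consumes the buffer never sees the characters the user actually typed.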

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Ben Finney

Steven D'Aprano  writes:

> If Python wishes to limit the digits allowed in numeric *literals* to
> ASCII 0...9, that's one thing, but I think that the digits allowed in
> numeric *strings* should allow the full range of digits supported by
> the Unicode standard.

I assume you specifically mean that the numeric class constructors, like
‘int’ and ‘float’, should parse their input string such that any
character Unicode defines as a numeric digit is mapped to the
corresponding digit.

That sounds attractive, but it raises questions about mixed notations,
mixing digits from different writing systems, and probably other
questions I haven't thought of. It's not something to make a simple
yes-or-no-decision on now, IMO.
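The mixed-notation question can at least be made concrete. Unicode encodes each set of decimal digits as a contiguous run from 0 to 9, so a digit's code point minus its value identifies the writing system it belongs to. The sketch below uses that to normalise digits to ASCII and, by default, reject strings that mix systems; the function names and the `allow_mixed` flag are hypothetical.

```python
import unicodedata


def digit_systems(text):
    """Return the set of digit systems used in text, each identified by
    that system's ZERO character (code point of the digit minus its value)."""
    systems = set()
    for ch in text:
        value = unicodedata.decimal(ch, None)
        if value is not None:
            systems.add(chr(ord(ch) - value))
    return systems


def normalize_digits(text, allow_mixed=False):
    """Replace Unicode decimal digits with ASCII ones, refusing (by default)
    strings that combine digits from more than one writing system."""
    systems = digit_systems(text)
    if len(systems) > 1 and not allow_mixed:
        names = sorted(unicodedata.name(zero) for zero in systems)
        raise ValueError(f"digits from several systems {names} in {text!r}")
    return "".join(
        str(unicodedata.decimal(ch)) if unicodedata.decimal(ch, None) is not None else ch
        for ch in text
    )
```

Whether mixing should be an error, a warning, or silently accepted is exactly the kind of policy decision a PEP would have to settle; the flag only shows that the check itself is cheap.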

This sounds best suited to a PEP, which someone who cares enough can
champion in ‘python-ideas’.

-- 
 \  “The manager has personally passed all the water served here.” |
  `\  —hotel, Acapulco |
_o__)  |
Ben Finney


Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Steven D'Aprano


Alexander Belopolsky wrote:

> Two recently reported issues brought into light the fact that Python
> language definition is closely tied to character properties maintained
> by the Unicode Consortium. [1,2]  For example, when Python switches to
> Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two
> additional characters that Python can use in identifiers. [3]
>
> [...]

Why do you consider this a problem? It would be a problem if previously 
valid identifiers *stopped* being valid, but not the other way around.




> Of course, the likelihood is low that this change will affect any
> user, but the change in str.isspace() reported in [1] is likely to
> cause some trouble:


Looking at the thread here:
http://bugs.python.org/issue10567

I interpret it as indicating that Python's isspace() has been buggy for
many years, and is only now being fixed. It's always unfortunate when 
people rely on bugs, but I'm not sure we should be promising to support 
bug-for-bug compatibility from one version to the next :)




> While we have little choice but to follow UCD in defining
> str.isidentifier(), I think Python can promise users more stability in
> what it treats as space or as a digit in its builtins.   For example,
> I don't think that supporting
>
> >>> float('١٢٣٤.٥٦')
> 1234.56
>
> is more important than to assure users that once their program
> accepted some text as a number, they can assume that the text is
> ASCII.


Seems like a pretty foolish assumption, if you ask me, pretty much akin 
to assuming that if string.isalpha() returns true that string is ASCII.


Support for non-Arabic numerals in number strings goes back to at least 
Python 2.4:


[st...@sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float(u'١٢٣٤.٥٦')
1234.55999


The fact that this is (apparently) only being raised now means that it 
isn't actually a problem in real life. I'd even say that it's a feature, 
and that if Python didn't support non-Arabic numerals, it should.




--
Steven


Re: [Python-Dev] Python and the Unicode Character Database

On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano  wrote:
..
>> is more important than to assure users that once their program
>> accepted some text as a number, they can assume that the text is
>> ASCII.
>
> Seems like a pretty foolish assumption, if you ask me, pretty much akin to
> assuming that if string.isalpha() returns true that string is ASCII.
>

It is not to 99.9% of Python users whose code is written for 2.x.
Their strings are byte strings and string.isdigit() does imply ASCII
even if string.isalpha() does not in many locales.

..
> The fact that this is (apparently) only being raised now means that it isn't
> actually a problem in real life. I'd even say that it's a feature, and that
> if Python didn't support non-Arabic numerals, it should.
>

I raised this problem because I found a bug that is related to this
feature.  The bug is also a regression from 2.x.

In 2.7:

>>> float(u'1234\xa1')
..
ValueError: invalid literal for float(): 1234?

The last character is lost, but the error message is still meaningful.

In 3.x, however:

>>> float('1234\xa1')
..
ValueError

See http://bugs.python.org/issue10557

While investigating this issue I found that by the time the string
gets to the number parser (_Py_dg_strtod), all non-ascii characters
are dropped by PyUnicode_EncodeDecimal() so it cannot produce
a meaningful diagnostic.

Of course, PyUnicode_EncodeDecimal() can be fixed by making it pass
non-ascii chars through as UTF-8 bytes, but I was wondering if
preserving the ability to parse exotic numerals was worth the effort.
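The proposed fix, passing non-digit characters through as UTF-8 instead of losing them, can be illustrated with a variant of the conversion. This is a sketch only: `encode_decimal_utf8` and `float_error_message` are hypothetical names, and decoding the buffer back is just one way a parser could recover the original text for its error message (lone surrogates would additionally need an error handler).

```python
import unicodedata


def encode_decimal_utf8(text):
    """Map whitespace to ' ' and decimal digits to ASCII; encode everything
    else as UTF-8 so no information is lost before parsing."""
    out = bytearray()
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if ch.isspace():
            out += b" "
        elif digit is not None:
            out += str(digit).encode("ascii")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def float_error_message(text):
    """Build a diagnostic that still shows the offending character."""
    shown = encode_decimal_utf8(text).decode("utf-8")
    return f"could not convert string to float: {shown!r}"
```

For the input from issue 10557, `'1234\xa1'`, the message then ends in `'1234¡'` rather than dropping the last character.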

Re: [Python-Dev] constant/enum type in stdlib

2010-11-28 Thread Ron Adam




On 11/27/2010 04:51 AM, Nick Coghlan wrote:

x = named_value("FOO", 1)
y = named_value("BAR", "Hello World!")
z = named_value("BAZ", dict(a=1, b=2, c=3))

print(x, y, z, sep="\n")
print("\n".join(map(repr, (x, y, z))))
print("\n".join(map(str, map(type, (x, y, z)))))

set_named_values(globals(), foo=x._raw(), bar=y._raw(), baz=z._raw())
print("\n".join(map(repr, (foo, bar, baz))))
print(type(x) is type(foo), type(y) is type(bar), type(z) is type(baz))

==

# Session output for the last 6 lines

>>> print(x, y, z, sep="\n")
1
Hello World!
{'a': 1, 'c': 3, 'b': 2}

>>> print("\n".join(map(repr, (x, y, z))))
FOO=1
BAR='Hello World!'
BAZ={'a': 1, 'c': 3, 'b': 2}


This reminds me of Python annotations, which seem like an already
forgotten new feature. Maybe they can help with this?

They do associate additional info with names and create a nice dictionary to
reference.



>>> def name_values(FOO: 1,
...                 BAR: "Hello World!",
...                 BAZ: dict(a=1, b=2, c=3)):
...     return FOO, BAR, BAZ
...
>>> name_values(1, 2, 3)
(1, 2, 3)
>>> name_values.__annotations__
{'BAR': 'Hello World!', 'FOO': 1, 'BAZ': {'a': 1, 'c': 3, 'b': 2}}


Cheers,
  Ron

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Stephen J. Turnbull

M.-A. Lemburg writes:

 > It is not uncommon for Asians and other non-Latin script users to
 > use their own native script symbols for numbers.

Japanese don't, in computational or scientific work where float()
would be used.  Japanese numerals are used for dates and for certain
felicitous ages (and even there so-called "Arabic" numerals are
perfectly acceptable).  Otherwise, it's all ASCII (although it might
be "full-width" compatibility variants).

 > Please also remember that Python3 now allows Unicode names for
 > identifiers for much the same reasons.

I don't think it's the same reason, not for Japanese, anyway.

I agree that Python should make it easy for the programmer to get
numerical values of native numeric strings, but it's not at all clear
to me that there is any point to having float() recognize them by
default.
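The "full-width" compatibility variants mentioned above are easy to inspect with the standard `unicodedata` module: the full-width digits U+FF10 to U+FF19 are decimal digits in their own right, and NFKC normalisation folds them to ASCII. This only illustrates the character data; it says nothing about what float() ought to do by default.

```python
import unicodedata

full_width = "\uff11\uff12\uff13\uff14"  # FULLWIDTH DIGIT ONE .. FULLWIDTH DIGIT FOUR

# Each full-width digit carries its own decimal value ...
values = [unicodedata.decimal(ch) for ch in full_width]

# ... and compatibility normalisation maps the string to plain ASCII digits.
ascii_form = unicodedata.normalize("NFKC", full_width)

print(values, ascii_form)
```

An explicit normalisation step like this is one way for a program to get numeric values from native strings without relying on float() accepting them.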

Re: [Python-Dev] Python and the Unicode Character Database

2010-11-28 Thread Nick Coghlan

On Mon, Nov 29, 2010 at 1:39 PM, Stephen J. Turnbull  wrote:
> I agree that Python should make it easy for the programmer to get
> numerical values of native numeric strings, but it's not at all clear
> to me that there is any point to having float() recognize them by
> default.

Indeed, as someone else suggested earlier in the thread, supporting
non-ASCII digits sounds more like a job for the locale module than for
the builtin types.

Deprecating non-ASCII support in the latter, while ensuring it is
properly supported in the former sounds like a better way forward than
maintaining the status quo (starting in 3.3 though, with the first
beta just around the corner, we don't want to be monkeying with this
in 3.2)

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia

Re: [Python-Dev] Python and the Unicode Character Database

> Perhaps int(), float(), Decimal() and friends could take an optional
> parameter indicating whether non-ascii digits are considered. It would
> then satisfy all parties.

Not really. I still would want to see what the actual requirement is:
i.e. do any users actually have the desire to have these digits
accepted, yet the alternative decimal points rejected?
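The asymmetry in question can be observed directly. A small check, assuming a CPython 3 interpreter whose float() translates Unicode decimal digits as described earlier in the thread: Arabic-Indic digits with an ASCII full stop are accepted, while the same digits with U+066B ARABIC DECIMAL SEPARATOR are not. The helper name `try_float` is hypothetical.

```python
ascii_point = "\u0661\u0662\u0663\u0664.\u0665\u0666"        # Arabic-Indic digits, ASCII '.'
arabic_point = "\u0661\u0662\u0663\u0664\u066b\u0665\u0666"  # same digits, U+066B separator


def try_float(text):
    """Return float(text), or None if float() rejects the string."""
    try:
        return float(text)
    except ValueError:
        return None


print(try_float(ascii_point))
print(try_float(arabic_point))
```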

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database

> The former ensures that literals in code are always readable; the latter
> allows users to enter numbers in their own number system. How could that
> be a bad thing?

It's YAGNI, feature bloat. It gives the illusion of supporting something
that actually isn't supported very well (namely, parsing local number
strings). I claim that there is no meaningful application
of this feature.

Regards,
Martin

Re: [Python-Dev] Python and the Unicode Character Database