Re: New internal string format in 3.3

2012-08-21 Thread Roy Smith
In article , Michael Torrie wrote: > > And if you want the "fudge it somehow" behavior (which is often very > > useful!), there's always http://pypi.python.org/pypi/Unidecode/ > > Sweet tip, thanks! I often want to process text that has smart quotes, > emdashes, etc, and convert them to plain
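A note on the "fudge it somehow" idea: the snippet below is a minimal standard-library sketch of converting smart quotes and dashes to plain ASCII. It is not how Unidecode works and covers only a handful of characters; the table `PLAIN` and the helper `to_plain` are hypothetical names chosen for illustration.

```python
# Hypothetical, hand-picked mapping; a real transliteration library
# covers far more characters than this.
PLAIN = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain(text):
    """Replace the mapped punctuation; leave every other character alone."""
    return text.translate(PLAIN)


print(to_plain("\u201cSweet tip\u201d \u2014 it\u2019s handy\u2026"))
```

Characters outside the table pass through unchanged, so the result is only guaranteed to be ASCII when the input contains nothing else that is non-ASCII.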

Re: New internal string format in 3.3

2012-08-20 Thread Michael Torrie
On 08/20/2012 07:17 AM, Roy Smith wrote: > In article , > Michael Torrie wrote: > >> Python generally tries to follow unicode >> encoding rules to the letter. Thus if a piece of text cannot be >> represented in the character set of the terminal, then Python will >> properly err out. Other lang

Re: New internal string format in 3.3

2012-08-20 Thread Roy Smith
In article , Michael Torrie wrote: > Python generally tries to follow unicode > encoding rules to the letter. Thus if a piece of text cannot be > represented in the character set of the terminal, then Python will > properly err out. Other languages you have tried, likely fudge it > somehow.
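The strict-versus-fudged behaviour described here can be seen without a terminal by encoding explicitly. This is a small sketch, assuming ASCII stands in for a limited console character set; `encode_for_terminal` is a hypothetical helper, while `str.encode` and its `errors` argument are standard.

```python
def encode_for_terminal(text, encoding="ascii", errors="strict"):
    """Encode text the way a write to a limited terminal would have to."""
    return text.encode(encoding, errors=errors)


text = "\u201cquoted\u201d \u2014 done"

# Strict mode: Python refuses to guess and raises.
try:
    encode_for_terminal(text)
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# Lenient mode: unencodable characters are replaced with "?".
print("replace:", encode_for_terminal(text, errors="replace"))
```

Other handlers such as `"ignore"` or `"backslashreplace"` trade off differently between readability and information loss.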

Re: New internal string format in 3.3

2012-08-19 Thread Michael Torrie
On 08/19/2012 11:51 AM, wxjmfa...@gmail.com wrote: > Five minutes after a closed my interactive interpreters windows, > the day I tested this stuff. I though: > "Too bad I did not noted the extremely bad cases I found, I'm pretty > sure, this problem will arrive on the table". Reading through this

Re: New internal string format in 3.3

2012-08-19 Thread Chris Angelico
On Mon, Aug 20, 2012 at 4:09 AM, Mark Lawrence wrote: > On 19/08/2012 18:51, wxjmfa...@gmail.com wrote: >> >> Just for the story. >> >> Five minutes after a closed my interactive interpreters windows, >> the day I tested this stuff. I though: >> "Too bad I did not noted the extremely bad cases I f

Re: New internal string format in 3.3

2012-08-19 Thread Mark Lawrence
On 19/08/2012 18:51, wxjmfa...@gmail.com wrote: Just for the story. Five minutes after a closed my interactive interpreters windows, the day I tested this stuff. I though: "Too bad I did not noted the extremely bad cases I found, I'm pretty sure, this problem will arrive on the table". jmf H

Re: New internal string format in 3.3

2012-08-19 Thread Terry Reedy
On 8/19/2012 8:59 AM, wxjmfa...@gmail.com wrote: In August 2012, after 20 years of development, Python is not able to display a piece of text correctly on a Windows console (eg cp65001). cp65001 is known to not work right. It has been very frustrating. Bug Microsoft about it, and indeed their

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Just for the story. Five minutes after a closed my interactive interpreters windows, the day I tested this stuff. I though: "Too bad I did not noted the extremely bad cases I found, I'm pretty sure, this problem will arrive on the table". jmf

Re: New internal string format in 3.3

2012-08-19 Thread Terry Reedy
On 8/19/2012 10:09 AM, wxjmfa...@gmail.com wrote: I can not give you more numbers than those I gave. As a end user, I noticed and experimented my random tests are always slower in Py3.3 than in Py3.2 on my Windows platform. And I gave other examples where 3.3 is *faster* on my Windows, which y

Re: New internal string format in 3.3

2012-08-19 Thread Oscar Benjamin
On Aug 19, 2012 5:22 PM, wrote > > Py 3.2.3 > >>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')") > 4.99396356635981 > > Py 3.3b2 > >>> timeit.timeit("('aœ€'*100).replace('a', 'œ€é')") > 7.560455708007855 > > Maybe, not so demonstative. It shows at least, we > are far away from the 10-30% "annouc
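To reproduce the quoted measurement on another interpreter, something like the following may help. It is a sketch, not the poster's script: the statement is the one quoted above, but the repeat count, the smaller `number`, and the helper name `best_time` are assumptions made here. Taking the minimum of several runs reduces noise from other processes.

```python
import sys
import timeit

# The statement from the quoted message, unchanged.
STMT = "('aœ€'*100).replace('a', 'œ€é')"


def best_time(stmt=STMT, number=10000, repeat=3):
    """Return the fastest of `repeat` timings, each running stmt `number` times."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat))


print(sys.version.split()[0], best_time())
```

The quoted figures used the `timeit.timeit` default of one million executions, so absolute numbers from this sketch are not directly comparable with them; only ratios between interpreters run with the same settings are.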

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 16:48:48 UTC+2, Mark Lawrence a écrit : > On 19/08/2012 15:09, wxjmfa...@gmail.com wrote: > > > > > > > > I can not give you more numbers than those I gave. > > > As a end user, I noticed and experimented my random tests > > > are always slower in Py3.3 than in Py3.2

Re: New internal string format in 3.3

2012-08-19 Thread Mark Lawrence
On 19/08/2012 15:09, wxjmfa...@gmail.com wrote: I can not give you more numbers than those I gave. As a end user, I noticed and experimented my random tests are always slower in Py3.3 than in Py3.2 on my Windows platform. Once again you refuse to supply anything to back up what you say. It

Re: New internal string format in 3.3

2012-08-19 Thread Oscar Benjamin
On 19 August 2012 15:09, wrote: > I can not give you more numbers than those I gave. > As a end user, I noticed and experimented my random tests > are always slower in Py3.3 than in Py3.2 on my Windows platform. > Do the problems have a significant impact on any real application (rather than ran

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 15:46:34 UTC+2, Mark Lawrence a écrit : > On 19/08/2012 13:59, wxjmfa...@gmail.com wrote: > > > Le dimanche 19 août 2012 14:29:17 UTC+2, Dave Angel a écrit : > > >> On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > > >> > > >>> Le dimanche 19 août 2012 12:26:44

Re: New internal string format in 3.3

2012-08-19 Thread Mark Lawrence
On 19/08/2012 13:59, wxjmfa...@gmail.com wrote: Le dimanche 19 août 2012 14:29:17 UTC+2, Dave Angel a écrit : On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: Le dimanche 19 août 2012 12:26:44 UTC+2, Chris Angelico a écrit : On Sun, Aug 19, 2012 at 8:19 PM, wrote: This is pre

Re: New internal string format in 3.3

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 03:19:23 -0700, wxjmfauth wrote: > This is precicely the weak point of this flexible representation. It > uses latin-1 and latin-1 is for most users simply unusable. That's very funny. Are you aware that your post is entirely Latin-1? > Fascinating, isn't it? Devs are devel

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 14:29:17 UTC+2, Dave Angel a écrit : > On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > > > Le dimanche 19 août 2012 12:26:44 UTC+2, Chris Angelico a écrit : > > >> On Sun, Aug 19, 2012 at 8:19 PM, wrote: > > >> > > >>> This is precicely the weak point of this

Re: New internal string format in 3.3

2012-08-19 Thread Dave Angel
(pardon the resend, but I accidentally omitted a couple of words) On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > Le dimanche 19 août 2012 12:26:44 UTC+2, Chris Angelico a écrit : >> >> >> >> No, it uses Unicode, and as an optimization, attempts to store the >> codepoints in less than four by

Re: New internal string format in 3.3

2012-08-19 Thread Dave Angel
On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > Le dimanche 19 août 2012 12:26:44 UTC+2, Chris Angelico a écrit : >> On Sun, Aug 19, 2012 at 8:19 PM, wrote: >> >>> This is precicely the weak point of this flexible >>> representation. It uses latin-1 and latin-1 is for >>> most users simply u

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 12:26:44 UTC+2, Chris Angelico a écrit : > On Sun, Aug 19, 2012 at 8:19 PM, wrote: > > > This is precicely the weak point of this flexible > > > representation. It uses latin-1 and latin-1 is for > > > most users simply unusable. > > > > No, it uses Unicode, and as

Re: New internal string format in 3.3

2012-08-19 Thread Chris Angelico
On Sun, Aug 19, 2012 at 8:19 PM, wrote: > This is precicely the weak point of this flexible > representation. It uses latin-1 and latin-1 is for > most users simply unusable. No, it uses Unicode, and as an optimization, attempts to store the codepoints in less than four bytes for most strings. T

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 11:37:09 UTC+2, Peter Otten a écrit : You know, the technical aspect is one thing. Understanding the coding of the characters as a whole is something else. The important point is not the coding per se, the relevant point is the set of characters a coding may represent. Y

Re: New internal string format in 3.3

2012-08-19 Thread Peter Otten
Steven D'Aprano wrote: > On Sun, 19 Aug 2012 09:43:13 +0200, Peter Otten wrote: > >> Steven D'Aprano wrote: > >>> I don't know where people are getting this myth that PEP 393 uses >>> Latin-1 internally, it does not. Read the PEP, it explicitly states >>> that 1-byte formats are only used for AS

Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
Le dimanche 19 août 2012 10:56:36 UTC+2, Steven D'Aprano a écrit : > > internal implementation, and strings which fit exactly in Latin-1 will > And this is the crucial point. latin-1 is an obsolete and non usable coding scheme (esp. for european languages). We fall on the point I mentionned ab

Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Steven D'Aprano
On Sun, 19 Aug 2012 09:43:13 +0200, Peter Otten wrote: > Steven D'Aprano wrote: >> I don't know where people are getting this myth that PEP 393 uses >> Latin-1 internally, it does not. Read the PEP, it explicitly states >> that 1-byte formats are only used for ASCII strings. > > From > > Python

New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread Peter Otten
Steven D'Aprano wrote: > On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote: > >> "a" will be stored as 1 byte/codepoint. >> >> Adding "é", it will still be stored as 1 byte/codepoint. > > Wrong. It will be 2 bytes, just like it already is in Python 3.2. > > I don't know where people are getting t
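The disagreement over how many bytes per code point a string takes can be checked empirically. The sketch below assumes CPython 3.3 or later, where PEP 393 applies; other implementations may store strings differently. It measures the size difference between two strings built from the same character so that the fixed object header cancels out. The helper name `bytes_per_code_point` is hypothetical.

```python
import sys


def bytes_per_code_point(ch, n=1000):
    """Estimate per-code-point storage by differencing two string sizes.

    Both strings contain only `ch`, so they share the same internal
    representation and header; the difference is pure character data.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# ASCII, Latin-1 range, Basic Multilingual Plane, and astral examples.
for ch in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(ascii(ch), bytes_per_code_point(ch))
```

On such an interpreter, "a" and "é" should both report 1, "€" should report 2, and a character beyond U+FFFF should report 4: the width is chosen from the largest code point in the string.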