Re: Surrogate pairs in new flexible string representation

2013-03-29 Thread Christian Heimes
On 29.03.2013 07:22, Ian Kelly wrote:
> Since the PEP specifically mentions ParseTuple string conversion, I am
> thinking that this is probably the motivation for caching it. A
> string that is passed into a C function (that uses one of the various
> UTF-8 char* format specifiers) is perhaps lik

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-29 Thread Terry Reedy
On 3/28/2013 10:37 PM, Steven D'Aprano wrote:
> Under what circumstances will a string be created from a wchar_t string?
> How, and why, would such a string be created? Why would Python still
> support strings containing surrogates when it now has a nice, shiny,
> surrogate-free flexible representation?

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Ian Kelly
On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly wrote:
> From the PEP:
>
> """
> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
> representation. It is thus identical to the existing
> _PyUnicode_AsString, which is removed. The function will compute the
> utf8 representation when firs
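To make the caching idea in the quoted PEP text concrete, here is a toy Python model of a string object that computes its UTF-8 form only when first asked and reuses it afterwards. It is a hypothetical sketch: the class `CachedUtf8Str` and its `encode_calls` counter are invented for illustration and say nothing about how CPython's C structures are actually laid out.

```python
from typing import Optional


class CachedUtf8Str:
    """Hypothetical toy model: a string that caches its UTF-8 encoding lazily."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._utf8: Optional[bytes] = None  # nothing is encoded until requested
        self.encode_calls = 0  # counts how often we actually run the encoder

    def as_utf8(self) -> bytes:
        # Encode only on the first request; later calls return the cached bytes.
        if self._utf8 is None:
            self.encode_calls += 1
            self._utf8 = self.text.encode("utf-8")
        return self._utf8


s = CachedUtf8Str("na\u00efve \u20ac")
first = s.as_utf8()
second = s.as_utf8()
print(first, s.encode_calls)  # b'na\xc3\xafve \xe2\x82\xac' 1
```

The trade-off the thread argues about is visible even in this toy: the cache costs extra memory for every string that is ever converted, but saves re-encoding when the same string is handed to C code repeatedly.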

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Ian Kelly
On Thu, Mar 28, 2013 at 8:37 PM, Steven D'Aprano wrote:
>>> I also wonder why the implementation bothers keeping a UTF-8
>>> representation. That sounds like premature optimization to me. Surely
>>> you only need it when writing to a file with UTF-8 encoding? For most
>>> strings, that will never

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Chris Angelico
On Fri, Mar 29, 2013 at 1:37 PM, Steven D'Aprano wrote:
> Under what circumstances will a string be created from a wchar_t string?
> How, and why, would such a string be created? Why would Python still
> support strings containing surrogates when it now has a nice, shiny,
> surrogate-free flexible
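One way to see where surrogates can still enter a surrogate-free `str` is at the boundary with 16-bit data. This sketch assumes the input is a buffer of 16-bit code units such as a 16-bit `wchar_t` (as on Windows) would hold; the units are packed by hand with `struct` rather than read through any real C API, and `decode_units` is a hypothetical helper. A well-formed pair decodes to a single code point, a lone surrogate is only carried through with the `surrogatepass` error handler, and PEP 383's `surrogateescape` deliberately maps undecodable bytes to lone surrogates. It illustrates the question; it is not necessarily the answer the replies go on to give.

```python
import struct

# Hypothetical 16-bit code units, as a 16-bit wchar_t buffer might hold them.
pair_units = [0x0041, 0xD83D, 0xDE00]  # 'A' followed by a valid surrogate pair
lone_units = [0x0041, 0xD83D]          # 'A' followed by an unpaired high surrogate


def decode_units(units: list[int], errors: str = "strict") -> str:
    """Decode little-endian UTF-16 code units into a Python str."""
    raw = struct.pack(f"<{len(units)}H", *units)
    return raw.decode("utf-16-le", errors)


# A valid pair becomes one code point in the str.
joined = decode_units(pair_units)
print(len(joined), hex(ord(joined[1])))  # 2 0x1f600

# A lone surrogate is rejected by default...
try:
    decode_units(lone_units)
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# ...but can be carried into the str with 'surrogatepass'.
kept = decode_units(lone_units, "surrogatepass")
print([hex(ord(c)) for c in kept])  # ['0x41', '0xd83d']

# PEP 383's 'surrogateescape' maps an undecodable byte to a lone surrogate.
escaped = b"caf\xff".decode("utf-8", "surrogateescape")
print(ascii(escaped))  # 'caf\udcff'
```

The `surrogateescape` case is why lone surrogates must stay representable: `escaped.encode("utf-8", "surrogateescape")` gives back the original bytes exactly.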

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Steven D'Aprano
On Fri, 29 Mar 2013 11:54:41 +1100, Chris Angelico wrote:
> On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano wrote:
>> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
>> strings. It's only strings in the SMPs that could need surrogate pairs,
>> and they don't need them in

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread MRAB
On 29/03/2013 00:54, Chris Angelico wrote:
> On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano wrote:
>> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
>> strings. It's only strings in the SMPs that could need surrogate pairs,
>> and they don't need them in Python's implementation s

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Chris Angelico
On Fri, Mar 29, 2013 at 12:03 PM, Mark Lawrence wrote:
> On 29/03/2013 00:54, Chris Angelico wrote:
>> Minor nitpick, btw:
>>>
>>> (in which cast wstr_length differs form length)
>>
>> Should be "in which case" and "from". Who has the power to correct
>> typos in PEPs?
>
> Sneak it in here? http:/

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Mark Lawrence
On 29/03/2013 00:54, Chris Angelico wrote: Minor nitpick, btw: (in which cast wstr_length differs form length) Should be "in which case" and "from". Who has the power to correct typos in PEPs? ChrisA Sneak it in here? http://bugs.python.org/issue13604

Re: Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Chris Angelico
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano wrote:
> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
> strings. It's only strings in the SMPs that could need surrogate pairs,
> and they don't need them in Python's implementation since it's a full 32-bit
> implementatio
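The point being made is that a PEP 393 `str` stores each supplementary-plane character as a single code point, so surrogate pairs only appear when the text is converted to UTF-16. A minimal sketch of the standard UTF-16 pair arithmetic, cross-checked against Python's own codec (the helper names `to_surrogate_pair` and `from_surrogate_pair` are just for illustration):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a UTF-16 surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


ch = "\U0001F600"  # a character outside the BMP
high, low = to_surrogate_pair(ord(ch))
print(len(ch), hex(high), hex(low))   # 1 0xd83d 0xde00
print(ch.encode("utf-16-be").hex())   # d83dde00
```

On Python 3.3 and later `len(ch)` is 1 on every platform; on a pre-PEP 393 narrow build the same string reported a length of 2, because it was stored as the pair itself.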

Surrogate pairs in new flexible string representation [was Re: flaming vs accuracy [was Re: Performance of int/long in Python 3]]

2013-03-28 Thread Steven D'Aprano
On Thu, 28 Mar 2013 10:11:59 -0600, Ian Kelly wrote:
> On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico wrote:
>> PEP393 strings have two optimizations, or kinda three:
>>
>> 1a) ASCII-only strings
>> 1b) Latin1-only strings
>> 2) BMP-only strings
>> 3) Everything else
>>
>> Options 1a and 1b ar
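The four cases in the quoted list correspond to the storage width a PEP 393 string uses, which is chosen from the largest code point in the string. A minimal sketch of that classification in Python (`pep393_kind` is a hypothetical helper; CPython makes this decision in C when the string is created):

```python
def pep393_kind(s: str) -> tuple[str, int]:
    """Return (category, bytes per character) based on the widest code point in s."""
    widest = max(map(ord, s), default=0)
    if widest < 0x80:
        return "ascii", 1    # 1a) ASCII-only
    if widest < 0x100:
        return "latin1", 1   # 1b) Latin-1-only
    if widest < 0x10000:
        return "bmp", 2      # 2) BMP-only (UCS-2)
    return "other", 4        # 3) everything else (UCS-4)


for sample in ["spam", "caf\u00e9", "\u20ac5", "\U0001F600"]:
    print(ascii(sample), pep393_kind(sample))
```

Because the width applies to the whole string, a single U+20AC in an otherwise ASCII string makes every character in it take two bytes.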