Re: Chardet, file, ... and the Flexible String Representation

2013-09-11 Thread Serhiy Storchaka
09.09.13 22:27, random...@fastmail.us wrote: On Mon, Sep 9, 2013, at 15:03, Ian Kelly wrote: Do you mean that it breaks when overwriting Python string object buffers, or when overwriting arbitrary C strings either received from C code or created with create_unicode_buffer? If the former,

Re: Chardet, file, ... and the Flexible String Representation

2013-09-10 Thread random832
On Mon, Sep 9, 2013, at 10:28, wxjmfa...@gmail.com wrote: *time performance differences* > > Comment: Such differences never happen with utf. Why is this bad? Keeping in mind that otherwise they would all be almost as slow as the UCS-4 case. > >>> sys.getsizeof('a') > 26 > >>> sys.getsizeof('€')
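A minimal sketch of how the per-character storage behind those sys.getsizeof figures could be measured, assuming CPython 3.3 or later (where the Flexible String Representation applies). The helper name bytes_per_char is hypothetical and not from the thread. Absolute getsizeof totals vary by build, so the sketch compares two string lengths to cancel out the fixed object header.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character for strings made only of `ch`.

    Assumption: CPython with PEP 393 strings. The fixed header and the
    trailing NUL cancel out when two lengths are subtracted.
    """
    short = sys.getsizeof(ch * n)
    long_ = sys.getsizeof(ch * (2 * n))
    return (long_ - short) // n


if __name__ == "__main__":
    for label, ch in [("ASCII", "a"), ("BMP", "\u20ac"), ("astral", "\U0001F600")]:
        print(label, bytes_per_char(ch))
```

On such a build the three cases are expected to come out as 1, 2 and 4 bytes per character, which is the size difference the quoted getsizeof calls are pointing at.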

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread Steven D'Aprano
On Mon, 09 Sep 2013 11:05:44 -0600, Michael Torrie wrote: > On 09/09/2013 08:28 AM, wxjmfa...@gmail.com wrote: >> Comment: Such differences never happen with utf. > > But with utf, slicing strings is O(n) (well that's a simplification as > someone showed an algorithm that is log n), whereas a fix

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread Terry Reedy
On 9/9/2013 12:38 PM, Ned Batchelder wrote: jmf, thanks for your reply. You've calmed my fears that there is something wrong with the Flexible String Representation. None of the examples you show demonstrate any behavior contrary to the Unicode spec. The goals of the new unicode implementati

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread random832
On Mon, Sep 9, 2013, at 15:03, Ian Kelly wrote: > Do you mean that it breaks when overwriting Python string object buffers, > or when overwriting arbitrary C strings either received from C code or > created with create_unicode_buffer? > > If the former, I think that is to be expected since ctypes

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread random832
On Fri, Sep 6, 2013, at 13:04, Chris Angelico wrote: > On Sat, Sep 7, 2013 at 2:59 AM, wrote: > > Incidentally, how does all this interact with ctypes unicode_buffers, > > which slice as strings and must be UTF-16 on windows? This was fine > > pre-FSR when unicode objects were UTF-16, but I'm not

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread Ian Kelly
On Sep 9, 2013 12:36 PM, wrote: > > On Fri, Sep 6, 2013, at 13:04, Chris Angelico wrote: > > On Sat, Sep 7, 2013 at 2:59 AM, wrote: > > > Incidentally, how does all this interact with ctypes unicode_buffers, > > > which slice as strings and must be UTF-16 on windows? This was fine > > > pre-FSR

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread Ned Batchelder
On 9/9/13 10:28 AM, wxjmfa...@gmail.com wrote: On Friday, 6 September 2013 17:46:14 UTC+2, Piet van Oostrum wrote: wxjmfa...@gmail.com writes: The Flexible String Representation has conceptually to face the same problem. It splits "unicode" in chunks and it has to solve two problems at t

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread Michael Torrie
On 09/09/2013 08:28 AM, wxjmfa...@gmail.com wrote: > Comment: Such differences never happen with utf. But with utf, slicing strings is O(n) (well that's a simplification as someone showed an algorithm that is log n), whereas a fixed-width encoding (Latin-1, UCS-2, UCS-4) is O(1). Do you understan
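The complexity point can be sketched as follows. This is an illustration only, not CPython's implementation: utf8_index and ucs4_index are hypothetical helpers that work on encoded bytes, the first scanning a variable-width encoding from the start and the second jumping straight to a fixed offset.

```python
def utf8_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 data by scanning: O(n)."""
    if i < 0:
        raise IndexError(i)
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # not a continuation byte: a code point starts here
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError(i)
    return data[start:].decode("utf-8")


def ucs4_index(data: bytes, i: int) -> str:
    """Return the i-th code point of UTF-32-LE data by direct offset: O(1)."""
    if not 0 <= i < len(data) // 4:
        raise IndexError(i)
    return data[4 * i:4 * i + 4].decode("utf-32-le")


if __name__ == "__main__":
    text = "a\u20acb\U0001F600c"
    print(utf8_index(text.encode("utf-8"), 3) == ucs4_index(text.encode("utf-32-le"), 3))
```

The scanning version has to look at every byte before the target because code points occupy one to four bytes each, while the fixed-width version only multiplies the index by the width.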

Re: Chardet, file, ... and the Flexible String Representation

2013-09-09 Thread wxjmfauth
On Friday, 6 September 2013 17:46:14 UTC+2, Piet van Oostrum wrote: > wxjmfa...@gmail.com writes: > > > > > The Flexible String Representation has conceptually to > > > face the same problem. It splits "unicode" in chunks and > > > it has to solve two problems at the same time, the coding

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Chris Angelico
On Sat, Sep 7, 2013 at 1:46 AM, Piet van Oostrum wrote: > The FSR simply stores a Unicode string as an array[*] of ints (the Unicode > code points of the characters of the string). That's it. Then it uses a > memory-efficient way to store this array of ints. But that has nothing to do > with cha

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Chris Angelico
On Sat, Sep 7, 2013 at 2:59 AM, wrote: > Incidentally, how does all this interact with ctypes unicode_buffers, > which slice as strings and must be UTF-16 on windows? This was fine > pre-FSR when unicode objects were UTF-16, but I'm not sure how it would > work now. That would be pre-FSR *with a

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Piet van Oostrum
wxjmfa...@gmail.com writes: > The Flexible String Representation has conceptually to > face the same problem. It splits "unicode" in chunks and > it has to solve two problems at the same time, the coding > and the handling of multiple "char sets". The problem? > It fails. > "This poor Flexible Str

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread random832
On Fri, Sep 6, 2013, at 11:46, Piet van Oostrum wrote: > The FSR does not split unicode in chunks. It does not create problems > and therefore it doesn't have to solve this. > > The FSR simply stores a Unicode string as an array[*] of ints (the > Unicode code points of the characters of the stri
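A toy model of the "array of ints, stored compactly" description, under the assumption that the width is chosen from the largest code point in the string. The names pack_code_points and code_point_at are hypothetical, and this is a sketch of the idea rather than CPython's actual data layout.

```python
def pack_code_points(text: str) -> tuple[int, bytes]:
    """Store every code point at the narrowest width (1, 2 or 4 bytes) that fits all of them."""
    largest = max(map(ord, text), default=0)
    width = 1 if largest < 0x100 else 2 if largest < 0x10000 else 4
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


def code_point_at(width: int, data: bytes, i: int) -> str:
    """Read the i-th code point back by direct offset; no decoding step is involved."""
    if not 0 <= i < len(data) // width:
        raise IndexError(i)
    return chr(int.from_bytes(data[width * i:width * (i + 1)], "little"))


if __name__ == "__main__":
    for sample in ["abc", "a\u20ac", "a\U0001F600"]:
        width, data = pack_code_points(sample)
        print(repr(sample), width, len(data))
```

In this model the whole string shares one width, so indexing stays a plain offset calculation whichever width was picked, and no character set handling enters into it.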

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Ned Batchelder
On 9/6/13 5:11 AM, wxjmfa...@gmail.com wrote: The Flexible String Representation has conceptually to face the same problem. It splits "unicode" in chunks and it has to solve two problems at the same time, the coding and the handling of multiple "char sets". The problem? It fails. Just once, ple

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Antoon Pardon
On 06-09-13 11:11, wxjmfa...@gmail.com wrote: > > The Flexible String Representation has conceptually to > face the same problem. It splits "unicode" in chunks and > it has to solve two problems at the same time, the coding > and the handling of multiple "char sets". The problem? Not true. The

Re: Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread Steven D'Aprano
On Fri, 06 Sep 2013 02:11:56 -0700, wxjmfauth wrote: > Short comment about the "detection" tools from a previous discussion. > > The tools supposed to detect the coding scheme are all working with a > simple logical mathematical rule: > > p ==> q <==> non q ==> non p. Incorrect. charde

Chardet, file, ... and the Flexible String Representation

2013-09-06 Thread wxjmfauth
Short comment about the "detection" tools from a previous discussion. The tools supposed to detect the coding scheme are all working with a simple logical mathematical rule: p ==> q <==> non q ==> non p. Shortly -- and consequence -- they do not detect a coding scheme they only detect
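The elimination rule stated in this post can be sketched as below: if data in encoding E always decodes cleanly, then a decode failure rules E out. The function name not_ruled_out is hypothetical, and this models only the rule as the post words it. It is not a description of how chardet or file work, and the reply from Steven D'Aprano in this thread begins with "Incorrect."

```python
def not_ruled_out(data: bytes, candidates: list[str]) -> list[str]:
    """Keep the candidate encodings under which `data` decodes without error."""
    survivors = []
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue  # contrapositive: the decode failed, so this encoding is excluded
        survivors.append(encoding)
    return survivors


if __name__ == "__main__":
    sample = "\u20ac".encode("utf-8")
    print(not_ruled_out(sample, ["ascii", "utf-8", "latin-1"]))
```

Elimination alone usually leaves more than one survivor, since an encoding such as latin-1 accepts any byte sequence.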