subject:"unicode, bytes redux"

Re: unicode, bytes redux

2006-09-25 Thread Fredrik Lundh

willie wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100

Re: unicode, bytes redux

2006-09-25 Thread Walter Dörwald

Steven D'Aprano wrote:
> On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
>
>> willie <[EMAIL PROTECTED]> writes:
>>> # U+270C
>>> # 11100010 10011100 10001100
>>> buf = "\xE2\x9C\x8C"
>>> u = buf.decode('UTF-8')
>>> # ... later ...
>>> u.bytes() -> 3
>>>
>>> (goes through each code point and

Re: unicode, bytes redux

2006-09-25 Thread Fredrik Lundh

John Machin wrote:
>> > So all he needs is a boolean result: u.willitfit(encoding, width)
>>
>> at what point in the program would that method be used ?
>
> Never, I hope. Were you taking that as a serious suggestion? Fredrik,
> perhaps your irony detector needs a little preventative maintenance :-

Re: unicode, bytes redux

2006-09-25 Thread John Machin

Fredrik Lundh wrote:
> John Machin wrote:
>
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8, but
> > silently truncated if it was more than the n in varchar(n) ... or
> > something like that.
> >
> > So all he n

Re: unicode, bytes redux

2006-09-25 Thread John Roth

willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> #

Re: unicode, bytes redux

2006-09-25 Thread Fredrik Lundh

John Machin wrote:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8, but
> silently truncated if it was more than the n in varchar(n) ... or
> something like that.
>
> So all he needs is a boolean result: u.willitfit
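For readers wondering what the tongue-in-cheek `willitfit` check quoted above would amount to, here is a minimal sketch in present-day Python 3, where `str` plays the role of the 2006 `unicode` type. The names `will_it_fit` and `truncate_utf8` are hypothetical, and the assumption that the column limit is counted in UTF-8 bytes is a guess based on the "something like that" description in the thread.

```python
def will_it_fit(text: str, encoding: str, width: int) -> bool:
    """Return True if `text`, once encoded, occupies at most `width` bytes."""
    return len(text.encode(encoding)) <= width


def truncate_utf8(text: str, width: int) -> str:
    """Cut `text` so that its UTF-8 form fits in `width` bytes.

    Slicing the encoded bytes may leave an incomplete trailing sequence;
    decoding with errors="ignore" drops that fragment, so no character
    is ever split in the middle.
    """
    return text.encode("utf-8")[:width].decode("utf-8", errors="ignore")


sample = "ab\u270c"  # two ASCII characters plus U+270C (3 bytes in UTF-8)
print(will_it_fit(sample, "utf-8", 5))   # True
print(will_it_fit(sample, "utf-8", 4))   # False
print(truncate_utf8(sample, 4))          # ab
```

Truncating on the application side, rather than letting the database do it silently, at least keeps the stored value valid UTF-8.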

Re: unicode, bytes redux

2006-09-25 Thread John Machin

Paul Rubin wrote:
> Leif K-Brooks <[EMAIL PROTECTED]> writes:
> > It requires a fairly large change to code and API for a relatively
> > uncommon problem. How often do you need to know how many bytes an
> > encoded Unicode string takes up without needing the encoded string
> > itself?
>
> Shrug. I

Re: unicode, bytes redux

2006-09-25 Thread Steven D'Aprano

On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
> willie <[EMAIL PROTECTED]> writes:
>> # U+270C
>> # 11100010 10011100 10001100
>> buf = "\xE2\x9C\x8C"
>> u = buf.decode('UTF-8')
>> # ... later ...
>> u.bytes() -> 3
>>
>> (goes through each code point and calculates
>> the number of bytes

Re: unicode, bytes redux

2006-09-25 Thread John Machin

Paul Rubin wrote:
> "John Machin" <[EMAIL PROTECTED]> writes:
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8,
>
> Yeah, I remember that.
>
> > Tell you what, why don't you and Willie get together and write a PE

Re: unicode, bytes redux

2006-09-25 Thread Paul Rubin

"John Machin" <[EMAIL PROTECTED]> writes:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8,

Yeah, I remember that.

> Tell you what, why don't you and Willie get together and write a PEP?

If enough people care about

Re: unicode, bytes redux

2006-09-25 Thread John Machin

willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?

Where it's been is irrelevant. Where it's going to is what matters.

> So that it's feasible to calculate the numb

Re: unicode, bytes redux

2006-09-25 Thread Paul Rubin

Leif K-Brooks <[EMAIL PROTECTED]> writes:
> It requires a fairly large change to code and API for a relatively
> uncommon problem. How often do you need to know how many bytes an
> encoded Unicode string takes up without needing the encoded string
> itself?

Shrug. I don't see a real large change--

Re: unicode, bytes redux

2006-09-25 Thread Leif K-Brooks

Paul Rubin wrote:
> Duncan Booth explains why that doesn't work. But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes('UTF-8') -> 3
> u.bytes('UCS-4') -> 4
>
> That avoids creat
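As a rough illustration of the per-encoding byte count proposed in the quoted text, the same numbers can be obtained today by encoding and measuring. This is a sketch in Python 3, where `str` corresponds to the 2006 `unicode` type; `encoded_length` is a hypothetical helper, not an existing method, and `utf-32-be` is used as a stand-in for the "UCS-4" spelling in the post because it is a codec name Python accepts and it emits no byte-order mark.

```python
def encoded_length(text: str, encoding: str) -> int:
    """Return the number of bytes `text` occupies in `encoding`.

    Note: this builds the encoded bytes and then discards them, so it
    does not avoid the temporary object the proposal wanted to skip.
    """
    return len(text.encode(encoding))


u = b"\xE2\x9C\x8C".decode("utf-8")    # U+270C
print(encoded_length(u, "utf-8"))      # 3
print(encoded_length(u, "utf-32-be"))  # 4
```

The count depends only on the target encoding passed in, not on the encoding the text was originally decoded from.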

Re: unicode, bytes redux

2006-09-25 Thread Paul Rubin

willie <[EMAIL PROTECTED]> writes:
> # U+270C
> # 11100010 10011100 10001100
> buf = "\xE2\x9C\x8C"
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes() -> 3
>
> (goes through each code point and calculates
> the number of bytes that make up the character
> according to the encoding)

Duncan Bo

Re: unicode, bytes redux

2006-09-25 Thread Duncan Booth

willie <[EMAIL PROTECTED]> wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.

So what sort of output

Re: unicode, bytes redux

2006-09-24 Thread Robert Kern

willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?

Yes. The unicode object itself is precisely the wrong place for that kind of information. Many (most?) unicode o

unicode, bytes redux

2006-09-24 Thread willie

(beating a dead horse)

Is it too ridiculous to suggest that it'd be nice
if the unicode object were to remember the
encoding of the string it was decoded from?
So that it's feasible to calculate the number
of bytes that make up the unicode code points.

# U+270C
# 11100010 10011100 10001100
buf =
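The example in this post is cut off here, but the replies quote it with `buf = "\xE2\x9C\x8C"`. The following short sketch, written for Python 3 (so the byte string needs a `b` prefix, unlike the 2006 original), checks that those three bytes match the bit pattern in the comment and decode to the single code point U+270C.

```python
buf = b"\xE2\x9C\x8C"
u = buf.decode("utf-8")

# Render each byte as eight binary digits, as in the post's comment.
bits = " ".join(format(byte, "08b") for byte in buf)

print(bits)              # 11100010 10011100 10001100
print(hex(ord(u)))       # 0x270c
print(len(u), len(buf))  # 1 3
```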

17 matches
