willie wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100
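The bit pattern quoted above is the UTF-8 encoding of U+270C. The following is a small Python 3 sketch (not code from the thread) that reproduces it. It shows that one code point becomes three bytes: a `1110` lead byte followed by two `10` continuation bytes.

```python
# U+270C (VICTORY HAND) as UTF-8, printed as raw bytes and as bit patterns.
ch = "\u270c"
encoded = ch.encode("utf-8")
bits = " ".join(format(b, "08b") for b in encoded)

print(encoded)                # b'\xe2\x9c\x8c'
print(bits)                   # 11100010 10011100 10001100
print(len(ch), len(encoded))  # 1 3  (one code point, three bytes)
```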
Steven D'Aprano wrote:
> On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
>
>> willie <[EMAIL PROTECTED]> writes:
>>> # U+270C
>>> # 11100010 10011100 10001100
>>> buf = "\xE2\x9C\x8C"
>>> u = buf.decode('UTF-8')
>>> # ... later ...
>>> u.bytes() -> 3
>>>
>>> (goes through each code point and
John Machin wrote:
>> > So all he needs is a boolean result: u.willitfit(encoding, width)
>>
>> at what point in the program would that method be used ?
>
> Never, I hope. Were you taking that as a serious suggestion? Fredrik,
> perhaps your irony detector needs a little preventative maintenance :-
Fredrik Lundh wrote:
> John Machin wrote:
>
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8, but
> > silently truncated if it was more than the n in varchar(n) ... or
> > something like that.
> >
> > So all he n
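If a DBMS silently cuts UTF-8 data at a fixed byte count, the cut can land inside a multibyte sequence. A minimal Python 3 sketch of truncating safely on the client side follows. It assumes the column limit is counted in UTF-8 bytes, and `truncate_utf8` is a hypothetical helper, not part of any DBMS API.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Hypothetical helper: shorten text so its UTF-8 form fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit. errors="ignore" drops the trailing partial
    # sequence, so no character is split in half.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_utf8("a\u270cb", 3))  # a
print(truncate_utf8("a\u270cb", 4))  # a followed by U+270C
```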
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> #
John Machin wrote:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8, but
> silently truncated if it was more than the n in varchar(n) ... or
> something like that.
>
> So all he needs is a boolean result: u.willitfit
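John Machin floated `u.willitfit(encoding, width)` half in jest. Outside the string type, it reduces to a one-line function. The Python 3 sketch below is hypothetical. It assumes `width` is a byte limit, as with a varchar(n) that is counted in encoded bytes. Some databases count characters instead, in which case this check would be the wrong one.

```python
def will_it_fit(text: str, encoding: str, width: int) -> bool:
    # Hypothetical stand-in for the suggested u.willitfit(encoding, width);
    # width is assumed to be measured in encoded bytes.
    return len(text.encode(encoding)) <= width


print(will_it_fit("\u270c" * 3, "utf-8", 9))  # True  (3 chars x 3 bytes = 9)
print(will_it_fit("\u270c" * 3, "utf-8", 8))  # False
```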
Paul Rubin wrote:
> Leif K-Brooks <[EMAIL PROTECTED]> writes:
> > It requires a fairly large change to code and API for a relatively
> > uncommon problem. How often do you need to know how many bytes an
> > encoded Unicode string takes up without needing the encoded string
> > itself?
>
> Shrug. I
On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
> willie <[EMAIL PROTECTED]> writes:
>> # U+270C
>> # 11100010 10011100 10001100
>> buf = "\xE2\x9C\x8C"
>> u = buf.decode('UTF-8')
>> # ... later ...
>> u.bytes() -> 3
>>
>> (goes through each code point and calculates
>> the number of bytes
Paul Rubin wrote:
> "John Machin" <[EMAIL PROTECTED]> writes:
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8,
>
> Yeah, I remember that.
>
> > Tell you what, why don't you and Willie get together and write a PE
"John Machin" <[EMAIL PROTECTED]> writes:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8,
Yeah, I remember that.
> Tell you what, why don't you and Willie get together and write a PEP?
If enough people care about
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
Where it's been is irrelevant. Where it's going to is what matters.
> So that it's feasible to calculate the numb
Leif K-Brooks <[EMAIL PROTECTED]> writes:
> It requires a fairly large change to code and API for a relatively
> uncommon problem. How often do you need to know how many bytes an
> encoded Unicode string takes up without needing the encoded string
> itself?
Shrug. I don't see a real large change--
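Leif's point is that the encoded bytes are usually needed anyway. In that case, encoding once and reusing the result covers both the size check and the data to send. Here is a minimal Python 3 sketch. `encode_for_column` is a hypothetical name, and the byte limit is an assumption.

```python
def encode_for_column(text: str, max_bytes: int, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: encode once, check the size, return the bytes to send."""
    payload = text.encode(encoding)
    if len(payload) > max_bytes:
        raise ValueError(f"{len(payload)} bytes exceeds limit of {max_bytes}")
    return payload


print(encode_for_column("\u270c", 3))  # b'\xe2\x9c\x8c'
```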
Paul Rubin wrote:
> Duncan Booth explains why that doesn't work. But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes('UTF-8') -> 3
> u.bytes('UCS-4') -> 4
>
> That avoids creat
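Paul's `u.bytes(encoding)` takes the target encoding as an argument, so the count depends on where the text is going. A Python 3 sketch follows, with `byte_count` as a hypothetical function name. It uses the `utf-32-le` codec as a stand-in for the UCS-4 in the post. The BOM-emitting `utf-32` codec would report 8 instead, because it prepends a 4-byte byte-order mark.

```python
def byte_count(text: str, encoding: str) -> int:
    # Hypothetical stand-in for the proposed u.bytes(encoding).
    return len(text.encode(encoding))


u = b"\xE2\x9C\x8C".decode("utf-8")
print(byte_count(u, "utf-8"))      # 3
print(byte_count(u, "utf-16-le"))  # 2
print(byte_count(u, "utf-32-le"))  # 4
```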
willie <[EMAIL PROTECTED]> writes:
> # U+270C
> # 11100010 10011100 10001100
> buf = "\xE2\x9C\x8C"
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes() -> 3
>
> (goes through each code point and calculates
> the number of bytes that make up the character
> according to the encoding)
Duncan Bo
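Willie describes going through each code point and adding up its byte length. For UTF-8 specifically, the length follows from the code point's range alone, so no encoded string has to be built. The sketch below is in Python 3, with `utf8_length` as a hypothetical name. It assumes the string contains no lone surrogates, because those cannot be encoded as UTF-8 at all.

```python
def utf8_length(text: str) -> int:
    """Count UTF-8 bytes per code point without encoding the string."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            total += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            total += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            total += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            total += 4
    return total


sample = "a\u00e9\u270c\U0001F600"
print(utf8_length(sample), len(sample.encode("utf-8")))  # 10 10
```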
willie <[EMAIL PROTECTED]> wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
So what sort of output
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
Yes. The unicode object itself is precisely the wrong place for that kind of
information. Many (most?) unicode o
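Fredrik's point can be illustrated directly. Two different byte strings decoded from different encodings produce equal unicode objects, so nothing in the result records where the text came from. Only the target encoding determines the byte count. The example below is a Python 3 sketch.

```python
from_utf8 = b"caf\xc3\xa9".decode("utf-8")
from_latin1 = b"caf\xe9".decode("latin-1")

print(from_utf8 == from_latin1)  # True: same text, no memory of its source
print(len(from_utf8.encode("utf-8")), len(from_utf8.encode("latin-1")))  # 5 4
```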