willie wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100
Steven D'Aprano wrote:
> On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
>
>> willie <[EMAIL PROTECTED]> writes:
>>> # U+270C
>>> # 11100010 10011100 10001100
>>> buf = "\xE2\x9C\x8C"
>>> u = buf.decode('UTF-8')
>>> # ... later ...
>>> u.bytes() -> 3
>>>
>>> (goes through each code point and calculates
>>> the number of bytes that make up the character
>>> according to the encoding)
John Machin wrote:
>> > So all he needs is a boolean result: u.willitfit(encoding, width)
>>
>> at what point in the program would that method be used ?
>
> Never, I hope. Were you taking that as a serious suggestion? Fredrik,
> perhaps your irony detector needs a little preventative maintenance :-)
Fredrik Lundh wrote:
> John Machin wrote:
>
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8, but
> > silently truncated if it was more than the n in varchar(n) ... or
> > something like that.
> >
> > So all he needs is a boolean result: u.willitfit(encoding, width)
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
>
> # U+270C
> # 11100010 10011100 10001100
John Machin wrote:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8, but
> silently truncated if it was more than the n in varchar(n) ... or
> something like that.
>
> So all he needs is a boolean result: u.willitfit(encoding, width)
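As an aside for readers following along: the tongue-in-cheek `u.willitfit(encoding, width)` reduces to a one-liner in modern Python 3, where `str` objects carry no encoding and you simply encode and count. The function name and signature below are taken from the joke above, not from any real API:

```python
def willitfit(s: str, encoding: str, width: int) -> bool:
    """True if s, encoded with `encoding`, fits in `width` bytes
    (e.g. the n of a DBMS varchar(n) column counted in bytes)."""
    return len(s.encode(encoding)) <= width

# U+270C takes three bytes in UTF-8:
print(willitfit("\u270c", "utf-8", 3))  # True
print(willitfit("\u270c", "utf-8", 2))  # False
```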
Paul Rubin wrote:
> Leif K-Brooks <[EMAIL PROTECTED]> writes:
> > It requires a fairly large change to code and API for a relatively
> > uncommon problem. How often do you need to know how many bytes an
> > encoded Unicode string takes up without needing the encoded string
> > itself?
>
> Shrug. I don't see a real large change--
On Mon, 25 Sep 2006 00:45:29 -0700, Paul Rubin wrote:
> willie <[EMAIL PROTECTED]> writes:
>> # U+270C
>> # 11100010 10011100 10001100
>> buf = "\xE2\x9C\x8C"
>> u = buf.decode('UTF-8')
>> # ... later ...
>> u.bytes() -> 3
>>
>> (goes through each code point and calculates
>> the number of bytes that make up the character
>> according to the encoding)
Paul Rubin wrote:
> "John Machin" <[EMAIL PROTECTED]> writes:
> > Actually, what Willie was concerned about was some cockamamie DBMS
> > which required to be fed Unicode, which it encoded as UTF-8,
>
> Yeah, I remember that.
>
> > Tell you what, why don't you and Willie get together and write a PEP?
"John Machin" <[EMAIL PROTECTED]> writes:
> Actually, what Willie was concerned about was some cockamamie DBMS
> which required to be fed Unicode, which it encoded as UTF-8,
Yeah, I remember that.
> Tell you what, why don't you and Willie get together and write a PEP?
If enough people care about
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
Where it's been is irrelevant. Where it's going to is what matters.
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
Leif K-Brooks <[EMAIL PROTECTED]> writes:
> It requires a fairly large change to code and API for a relatively
> uncommon problem. How often do you need to know how many bytes an
> encoded Unicode string takes up without needing the encoded string
> itself?
Shrug. I don't see a real large change--
Paul Rubin wrote:
> Duncan Booth explains why that doesn't work. But I don't see any big
> problem with a byte count function that lets you specify an encoding:
>
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes('UTF-8') -> 3
> u.bytes('UCS-4') -> 4
>
> That avoids creat
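For reference, Paul Rubin's proposed `u.bytes(encoding)` is equivalent to encoding and taking the length; the helper name `nbytes` below is hypothetical, and `utf-32-be` is used as a stand-in for UCS-4 without a byte-order mark:

```python
def nbytes(s: str, encoding: str) -> int:
    """Byte count of s under the given encoding, without keeping
    the encoded string around afterwards."""
    return len(s.encode(encoding))

v = "\u270c"                    # U+270C VICTORY HAND
print(nbytes(v, "utf-8"))       # 3
print(nbytes(v, "utf-32-be"))   # 4 (one fixed-width 32-bit unit)
```

Note that plain `"utf-32"` would report 8 here, because that codec prepends a 4-byte BOM.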
willie <[EMAIL PROTECTED]> writes:
> # U+270C
> # 11100010 10011100 10001100
> buf = "\xE2\x9C\x8C"
> u = buf.decode('UTF-8')
> # ... later ...
> u.bytes() -> 3
>
> (goes through each code point and calculates
> the number of bytes that make up the character
> according to the encoding)
Duncan Booth
willie <[EMAIL PROTECTED]> wrote:
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
> So that it's feasible to calculate the number
> of bytes that make up the unicode code points.
So what sort of output
willie wrote:
> (beating a dead horse)
>
> Is it too ridiculous to suggest that it'd be nice
> if the unicode object were to remember the
> encoding of the string it was decoded from?
Yes. The unicode object itself is precisely the wrong place for that kind of
information. Many (most?) unicode objects
(beating a dead horse)
Is it too ridiculous to suggest that it'd be nice
if the unicode object were to remember the
encoding of the string it was decoded from?
So that it's feasible to calculate the number
of bytes that make up the unicode code points.
# U+270C
# 11100010 10011100 10001100
buf = "\xE2\x9C\x8C"
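In modern Python 3 terms, the example in the original post reads as below: `bytes` and `str` are distinct types, the decoded string does not remember its source encoding, and the byte count is recovered by re-encoding:

```python
# U+270C, encoded in UTF-8 as
# 11100010 10011100 10001100
buf = b"\xE2\x9C\x8C"
u = buf.decode("utf-8")

print(u == "\u270c")            # True
print(len(u))                   # 1 code point
print(len(u.encode("utf-8")))   # 3 bytes, the proposed u.bytes()
```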