Ross Ridge <[EMAIL PROTECTED]> writes:
> The Unicode standard doesn't require that you support surrogates,
> or any other kind of character, so no you wouldn't be lying.
+1 on Ross Ridge's contributions to this thread.
If Unicode is processed using UTF-8 or UTF-32 encoding forms then
there are
Ross Ridge writes:
> The Unicode standard doesn't require that you support surrogates, or
> any other kind of character, so no you wouldn't be lying.
<[EMAIL PROTECTED]> wrote:
> There is the notion of Unicode implementation levels, and each of them
> does include a set of characters to support.
> The Unicode standard doesn't require that you support surrogates, or
> any other kind of character, so no you wouldn't be lying.
There is the notion of Unicode implementation levels, and each of them
does include a set of characters to support. In level 1, combining
characters need not be supported
> IMHO what is really needed is a bunch of high level methods like
> .graphemes() - iterate over graphemes
> .codepoints() - iterate over codepoints
> .isword() - check if the string represents one word
> etc...
This doesn't need to come as methods, though. If anybody wants to
provide a library wi
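Here is a minimal sketch of that library approach in modern Python 3, where `str` already indexes by code point. `codepoints()` and `graphemes()` are hypothetical free functions. `graphemes()` is a deliberate simplification: it groups a base character with the following marks that have a nonzero canonical combining class. It is not the full UAX #29 segmentation algorithm, and it misses spacing marks and emoji sequences.

```python
import unicodedata
from typing import Iterator


def codepoints(s: str) -> Iterator[int]:
    """Yield the code point of each character in s."""
    # In Python 3.3+ a str already indexes by code point, so this is trivial.
    for ch in s:
        yield ord(ch)


def graphemes(s: str) -> Iterator[str]:
    """Yield simplified clusters: a base character plus any following
    combining marks. NOT the full UAX #29 algorithm."""
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch):
            cluster += ch  # attach the combining mark to the current base
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


text = "e\u0301te\u0301"  # "été" spelled with combining acute accents
print(list(codepoints(text)))             # [101, 769, 116, 101, 769]
print([len(g) for g in graphemes(text)])  # [2, 1, 2]
```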
Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>I wish to write software that supports Unicode. Like it or not,
>Unicode goes beyond the BMP, so I'd be lying if I said I supported
>Unicode if I only handled the BMP.
The Unicode standard doesn't require that you support surrogates, or
any other kind of character, so no you wouldn't be lying.
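For reference, going beyond the BMP in UTF-16 means storing each supplementary code point (U+10000 to U+10FFFF) as a high surrogate followed by a low surrogate. The following is a minimal sketch of the standard arithmetic. The helper names are hypothetical.

```python
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # low 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))                  # 0xd83d 0xde00
print(hex(from_surrogate_pair(high, low)))  # 0x1f600
```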
On Apr 20, 7:34 pm, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > If you absolutely think support for non-BMP characters is necessary
> > in every program, suggesting that Python use UCS-4 by default on
> > all systems has a higher c
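You can check whether an interpreter stores UTF-16 code units or full code points through `sys.maxunicode`. The sketch below uses a hypothetical `build_kind()` helper. Narrow builds of CPython 2.x and 3.0 to 3.2 report 0xFFFF, and wide (UCS-4, PEP 261) builds report 0x10FFFF. Since Python 3.3 (PEP 393), every build reports 0x10FFFF.

```python
import sys


def build_kind() -> str:
    """Report how this interpreter exposes str/unicode characters."""
    # Narrow (UCS-2/UTF-16) builds report 0xFFFF here; wide (UCS-4) builds
    # and every Python >= 3.3 report 0x10FFFF.
    if sys.maxunicode == 0xFFFF:
        return "narrow (UTF-16 code units)"
    return "wide (full code points)"


print(build_kind(), hex(sys.maxunicode))
```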
On Apr 20, 7:34 pm, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> > > I don't believe this specific variant has been discussed.
> > Now that you clarify it: no, it hasn't been discussed. I find that
> > not surprising - this proposal
Paul Boddie:
> Do we have a volunteer? ;-)
I won't volunteer to do a real implementation - the Unicode type in
Python is currently around 7000 lines long and there is other code to
change in, for example, regular expressions. Here's a demonstration C++
implementation that stores an array o
On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I don't believe this specific variant has been discussed.
>
> Now that you clarify it: no, it hasn't been discussed. I find that
> not surprising - this proposal is so strange and unnatural that
> probably nobody dared to suggest it.
On Apr 20, 5:49 pm, Ross Ridge <[EMAIL PROTECTED]>
wrote:
> Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> >The only code that will be changed is that which doesn't handle
> >surrogates properly. Some will start working properly. Some (ie
> >random.choice(u'\U0010\u')) will fail explicitly (rather than
> >silently).
> I don't believe this specific variant has been discussed.
Now that you clarify it: no, it hasn't been discussed. I find that
not surprising - this proposal is so strange and unnatural that
probably nobody dared to suggest it.
> s[5] does not exist. You would get an IndexError indicating that i
Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>The only code that will be changed is that which doesn't handle
>surrogates properly. Some will start working properly. Some (ie
>random.choice(u'\U0010\u')) will fail explicitly (rather than
>silently).
You're falsely assuming that any code that
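You can simulate the `random.choice` case on a modern Python by picking from the UTF-16 code units that a narrow build would expose. The archived literal above is garbled, so the sample string below is an assumption, and so are the helper names.

```python
import random
from typing import List


def utf16_units(s: str) -> List[int]:
    """Return the UTF-16 code units a narrow build would index into."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDFFF


text = "a\U0010FFFF"         # one BMP character plus one supplementary character
units = utf16_units(text)    # [0x61, 0xDBFF, 0xDFFF]: 3 "characters" on a narrow build
pick = random.choice(units)  # may silently return half of a surrogate pair
print(hex(pick), "lone surrogate" if is_surrogate(pick) else "complete character")

# Choosing from code points instead can never yield half a pair.
print(repr(random.choice(text)))
```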
On 20 Apr, 07:02, Neil Hodgson <[EMAIL PROTECTED]> wrote:
> Adam Olsen:
>
> > To solve this I propose Python's unicode type using UTF-16 should have
> > gaps in its index, allowing it to only expose complete unicode scalar
> > values. Iteration would produce surrogate pairs rather than
> > individual surrogates
(Sorry for the dupe, Martin. Gmail made it look like your reply was
in private.)
On 4/19/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Thoughts, from all you readers out there? For/against?
>
> See PEP 261. These things have all been discussed at that time,
> and an explicit decision against
On Apr 19, 11:02 pm, Neil Hodgson <[EMAIL PROTECTED]>
wrote:
> Adam Olsen:
>
> > To solve this I propose Python's unicode type using UTF-16 should have
> > gaps in its index, allowing it to only expose complete unicode scalar
> > values. Iteration would produce surrogate pairs rather than
> > individual surrogates
> Thoughts, from all you readers out there? For/against?
See PEP 261. These things have all been discussed at that time,
and an explicit decision against what I think (*) your proposal is
was taken. If you want to, you can try to revert that
decision, but you would need to write a PEP.
Regards,
Adam Olsen:
> To solve this I propose Python's unicode type using UTF-16 should have
> gaps in its index, allowing it to only expose complete unicode scalar
> values. Iteration would produce surrogate pairs rather than
> individual surrogates, indexing to the first half of a surrogate pair
> would
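The following rough sketch shows the indexing behaviour the proposal describes, assuming well-formed input. Indices count UTF-16 code units. The index of a high surrogate returns the whole character, and the index of the low surrogate is a gap that raises `IndexError`, as in the `s[5]` reply in this thread. The class name `GappedUTF16` and the choice to report `len()` in code units are assumptions, not part of the proposal as quoted.

```python
from typing import Iterator, List


class GappedUTF16:
    """Hypothetical UTF-16 string whose index has gaps at low surrogates."""

    def __init__(self, s: str) -> None:
        data = s.encode("utf-16-le")  # a str from Python 3 encodes to well-formed UTF-16
        self._units: List[int] = [
            int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)
        ]

    def __len__(self) -> int:
        # Assumption: length counts code units, gaps included.
        return len(self._units)

    def __getitem__(self, i: int) -> str:
        if i < 0:
            i += len(self._units)
        if not 0 <= i < len(self._units):
            raise IndexError("index out of range")
        unit = self._units[i]
        if 0xDC00 <= unit <= 0xDFFF:
            # Second half of a pair: this index is a gap.
            raise IndexError("index %d falls inside a surrogate pair" % i)
        if 0xD800 <= unit <= 0xDBFF:
            # First half of a pair: return the complete scalar value.
            low = self._units[i + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)

    def __iter__(self) -> Iterator[str]:
        # Step over gaps so iteration yields only complete characters.
        i = 0
        while i < len(self._units):
            ch = self[i]
            yield ch
            i += 2 if ord(ch) > 0xFFFF else 1


s = GappedUTF16("ab\U0001F600c")  # code units: a, b, D83D, DE00, c
print(len(s), s[2] == "\U0001F600", s[4])  # 5 True c
```

`__iter__` is defined explicitly. Without it, Python's fallback iteration via `__getitem__` would stop at the first `IndexError` raised by a gap.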
As was seen in another thread[1], there's a great deal of confusion
with regard to surrogates. Most programmers assume Python's unicode
type exposes only complete characters. Even CPython's own functions
do this on occasion. This leads to different behaviour across
platforms and makes it unneces
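Comparing lengths makes the cross-platform difference easy to see. The sketch below assumes an illustrative string containing U+1D11E MUSICAL SYMBOL G CLEF. A narrow build counts UTF-16 code units, while a wide build (and any Python 3.3 or later) counts code points. Both helper names are hypothetical.

```python
def narrow_len(s: str) -> int:
    """Length a narrow (UTF-16) build would report: the number of code units."""
    return len(s.encode("utf-16-le")) // 2


def wide_len(s: str) -> int:
    """Length a wide (UCS-4) build, or any Python >= 3.3, reports."""
    return len(s)


sample = "x\U0001D11E"  # 'x' followed by MUSICAL SYMBOL G CLEF
print(narrow_len(sample), wide_len(sample))  # 3 2
```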