Ross Ridge <[EMAIL PROTECTED]> writes:
> The Unicode standard doesn't require that you support surrogates,
> or any other kind of character, so no you wouldn't be lying.
+1 on Ross Ridge's contributions to this thread.
If Unicode is processed using UTF-8 or UTF-32 encoding forms then
there are
Ross Ridge writes:
> The Unicode standard doesn't require that you support surrogates, or
> any other kind of character, so no you wouldn't be lying.
<[EMAIL PROTECTED]> wrote:
> There is the notion of Unicode implementation levels, and each of them
> does include a set of characters to support.
> The Unicode standard doesn't require that you support surrogates, or
> any other kind of character, so no you wouldn't be lying.
There is the notion of Unicode implementation levels, and each of them
does include a set of characters to support. In level 1, combining
characters need not be sup
> IMHO what is really needed is a bunch of high level methods like
> .graphemes() - iterate over graphemes
> .codepoints() - iterate over codepoints
> .isword() - check if the string represents one word
> etc...
This doesn't need to come as methods, though. If anybody wants to
provide a library wi
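
As a rough illustration of the point that such helpers can live in a library rather than on the string type, here is a minimal sketch in Python 3 (where iterating a str already yields whole code points). The function names mirror the quoted wish list but are otherwise hypothetical, and graphemes() is only an approximation that attaches combining marks to the preceding base character; it is not full UAX #29 segmentation.

```python
import unicodedata


def codepoints(s):
    """Yield the integer code point of each character in s."""
    for ch in s:
        yield ord(ch)


def graphemes(s):
    """Yield approximate grapheme clusters.

    A cluster here is a base character plus any combining marks that
    follow it. This is an approximation, not the UAX #29 algorithm.
    """
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts; emit the finished cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster
```

Calling list(graphemes(text)) gives one string per approximate cluster, so code that wants "user-perceived characters" can index that list instead of the raw string.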
Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>I wish to write software that supports Unicode. Like it or not,
>Unicode goes beyond the BMP, so I'd be lying if I said I supported
>Unicode if I only handled the BMP.
The Unicode standard doesn't require that you support surrogates, or
any other kind of
On Apr 20, 7:34 pm, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > If you absolutely think support for non-BMP characters is necessary
> > in every program, suggesting that Python use UCS-4 by default on
> > all systems has a higher c
On Apr 20, 7:34 pm, Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
>
> > > I don't believe this specific variant has been discussed.
> > Now that you clarify it: no, it hasn't been discussed. I find that
> > not surprising - this proposal
Paul Boddie:
> Do we have a volunteer? ;-)
I won't volunteer to do a real implementation - the Unicode type in
Python is currently around 7000 lines long and there is other code to
change in, for example, regular expressions. Here's a demonstration C++
implementation that stores an array o
On Apr 20, 6:21 pm, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I don't believe this specific variant has been discussed.
>
> Now that you clarify it: no, it hasn't been discussed. I find that
> not surprising - this proposal is so strange and unnatural that
> probably nobody dared to suggest
On Apr 20, 5:49 pm, Ross Ridge <[EMAIL PROTECTED]>
wrote:
> Rhamphoryncus <[EMAIL PROTECTED]> wrote:
> >The only code that will be changed is that which doesn't handle
> >surrogates properly. Some will start working properly. Some (i.e.
> >random.choice(u'\U0010\u')) will fail explicitly (
> I don't believe this specific variant has been discussed.
Now that you clarify it: no, it hasn't been discussed. I find that
not surprising - this proposal is so strange and unnatural that
probably nobody dared to suggest it.
> s[5] does not exist. You would get an IndexError indicating that i
Rhamphoryncus <[EMAIL PROTECTED]> wrote:
>The only code that will be changed is that which doesn't handle
>surrogates properly. Some will start working properly. Some (i.e.
>random.choice(u'\U0010\u')) will fail explicitly (rather than
>silently).
You're falsely assuming that any code tha
On 20 Apr, 07:02, Neil Hodgson <[EMAIL PROTECTED]> wrote:
> Adam Olsen:
>
> > To solve this I propose Python's unicode type using UTF-16 should have
> > gaps in its index, allowing it to only expose complete unicode scalar
> > values. Iteration would produce surrogate pairs rather than
> > individ
(Sorry for the dupe, Martin. Gmail made it look like your reply was
in private.)
On 4/19/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Thoughts, from all you readers out there? For/against?
>
> See PEP 261. These things have all been discussed at that time,
> and an explicit decision again
On Apr 19, 11:02 pm, Neil Hodgson <[EMAIL PROTECTED]>
wrote:
> Adam Olsen:
>
> > To solve this I propose Python's unicode type using UTF-16 should have
> > gaps in its index, allowing it to only expose complete unicode scalar
> > values. Iteration would produce surrogate pairs rather than
> > indi
> Thoughts, from all you readers out there? For/against?
See PEP 261. These things have all been discussed at that time,
and an explicit decision against what I think (*) your proposal is
was taken. If you want to, you can try to revert that
decision, but you would need to write a PEP.
Regards,
Adam Olsen:
> To solve this I propose Python's unicode type using UTF-16 should have
> gaps in its index, allowing it to only expose complete unicode scalar
> values. Iteration would produce surrogate pairs rather than
> individual surrogates, indexing to the first half of a surrogate pair
> woul
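
The iteration behaviour described in the proposal above can be sketched roughly as follows, in Python 3. The sketch models a UTF-16 string as a list of 16-bit code units and yields one value per complete surrogate pair instead of one per individual surrogate. Both function names are hypothetical, the sketch yields integer code points rather than pair strings, and passing unpaired surrogates through unchanged is an assumption, since the quoted text does not say how they would be treated.

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    data = s.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def iter_scalar_values(units):
    """Yield code points, joining each high/low surrogate pair into one.

    Unpaired surrogates are yielded unchanged (an assumption made for
    this sketch only).
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Combine the pair into a single supplementary code point.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield u
            i += 1
```

For a string containing one non-BMP character between two ASCII letters, utf16_units() returns four code units while iter_scalar_values() yields three values, which is the gap between "length in code units" and "number of characters" that the thread is arguing about.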