On 2015-05-03, MRAB wrote:
> There's also a mistake in this bit:
>
> """
> # Note that according to the \u escaping convention, a supplemental character (> 0x10) is represented
> # by a sequence of two surrogate characters: the first between D800 and DBFF, and the second between DC00
On 2015-05-03 17:26, Jon Ribbens wrote:
> On 2015-05-03, MRAB wrote:
>> On 2015-05-03 16:32, Jon Ribbens wrote:
>>> That would, unfortunately, be "tell the Unicode Consortium to format
>>> their documents differently", which seems unlikely to happen. I'm
>>> trying to read in: http://www.unicode.org/Public/idn
On Mon, May 4, 2015 at 2:30 AM, Jon Ribbens
wrote:
> I did some experimentation, and it looks like the answer is:
>
> "\udb40\udd9d".encode("utf16", "surrogatepass").decode("utf16")
>
> Thanks for your help!
Ha! That's the one. I went poking around but couldn't find the name
for it. That's exac
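
The round trip quoted above can be sketched as a small helper. This is a minimal illustration rather than code from the thread: the function name `join_surrogates` is made up here, and it assumes the input holds only well-formed high/low surrogate pairs (a lone surrogate would make the decode step fail).

```python
def join_surrogates(text):
    """Merge UTF-16 surrogate pairs in `text` into single code points.

    Hypothetical helper name; the technique is the encode/decode
    round trip quoted in the message above.
    """
    # "surrogatepass" lets the encoder write the surrogates out as raw
    # UTF-16 code units instead of raising UnicodeEncodeError. The
    # decoder then reads each high/low pair back as one character.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


fixed = join_surrogates("\udb40\udd9d")
print(fixed == "\U000E019D")  # True
print(len(fixed))             # 1
```

Text without surrogates passes through unchanged, so the helper should be safe to apply to every string read from the file.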
On 2015-05-03, Chris Angelico wrote:
> On Mon, May 4, 2015 at 1:32 AM, Jon Ribbens
> wrote:
>> That would, unfortunately, be "tell the Unicode Consortium to format
>> their documents differently", which seems unlikely to happen. I'm
>> trying to read in: http://www.unicode.org/Public/idna/6.3.0/Id
On 2015-05-03, MRAB wrote:
> On 2015-05-03 16:32, Jon Ribbens wrote:
>> That would, unfortunately, be "tell the Unicode Consortium to format
>> their documents differently", which seems unlikely to happen. I'm
>> trying to read in: http://www.unicode.org/Public/idna/6.3.0/IdnaTest.txt
>>
> That do
On 2015-05-03 16:32, Jon Ribbens wrote:
> On 2015-05-03, Chris Angelico wrote:
>> On Mon, May 4, 2015 at 12:40 AM, Jon Ribbens wrote:
>>> If I have a string containing surrogate pairs like this in Python 3.4:
>>>
>>> "\udb40\udd9d"
>>>
>>> How do I convert it into the proper form:
>>>
>>> "\U000E019D"
>>>
>>> ? The answer ap
On Mon, May 4, 2015 at 1:32 AM, Jon Ribbens
wrote:
>> You shouldn't even actually _have_ those in your string in the first
>> place. How did you construct/receive that data? Ideally, catch it at
>> that point, and deal with it there.
>
> That would, unfortunately, be "tell the Unicode Consortium t
Jon Ribbens:
> Python doesn't appear to have UCS-2 support, so I guess what you're
> saying is that I have to write my own surrogate-decoder? This seems a
> little surprising.
Try UTF-16.
Marko
--
https://mail.python.org/mailman/listinfo/python-list
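
For comparison, the hand-written surrogate decoder asked about above would be short. The sketch below is illustrative only: `combine_pair` is a hypothetical name, and it applies the standard UTF-16 pairing arithmetic to one high and one low surrogate.

```python
def combine_pair(high, low):
    """Combine one high and one low surrogate into a single character.

    Hypothetical helper for illustration; not from the thread.
    """
    hi, lo = ord(high), ord(low)
    # High surrogates are D800..DBFF, low surrogates are DC00..DFFF.
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))


print(combine_pair("\udb40", "\udd9d") == "\U000E019D")  # True
```

The UTF-16 codec performs the same arithmetic internally, which is why the "Try UTF-16" suggestion avoids writing this by hand.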
On 2015-05-03, Chris Angelico wrote:
> On Mon, May 4, 2015 at 12:40 AM, Jon Ribbens
> wrote:
>> If I have a string containing surrogate pairs like this in Python 3.4:
>>
>> "\udb40\udd9d"
>>
>> How do I convert it into the proper form:
>>
>> "\U000E019D"
>>
>> ? The answer appears not to be "u
On Mon, May 4, 2015 at 12:40 AM, Jon Ribbens
wrote:
> If I have a string containing surrogate pairs like this in Python 3.4:
>
> "\udb40\udd9d"
>
> How do I convert it into the proper form:
>
> "\U000E019D"
>
> ? The answer appears not to be "unicodedata.normalize".
No, it's not, because Unic