On 2015-05-03, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, May 4, 2015 at 1:32 AM, Jon Ribbens
> <jon+use...@unequivocal.co.uk> wrote:
>> That would, unfortunately, be "tell the Unicode Consortium to format
>> their documents differently", which seems unlikely to happen. I'm
>> trying to read in: http://www.unicode.org/Public/idna/6.3.0/IdnaTest.txt
>
> Ah, so what you _actually_ have is "\\udb40\\udd9d" - the backslashes
> are in your input.

Well, they were, but I already wrote code to convert them into the
strings I showed in my original post.

> I'm not sure what the best way to deal with that is... it's a bit of
> a mess. You may find yourself needing to do something manually,
> unless there's a way to ask Python to encode to pseudo-UCS-2 that
> allows surrogates. Some languages may have sloppy conversions
> available, but Python's seems to be quite strict (which is correct).
> Is there an errors handler that can do this?

I did some experimentation, and it looks like the answer is:

    "\udb40\udd9d".encode("utf16", "surrogatepass").decode("utf16")

Thanks for your help!
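
For anyone finding this in the archive, here is a rough sketch of the whole round trip, starting from the literal backslash escapes as they appear in the input file. The helper name decode_escapes and the regular expression are my own illustration, not anything from the Unicode data files or the standard library, and it assumes Python 3.4 or later, where the UTF-16 codec accepts the "surrogatepass" error handler.

```python
import re

def decode_escapes(text):
    # Hypothetical helper: turn literal \uXXXX escapes into characters
    # and merge any surrogate pairs into single code points.
    # Step 1: replace each escape with the code unit it names. This can
    # leave two separate surrogates sitting side by side in the str.
    raw = re.sub(r"\\u([0-9a-fA-F]{4})",
                 lambda m: chr(int(m.group(1), 16)),
                 text)
    # Step 2: "surrogatepass" lets the encoder write the surrogates out
    # as plain UTF-16 code units; decoding then sees a valid pair and
    # combines it.
    return raw.encode("utf-16", "surrogatepass").decode("utf-16")

merged = decode_escapes(r"\udb40\udd9d")
print(len(merged), hex(ord(merged)))  # 1 0xe019d
```

Note that the final decode is still strict, so input containing an unpaired surrogate should raise UnicodeDecodeError rather than pass through silently, which is probably what you want when validating test data.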