Re: unicode direction control characters

Robin Becker Tue, 02 Jan 2018 07:39:55 -0800

On 02/01/2018 15:18, Chris Angelico wrote:

On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker <[email protected]> wrote:

I'm seeing some strange characters in web responses, e.g.


u'\u200e28\u200e/\u200e09\u200e/\u200e1962'

for a date of birth. The code \u200e is LEFT-TO-RIGHT MARK according to
unicodedata.name.  I tried unicodedata.normalize, but it leaves those
characters there. Is there any standard way to deal with these?
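
A small sketch of the check described above, using only the standard unicodedata module. The variable names are illustrative. It looks up the name and general category of U+200E and confirms that each of the four normalization forms leaves the mark in the string, since the character has no decomposition to normalize away.

```python
import unicodedata

s = u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
mark = u'\u200e'

# Name and general category of the character ("Cf" means "format").
mark_name = unicodedata.name(mark)
mark_category = unicodedata.category(mark)

# Check every normalization form: the mark is still present afterwards.
survives = {
    form: mark in unicodedata.normalize(form, s)
    for form in ('NFC', 'NFD', 'NFKC', 'NFKD')
}
```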

I assume that some browser+settings combination is putting these in, e.g.
perhaps the language is normally right-to-left but numbers are not.


Unicode normalization is a different beast altogether. You could
probably just remove the LTR marks and run with the rest, though, as
they don't seem to be important in this string.
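
One possible way to remove the marks, as a sketch only: the helper name strip_bidi_marks and the set of characters in BIDI_MARKS are assumptions for illustration (just the implicit marks LRM, RLM and ALM; the embedding, override and isolate controls could be added if they turn up). The day/month/year format string is likewise a guess based on the sample value.

```python
from datetime import datetime

# Implicit directional marks: LRM, RLM, ARABIC LETTER MARK.
# (Assumed set; extend it if other bidi controls appear in the data.)
BIDI_MARKS = u'\u200e\u200f\u061c'

def strip_bidi_marks(text):
    """Return text with the marks listed in BIDI_MARKS deleted."""
    # Mapping a code point to None in str.translate deletes it.
    return text.translate({ord(c): None for c in BIDI_MARKS})

raw = u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
clean = strip_bidi_marks(raw)

# Assumes the field is day/month/year, as the sample suggests.
dob = datetime.strptime(clean, '%d/%m/%Y').date()
```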

ChrisA

I guess I'm really wondering whether the BIDI control characters have any semantic meaning. Most numbers seem to be LTR.

If I saw u'\u200f12', it seems to imply that the characters should be displayed as '21', but I don't know whether the number is 12 or 21.
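
A quick way to poke at this from Python, offered as a sketch rather than an answer: the string holds its characters in logical order, and unicodedata.bidirectional reports the bidi class of each one. As far as I understand the Unicode Bidirectional Algorithm (UAX #9), a run of European digits keeps its left-to-right order even in a right-to-left context, so the mark should affect how neighbouring characters are laid out rather than reverse the digits. Treat that reading as an assumption.

```python
import unicodedata

s = u'\u200f12'

# Bidi class per character: the mark is strong right-to-left ("R"),
# the ASCII digits are European numbers ("EN").
classes = [unicodedata.bidirectional(ch) for ch in s]

# The stored (logical) order is mark, '1', '2'; dropping the mark
# leaves the digits in the order they were stored.
digits = s.replace(u'\u200f', u'')
value = int(digits)
```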

--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list
