On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker <ro...@reportlab.com> wrote:
> I'm seeing some strange characters in web responses eg
>
> u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
>
> for a date of birth. The code \u200e is LEFT-TO-RIGHT MARK according to
> unicodedata.name. I tried unicodedata.normalize, but it leaves those
> characters there. Is there any standard way to deal with these?
>
> I assume that some browser+settings combination is putting these in eg
> perhaps the language is normally right to left but numbers are not.
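
Unicode normalization is a different beast altogether: it converts
equivalent character sequences to a single canonical form, and U+200E
has no decomposition, so normalize() leaves it alone. You could probably
just remove the LTR marks and run with the rest, though, as they don't
seem to be important in this string.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list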
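
A minimal sketch of the removal suggested above, assuming the only
unwanted characters are the directional marks U+200E (LEFT-TO-RIGHT MARK)
and U+200F (RIGHT-TO-LEFT MARK). The helper names strip_directional_marks
and strip_format_chars are made up for illustration and are not part of
any library. The second helper is a broader variant that drops every
character in Unicode category Cf, which may be too aggressive for text
that relies on joiners or soft hyphens.

```python
import unicodedata

# Sample string from the question; u'' literals work on Python 2 and 3.3+.
s = u'\u200e28\u200e/\u200e09\u200e/\u200e1962'

# Normalization does not touch U+200E, as reported in the question.
assert unicodedata.normalize('NFKC', s) == s

# Assumption: only these two directional marks need to go.
DIRECTIONAL_MARKS = u'\u200e\u200f'


def strip_directional_marks(text):
    """Hypothetical helper: drop LRM and RLM, keep everything else."""
    return u''.join(ch for ch in text if ch not in DIRECTIONAL_MARKS)


def strip_format_chars(text):
    """Hypothetical helper: drop every character in category Cf.

    Broader than the above; this also removes ZWJ, ZWNJ and soft
    hyphens, which can matter in some scripts.
    """
    return u''.join(ch for ch in text
                    if unicodedata.category(ch) != 'Cf')


cleaned = strip_directional_marks(s)
print(cleaned)  # 28/09/1962
```

For a date field like this one, either helper should give the same
result; the narrower one is the safer default for general text.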