On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker <ro...@reportlab.com> wrote:
> I'm seeing some strange characters in web responses, e.g.
>
> u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
>
> for a date of birth. The code \u200e is LEFT-TO-RIGHT MARK according to
> unicodedata.name.  I tried unicodedata.normalize, but it leaves those
> characters there. Is there any standard way to deal with these?
>
> I assume that some browser+settings combination is inserting these, e.g.
> perhaps the language is normally right-to-left but numbers are not.

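Unicode normalization is a different beast altogether. It maps canonically
(or compatibility-) equivalent sequences onto a single form, and U+200E has
no decomposition, so unicodedata.normalize() leaves it alone in every form.
You could probably just remove the LTR marks and run with the rest, though,
as they don't seem to be important in this string.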
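A minimal sketch of that (Python 3). It assumes you want to drop both
directional marks (U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK)
and that the date is day/month/year; the helper names are made up for
illustration:

```python
import unicodedata
from datetime import datetime

# Map the two directional marks to None so str.translate deletes them.
BIDI_MARKS = dict.fromkeys([0x200E, 0x200F])

def strip_bidi_marks(text):
    """Remove only LTR/RTL marks; every other character is kept."""
    return text.translate(BIDI_MARKS)

def strip_format_chars(text):
    """More aggressive: drop every character in Unicode category Cf (format)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

raw = u'\u200e28\u200e/\u200e09\u200e/\u200e1962'
clean = strip_bidi_marks(raw)
print(clean)  # 28/09/1962

# Assumes day/month/year order, as the 28 in the first field suggests.
dob = datetime.strptime(clean, "%d/%m/%Y").date()
print(dob)  # 1962-09-28
```

The category-based version also catches things like U+200F or a stray
soft hyphen, but Cf includes ZERO WIDTH JOINER too, which matters in emoji
sequences, so only use it on fields like dates where that can't happen.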

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
