On Mon, Sep 26, 2016 at 12:59:04PM -0400, bruce wrote:

> When I look at the input content, I have :
>
> u'English 120 Course Syllabus \u2013 Fall \u2013 2006'
>
> So, any pointers on replacing the \u2013 with a simple '-' (dash) (or I
> could even handle just a ' ' (space)

You misinterpret what you see. \u2013 *is* a dash (it's an en-dash):

py> import unicodedata
py> unicodedata.name(u'\u2013')
'EN DASH'

Try printing the string, and you will see what it looks like:

py> content = u'English 120 Course Syllabus \u2013 Fall \u2013 2006'
py> print content
English 120 Course Syllabus – Fall – 2006

Python strings include a lot of escape codes. Simple byte strings include:

    \t    tab
    \n    newline
    \r    carriage return
    \0    ASCII null byte

etc. plus escape codes for hex codes:

    \xDD    (two digit hex code, between hex 00 and hex FF)

That lets you enter any byte between (decimal) 0 and 255. For example,
\x20 is the hex code 20 (decimal 32), which is a space:

py> '\x20' == ' '
True

Unicode strings allow the same escape codes as byte strings, plus
special Unicode escape codes:

    \uDDDD        (four digit hex codes, for codes between 0 and 65535)
    \UDDDDDDDD    (eight digit hex codes, for codes between 0 and 1114111)
    \N{name}      (Unicode names)

Remember to print the string if you want to see the actual characters
rather than their escape codes.

-- 
Steve
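
For the replacement that was asked about, here is a minimal sketch. It assumes the goal is to turn each en-dash into a plain ASCII hyphen. The name `cleaned` is made up for this example; `content` is the string from the question. It uses only `unicode.replace` / `str.replace` and should behave the same on Python 2.7 and Python 3.

```python
import unicodedata

content = u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

# The three Unicode escape forms are different spellings of one character.
assert u'\u2013' == u'\U00002013' == u'\N{EN DASH}'
assert unicodedata.name(u'\u2013') == 'EN DASH'

# Replace every en-dash with an ASCII hyphen-minus.
# (Use u' ' as the second argument instead if a space is preferred.)
cleaned = content.replace(u'\u2013', u'-')

print(cleaned)  # English 120 Course Syllabus - Fall - 2006
```

The replacement matches the character itself, so it does not matter whether the en-dash was typed as an escape code or read in from a file that was decoded to Unicode.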