Hi. I've got a "basic" situation that should be simple, so it must be a user (me) issue!
I've got a page from a web fetch, and I'm simply trying to go from UTF-8 to ASCII. I'm not worried about any cruft that might get stripped out, as the data is generated from a US site (it's a college/class dataset).

I know this is a unicode issue, and I know I need a much more robust/pythonic/correct approach. I will get to that later, but for now I just want to resolve this issue and get it off my plate, so to speak. I've looked at Stack Overflow, as well as numerous other sites, so I turn to the group for a pointer or two...

The unicode character that I'm dealing with is u'\u2013'.

The basic things I've done up to now are:

    s=content
    s=ascii_strip(s)
    s=s.replace('\u2013', '-')
    s=s.replace(u'\u2013', '-')
    s=s.replace(u"\u2013", "-")
    s=re.sub(u"\u2013", "-", s)
    print repr(s)

When I look at the input content, I have:

    u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

So, any pointers on replacing the \u2013 with a simple '-' (dash)? (Or I could even handle just a ' ' (space).)

thanks
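
For reference, here is a minimal sketch of the replace-then-strip idea described above. It assumes that content is already a decoded unicode string, as the repr output suggests; if the fetch returns raw bytes, they would need a .decode('utf-8') first. The names REPLACEMENTS and to_ascii are made up for illustration, and the sketch does not use ascii_strip, since that helper is not shown and it is unclear what it does to the string before the replace calls run.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Sample text copied from the post (assumed to be already-decoded unicode).
content = u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

# Hypothetical mapping of code points to ASCII stand-ins; extend as needed.
REPLACEMENTS = {
    0x2013: u'-',  # EN DASH
    0x2014: u'-',  # EM DASH
}

def to_ascii(text):
    """Swap known characters for ASCII stand-ins, then drop anything else non-ASCII."""
    text = text.translate(REPLACEMENTS)
    # 'ignore' silently discards characters that have no ASCII equivalent.
    return text.encode('ascii', 'ignore').decode('ascii')

cleaned = to_ascii(content)
print(cleaned)
```

One thing worth checking in the attempts listed above: under Python 2, '\u2013' without the u prefix is a plain byte string in which \u is not an escape, so that particular replace looks for a literal backslash followed by "u2013" rather than for the dash character.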