[Tutor] unicode decode/encode issue

bruce Mon, 26 Sep 2016 10:02:06 -0700

Hi.

I've got a "basic" situation that should be simple, so it must be a user (me)
issue!



I've got a page from a web fetch. I'm simply trying to go from utf-8 to
ascii. I'm not worried about any cruft that might get stripped out as the
data is generated from a US site. (It's a college/class dataset.)

I know this is a unicode issue. I know I need to have a much more
robust/pythonic/correct approach. I will later, but for now, I just want to
resolve this issue, and get it off my plate so to speak.

I've looked at stackoverflow, as well as numerous other sites, so I turn to
the group for a pointer or two...

The unicode character that I'm dealing with is u'\u2013' (an en dash).

The basic things I've done up to now are:

  s=content
  s=ascii_strip(s)
  s=s.replace('\u2013', '-')
  s=s.replace(u'\u2013', '-')
  s=s.replace(u"\u2013", "-")
  s=re.sub(u"\u2013", "-", s)
  print repr(s)

When I look at the input content, I have :

 u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

So, any pointers on replacing the \u2013 with a simple '-' (dash)? (I
could even handle just a ' ' (space).)
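
For reference, here is a minimal sketch of the kind of replacement being asked
about. It assumes the content is already decoded text (a unicode object), as
the repr above suggests, and it replaces the en dash before anything is
converted to ASCII. The names content, s and ascii_bytes are only
illustrative. Since ascii_strip is not shown, its behaviour is a guess: if it
already drops non-ASCII characters, the later replace calls would have nothing
left to match.

```python
# Sketch only: assumes `content` is already decoded text (a unicode object
# in Python 2, a str in Python 3), as the repr above suggests.
content = u'English 120 Course Syllabus \u2013 Fall \u2013 2006'

# Replace the en dash (U+2013) while the text is still unicode ...
s = content.replace(u'\u2013', u'-')

# ... and only then drop any other character that is not ASCII.
ascii_bytes = s.encode('ascii', 'ignore')

print(repr(ascii_bytes))
```

Doing the replace first matters in this sketch: encoding with 'ignore' before
the replace would silently remove the en dash instead of turning it into '-'.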

thanks
