On Fri, Dec 07, 2018 at 02:06:16PM +0530, Sunil Tech wrote:

> Hi Alan,
>
> I am using Python 2.7.8

That is important information. Python's original string type unfortunately predates Unicode, and when Unicode support was added some bad decisions were made. For example, we can write this in Python 2:

    >>> txt = "abcπ"

but it is a lie, because what we get isn't the string we typed, but the interpreter's *bad guess* that we actually meant this:

    >>> txt
    'abc\xcf\x80'

That is a string of bytes, not of characters. Depending on your operating system, sometimes you can work with these not-really-text strings for a long time, but when it fails, it fails HARD with confusing errors. Just as you have here:

> >>> tx = "MOUNTAIN VIEW WOMEN’S HEALTH CLINIC"
> >>> tx.decode()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 19:
> ordinal not in range(128)

Here, decode() was called without naming an encoding, so Python 2 fell back on its default, ASCII, and the byte 0xe2 is not ASCII. The bytes in tx are in whatever encoding your terminal happened to use: perhaps UTF-8, or a platform-specific encoding like Latin-1 or CP-1252, or something even more exotic. (A 0xe2 byte where the ’ should be suggests UTF-8, which encodes ’ as the three bytes 0xe2 0x80 0x99.) If you know which encoding it is, you can make it work by naming it:

    tx.decode("utf-8")

Don't just guess. A wrong guess such as "Latin1" or "CP-1252" may not raise an error at all, but it will silently give you the wrong characters.

But a better fix is to use actual text, by putting a "u" prefix outside the quote marks:

    txt = u"MOUNTAIN VIEW WOMEN’S HEALTH CLINIC"

If you need to write this to a file, you can do this:

    file.write(txt.encode('utf-8'))

To read it back again:

    # from a file using UTF-8
    txt = file.read().decode('utf-8')

(If you get a decoding error, it means your text file wasn't actually UTF-8. Ask the supplier what it really is.)

> How to know whether in a given string(sentence) is there any that is not
> ASCII character and how to replace?

That's usually the wrong solution. That's like saying, "My program can't add numbers greater than 100. How do I tell if a number is greater than 100, and turn it into a number smaller than 100?"

You can do this:

    mystring = "something"
    if any(ord(c) > 127 for c in mystring):
        print "Contains non-ASCII"

But what you do then is hard to decide. Delete non-ASCII characters? Replace them with what?
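
Rather than deleting or replacing anything, you can decode the bytes once and then look at what the non-ASCII characters actually are. Here is a minimal sketch that should run on both Python 2.7 and Python 3. It assumes the bytes are UTF-8 (the 0xe2 at position 19 in the traceback suggests that, but doesn't prove it). The helper name non_ascii_chars and the file name clinic.txt are made up for illustration.

```python
from __future__ import print_function

import io
import os
import shutil
import tempfile


def non_ascii_chars(text):
    """Return (index, character) pairs for each non-ASCII character in text."""
    return [(i, c) for i, c in enumerate(text) if ord(c) > 127]


# The bytes a UTF-8 terminal would produce for the clinic name (an assumption).
raw = b"MOUNTAIN VIEW WOMEN\xe2\x80\x99S HEALTH CLINIC"

# Decode once, at the boundary, naming the encoding explicitly.
text = raw.decode("utf-8")

found = non_ascii_chars(text)
for index, char in found:
    # Report the code point instead of printing the character itself,
    # so the output is safe on any terminal.
    print("index %d: U+%04X" % (index, ord(char)))  # index 19: U+2019

# Round-trip through a file: io.open encodes on write and decodes on read.
tmpdir = tempfile.mkdtemp()
try:
    path = os.path.join(tmpdir, "clinic.txt")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, encoding="utf-8") as f:
        round_tripped = f.read()
finally:
    shutil.rmtree(tmpdir)
```

Once you can see that the offending character is U+2019 (a right single quotation mark), you can decide case by case whether to keep it, which is usually best, or to substitute a plain apostrophe.
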

If you are desperate, you can do this:

    bytestring = "something"
    text = bytestring.decode('ascii', errors='replace')
    bytestring = text.encode('ascii', errors='replace')

but that will replace every non-ASCII byte with a question mark "?", which might not be what you want. A multi-byte character such as the UTF-8 encoded ’ becomes three question marks, not one.

--
Steve