On Fri, Dec 07, 2018 at 01:28:18PM +0530, Sunil Tech wrote:

> Hi Tutor,
>
> I have a trouble with dealing with special characters in Python

There are no special characters in Python. There are only Unicode
characters. All characters are Unicode, including those which are also
ASCII.

Start here:

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
https://blog.codinghorror.com/there-aint-no-such-thing-as-plain-text/
https://www.youtube.com/watch?v=sgHbC6udIqc
https://nedbatchelder.com/text/unipain.html
https://docs.python.org/3/howto/unicode.html
https://docs.python.org/2/howto/unicode.html

It's less than a month away from 2019. It is sad and shameful to be
forced to use only ASCII, and nearly always unnecessary. Writing code
that only supports the 128 ASCII characters is like writing a calculator
that only supports numbers from 1 to 10.

But if you really must do so, keep reading.

> Below is
> the sentence with a special character(apostrophe) "MOUNTAIN VIEW WOMEN’S
> HEALTH CLINIC" with actually should be "MOUNTAIN VIEW WOMEN'S HEALTH CLINIC
> ".

Actually, no, it should be precisely what it is: "WOMEN’S" is correct,
since that is an apostrophe. Changing the ’ to an inch-mark ' is not
correct. But if you absolutely MUST change it:

    mystring = "MOUNTAIN VIEW WOMEN’S HEALTH CLINIC"
    mystring = mystring.replace("’", "'")

will do it in Python 3. In Python 2 you have to write this instead:

    # Python 2 only; in a source file, also declare the encoding,
    # e.g. with "# -*- coding: utf-8 -*-" on the first line.
    mystring = u"MOUNTAIN VIEW WOMEN’S HEALTH CLINIC"
    mystring = mystring.replace(u"’", u"'")

to ensure Python uses Unicode strings.

What version of Python are you using, and what are you doing that gives
you trouble? It is very unlikely that the only way to solve the problem
is to throw away the precise meaning of the text you are dealing with by
reducing it to ASCII.

In Python 3, you can also do this:

    mystring = ascii(mystring)

but the result will probably not be what you want: ascii() returns the
repr of the string, surrounding quotes included, with each non-ASCII
character turned into a backslash escape such as \u2019.

> Please help, how to identify these kinds of special characters and replace
> them with appropriate ASCII?

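Identifying them is the easy part: a character is non-ASCII if its code
point is above 127, and the unicodedata module can tell you its name.
Here is a minimal sketch for Python 3. The names find_non_ascii,
PUNCTUATION_TO_ASCII and to_ascii_punctuation are made up for
illustration, and the table is only my assumption about which
substitutions you might be willing to accept, not a complete or standard
list.

```python
import unicodedata


def find_non_ascii(text):
    """Return (index, character, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical, deliberately small table: typographic punctuation mapped
# to its closest ASCII look-alike. Extend it only for characters where
# you have decided the loss of meaning is acceptable.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK (the apostrophe above)
    "\u201c": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',   # RIGHT DOUBLE QUOTATION MARK
    "\u2013": "-",   # EN DASH
    "\u2014": "-",   # EM DASH
})


def to_ascii_punctuation(text):
    """Replace only the characters listed in PUNCTUATION_TO_ASCII."""
    return text.translate(PUNCTUATION_TO_ASCII)


mystring = "MOUNTAIN VIEW WOMEN\u2019S HEALTH CLINIC"
print(find_non_ascii(mystring))
print(to_ascii_punctuation(mystring))
```

Anything that is not in the table passes through unchanged, so you can
run find_non_ascii on the result to see what is left over instead of
silently throwing it away.
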
For 99.99% of the characters, there is NO appropriate ASCII. What ASCII
character do you expect for these?

    § π Й খ ₪ ∀ ▶ 丕 ☃ ☺️

ASCII, even when it was invented in 1963, wasn't sufficient even for
American English (no cent sign, no proper quotes, missing punctuation
marks) let alone British English or international text.

Unless you are stuck communicating with an ancient program written in
the 1970s or 80s that cannot be upgraded, there are few good reasons to
cripple your program by only supporting ASCII text.

But if you really need to, this might help:

http://code.activestate.com/recipes/251871-latin1-to-ascii-the-unicode-hammer/
http://code.activestate.com/recipes/578243-repair-common-unicode-mistakes-after-theyve-been-m/

--
Steve