Fredrik Lundh wrote:
>
> http://www.google.com/search?q=python+unicode
>
> (and before anyone starts screaming about how they hate RTFM replies, look
> at the search result)
>
> </F>

Thanks! I have already tried this, though. Let me tell you what I am trying now.
I have added the following line to the script:

    # -*- coding: utf-8 -*-

I have also modified setencoding() in site.py (in ./Python24/Lib) as follows:

    def setencoding():
        """Set the string encoding used by the Unicode implementation.  The
        default is 'ascii', but if you're willing to experiment, you can
        change this."""
        encoding = "utf-8" # Default value set by _PyUnicode_Init()
        if 0:
            # Enable to support locale aware default string encodings.
            import locale
            loc = locale.getdefaultlocale()
            if loc[1]:
                encoding = loc[1]
        if 0:
            # Enable to switch off string to Unicode coercion and implicit
            # Unicode to string conversion.
            encoding = "undefined"
        if encoding != "ascii":
            # On Non-Unicode builds this will raise an AttributeError...
            sys.setdefaultencoding(encoding) # Needs Python Unicode build !

Now, when I try to validate the data in a text file, say abc.txt (saved with UTF-8 encoding), containing either English or Russian text, a junk (box-like) character is added as the first character. What is the reason for this, and how do I handle it?
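
One possible explanation, offered as an assumption because the post does not show how abc.txt was saved or read: some Windows editors, Notepad among them, write a UTF-8 byte order mark (the bytes EF BB BF) at the start of a file saved as UTF-8, and that mark can show up as a box-like first character. Below is a minimal sketch of stripping it by hand. The helper name read_utf8_without_bom is hypothetical; codecs.BOM_UTF8 is the standard library constant for those three bytes.

```python
import codecs


def read_utf8_without_bom(path):
    """Return the decoded contents of a UTF-8 file, minus any leading BOM.

    Hypothetical helper for illustration; it assumes the whole file fits
    in memory and really is UTF-8 encoded.
    """
    f = open(path, "rb")  # binary mode, so we see the raw bytes
    try:
        raw = f.read()
    finally:
        f.close()
    # Drop the three BOM bytes only if they are actually present.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.decode("utf-8")
```

With this helper, read_utf8_without_bom("abc.txt") would return the text with any leading BOM removed. Python 2.5 and later also provide the "utf-8-sig" codec, which skips the BOM when decoding.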