Hi, I've read quite a bit about Unicode and Python's support for it, but I'm still unclear about how it all fits together in certain scenarios. Can anyone help clarify?

* When I say "# -*- coding: utf-8 -*-" and confirm my IDE is saving the source file as UTF-8, do I still need to prefix all the strings constructed in the source with u, as in myStr = u"blah", even when those strings contain only ASCII or ISO-8859-1 chars? (It would be a bother to do this for the complete source I'm working on, where I rarely need chars outside the ISO-8859-1 range.)

* Will Python figure it out if I use different encodings in different modules -- say a main source file which is "# -*- coding: utf-8 -*-" and an imported module which doesn't say this (for which Python will presumably use a default encoding)? This seems inevitable given that standard library modules such as re don't declare an encoding, presumably because, as far as I can see, their source contains no non-ASCII chars.

* If I want to use a Unicode char in a regex -- say an en-dash, U+2013 -- in an ASCII- or ISO-8859-1-encoded source file, can I say

    myASCIIRegex = re.compile('[A-Z]')
    myUniRegex = re.compile(u'\u2013')  # en-dash

  then read the source file into a unicode string with codecs.open(), then expect re to match against the unicode string using either of those regexes if the string contains the relevant chars? Or do I need to make all my regex patterns unicode strings, with u""?

I've been trying to understand this for a while, so any clarification would be a great help.

Tim

--
http://mail.python.org/mailman/listinfo/python-list
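
P.S. To make the third question concrete, here is a minimal sketch of the kind of test I have in mind. It is only an illustration: the sample text, the temporary file, and the helper names read_unicode and demo are made up for this example, and I'm assuming codecs.open() is the right way to get a decoded (unicode) string from a file.

```python
# -*- coding: ascii -*-
import codecs
import os
import re
import tempfile

# The two patterns from the question: one plain literal, one u"" literal.
my_ascii_regex = re.compile('[A-Z]')
my_uni_regex = re.compile(u'\u2013')  # en-dash


def read_unicode(path, encoding='utf-8'):
    """Hypothetical helper: decode the whole file into a unicode string."""
    with codecs.open(path, 'r', encoding=encoding) as f:
        return f.read()


def demo():
    # Made-up sample text: a few ASCII capitals plus one en-dash.
    sample = u'Pages 10\u201320 of Chapter A'
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        # Write the sample as UTF-8, then read it back as unicode.
        with codecs.open(path, 'w', encoding='utf-8') as out:
            out.write(sample)
        text = read_unicode(path)
    finally:
        os.remove(path)
    # Search the same decoded string with both patterns.
    return my_ascii_regex.findall(text), my_uni_regex.findall(text)


if __name__ == '__main__':
    print(demo())
```

If both lists come back non-empty, then both kinds of pattern can match against the decoded string, at least for this simple case. It says nothing about patterns that contain non-ASCII bytes in a plain (non-u"") literal, which is the part I'm least sure about.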