2010/12/19 Martin Hvidberg <mar...@hvidberg.net>:
> Dear list
>
> I have to read some data from an ASCII text file, filter it, and then export
> it to a .dbf file. Basically a straightforward task...
> My problem is that the input files contain some special national (Danish)
> characters, and it appears that I have to do something special to handle
> these in Python.
> The Danish language contains three letters not in the English alphabet: æ, ø
> and å.
> E.g. the Danish city name 'SOLRØD' is read by Python as 'SOLR\xc3\x98D'
> The three letters, in lower and upper case, seem to get translated as
> follows:
>
> æ = \xc3\xa6
> ø = \xc3\xb8
> å = \xc3\xa5
> Æ = \xc3\x86
> Ø = \xc3\x98
> Å = \xc3\x85
>
> Question:
> What is this, how do I get my Danish letters back?
>
> Best Regards
> Martin
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Hi,
it seems your data is UTF-8 encoded text; cf.

>>> u"æ".encode("utf-8")
'\xc3\xa6'
>>> u"ø".encode("utf-8")
'\xc3\xb8'

You can decode the file content using this encoding if it has already been
read somewhere, or use codecs.open with the same encoding, so that the text
is decoded while it is being read.

>>> print '\xc3\xa6'.decode("utf-8")
æ
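
Both approaches can be sketched roughly as follows. This is only a minimal illustration, not a complete solution: the file name cities.txt is made up, the file is written by the sketch itself so that it is self-contained, and io.open is used in place of codecs.open (both accept an encoding argument; io.open also works unchanged on newer Python versions).

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# The raw bytes as they appear in the input file: UTF-8 for 'SOLRØD'.
raw = b"SOLR\xc3\x98D"

# Approach 1: the data has already been read as bytes, so decode it.
city = raw.decode("utf-8")

# Hypothetical sample file, created here only to make the sketch runnable.
path = os.path.join(tempfile.mkdtemp(), "cities.txt")
with io.open(path, "wb") as f:
    f.write(raw + b"\n")

# Approach 2: open the file with an explicit encoding, so every line
# is decoded to text while it is being read.
with io.open(path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
```

After this, city and lines[0] hold the same six-character text with the Ø restored. When writing the .dbf file, the text presumably has to be encoded again to whatever code page that file is expected to use.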

hth,
  vbr