Jeff McNeil <j...@jmcneil.net> wrote:
> Is the string in your text file literally "\xea\xe0\xea+\xef\xee
> \xe7\xe2\xee\xed\xe8\xf2\xfc" as "plain text?" My assumption is that
> when you're reading that in, Python is interpreting each byte as an
> ASCII value (and rightfully so) rather than the corresponding '\x'
> escapes.
>
> As an experiment:
>
> (t)j...@marvin:~/t$ cat test.py
> import chardet
>
> s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> with open('test.txt', 'w') as f:
>     print >>f, s
>
> print chardet.detect(open('test.txt').read())
> (t)j...@marvin:~/t$ python test.py
> {'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
> (t)j...@marvin:~/t$
>
> HTH,
>
> Jeff
> mcjeff.blogspot.com

Thank you for your reply. You are right: Python reads the data from the
file as bytes, and in this case every byte is plain ASCII, because the
file contains the literal characters of the '\x' escapes rather than the
bytes they stand for.

I solved the problem by adding line = line.decode('string_escape'):

    import chardet

    f = open("aword.txt", "r")
    for line in f:
        # turn the literal '\xNN' text into the real bytes
        line = line.decode('string_escape')
        print chardet.detect(line)
        # the real bytes are cp1251-encoded text
        b = line.decode('cp1251')
        print b

-- 
Only one 0_o
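
For anyone trying this on Python 3: the 'string_escape' codec used above exists only in Python 2. Below is a rough sketch of one way to get the same effect with the documented 'unicode_escape' codec. The helper name decode_escaped and the default encoding argument are my own, not from the thread, and it assumes the input holds only ASCII text with '\xNN'-style escapes (an escape above \xff would make the latin-1 step fail).

```python
def decode_escaped(raw: bytes, encoding: str = "cp1251") -> str:
    """Hypothetical helper: turn literal '\\xNN' escape text into decoded text."""
    # Step 1: interpret the backslash escapes. Each \xNN becomes the
    # single character U+00NN in the resulting str.
    text = raw.decode("unicode_escape")
    # Step 2: latin-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # so this recovers the real byte values the escapes stood for.
    real_bytes = text.encode("latin-1")
    # Step 3: decode those bytes with the actual text encoding.
    return real_bytes.decode(encoding)


# The sample string from this thread, as it would appear when read from
# the file in binary mode (literal backslashes, all ASCII).
sample = rb"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
result = decode_escaped(sample)
```

Printing result should show the Cyrillic text. When reading from a file, open it in binary mode ("rb") so each line arrives as bytes. If the encoding is not known in advance, chardet.detect() can be run on the bytes after step 2, as in the Python 2 code above.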