On Jun 11, 4:24 pm, Sydoruk Yaroslav <sw...@mirohost.net> wrote:
> Hello all,
>
> In a text file aword.txt, there is a string:
> "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc".
>
> There is a first script:
> f = open ("aword.txt", "r")
> for line in f:
>     print chardet.detect(line)
>     b = line.decode('cp1251')
>     print b
>
> _RESULT_
> {'confidence': 1.0, 'encoding': 'ascii'}
> \xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc
>
> There is a second script:
> line = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> print chardet.detect(line)
> b = line.decode('cp1251')
> print b
>
> _RESULT_
> {'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
> как+позвонить
>
> Why is reading from a file into a string variable is defined as ascii,
> but when it is clearly defined in the script is defined as cp1251.
> How do I solve this problem.
>
> --
> Only one 0_o

Is the string in your text file literally
"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc", typed out as plain
text? My assumption is that it is. '\x' escapes are only interpreted
inside string literals in Python source, so when you read the file,
each backslash, 'x' and hex digit comes in as its own ASCII character
(and rightfully so) rather than as the byte the escape stands for.
chardet is then correct to report ascii, and decoding ASCII text as
cp1251 leaves it unchanged.

As an experiment, write the real bytes to a file and read them back:

(t)j...@marvin:~/t$ cat test.py
import chardet

s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
with open('test.txt', 'w') as f:
    print >>f, s
print chardet.detect(open('test.txt').read())
(t)j...@marvin:~/t$ python test.py
{'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
(t)j...@marvin:~/t$

HTH,

Jeff
mcjeff.blogspot.com
--
http://mail.python.org/mailman/listinfo/python-list
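
If the file really does contain the escapes as plain text, one possible way to recover the intended bytes is to unescape them before decoding. The following is only a rough sketch, written in Python 3 syntax rather than the Python 2 used in the thread (on Python 2, line.decode('string_escape') should serve the same purpose). The names literal_text, raw_bytes and unescape_to_bytes are made up for illustration, and the byte strings stand in for what reading aword.txt in binary mode would return in each case.

```python
# Case 1: the file holds the escapes as plain text (backslash, x, e, a, ...).
literal_text = rb"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"

# Case 2: the file holds the raw cp1251 bytes themselves.
raw_bytes = b"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"


def unescape_to_bytes(data):
    r"""Turn literal \xNN escape text into the bytes it spells out.

    Hypothetical helper: assumes the input contains only ASCII
    characters and \xNN-style escapes.
    """
    # unicode_escape maps each \xNN to the code point U+00NN;
    # latin-1 then maps U+00NN back to the single byte 0xNN.
    return data.decode("unicode_escape").encode("latin-1")


# The literal form is much longer, and every byte in it is ASCII,
# which is why a detector would report 'ascii' for it.
print(len(literal_text), len(raw_bytes))
print(all(byte < 128 for byte in literal_text))

# After unescaping, both cases hold the same bytes and decode identically.
recovered = unescape_to_bytes(literal_text)
decoded = recovered.decode("cp1251")
print(recovered == raw_bytes)
print(decoded == raw_bytes.decode("cp1251"))
```

Whether unescaping is the right fix depends on how aword.txt was produced. If you control that step, writing the real bytes to the file, as in the experiment above, avoids the problem altogether.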