Re: reading from file

Jeff McNeil Thu, 11 Jun 2009 13:51:23 -0700

On Jun 11, 4:24 pm, Sydoruk Yaroslav <[email protected]> wrote:
> Hello all,
>
> In a text file aword.txt, there is a string:
>     "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc".
>
> Here is the first script:
> f = open ("aword.txt", "r")
> for line in f:
>     print chardet.detect(line)
>     b = line.decode('cp1251')
>     print b
>
> _RESULT_
> {'confidence': 1.0, 'encoding': 'ascii'}
> \xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc
>
> Here is the second script:
> line = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> print chardet.detect(line)
> b = line.decode('cp1251')
> print b
>
> _RESULT_
> {'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
> как+позвонить
>
> Why is the string read from the file detected as ascii,
> when the same string written directly in the script is detected as cp1251?
> How do I solve this problem?


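Is the string in your text file literally "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
as plain text? My assumption is that when you read it in, Python
sees each character (the backslash, the 'x', the hex digits) as an
ordinary ASCII byte, and rightfully so, rather than interpreting the
'\x' escapes. Those escapes are only turned into single bytes when
they appear inside a string literal in your source code, which is why
your second script sees real cp1251 data and the first one does not.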
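To make the distinction concrete, here is a minimal sketch in Python 3 syntax (the scripts in this thread are Python 2, where `str` is already a byte string). It assumes, as above, that aword.txt holds the backslash escapes as literal characters; the variable names are made up for illustration.

```python
# Text as it would come back from a file that literally contains the escapes.
# A raw string keeps every backslash as an ordinary character.
escaped_text = r"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"

# The same sequence written as a bytes literal in source code: here the
# parser turns each \xNN escape into one byte.
real_bytes = b"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"

# 12 escapes of 4 characters each plus the '+': every character is ASCII,
# which is why chardet reports 'ascii' for the line read from the file.
print(len(escaped_text), escaped_text.isascii())   # 49 True

# 12 real bytes plus the '+', all but one of them above 0x7F.
print(len(real_bytes), real_bytes.isascii())       # 13 False

# Only the real bytes decode to Cyrillic. cp1251 maps ASCII to itself, so
# "decoding" the escaped text just gives the same backslash-x characters back.
print(real_bytes.decode("cp1251"))                 # как+позвонить
print(escaped_text.encode("ascii").decode("cp1251") == escaped_text)  # True
```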

As an experiment:

(t)j...@marvin:~/t$ cat test.py
import chardet

s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
with open('test.txt', 'w') as f:
        print >>f, s

print chardet.detect(open('test.txt').read())
(t)j...@marvin:~/t$ python test.py
{'confidence': 0.98999999999999999, 'encoding': 'windows-1251'}
(t)j...@marvin:~/t$
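The fix depends on what the file really holds. Below is a rough Python 3 sketch that mirrors the experiment above without chardet: it writes the word once as real cp1251 bytes and once as literal escape text, then reads both back. `decode_escaped_line` and the file names are hypothetical, and the escape handling assumes the file contains only plain ASCII plus `\xNN` escapes, since `unicode_escape` would also rewrite sequences such as `\n` or `\\`. In Python 2 the rough equivalent would be `line.decode('string_escape').decode('cp1251')`.

```python
import os
import tempfile


def decode_escaped_line(line, encoding="cp1251"):
    """Hypothetical helper: turn literal \\xNN escapes into decoded text.

    'unicode_escape' maps each \\xNN escape to the character U+00NN; encoding
    that with latin-1 recovers the original byte 0xNN, which is then decoded
    with the real encoding (cp1251 here).
    """
    raw = line.encode("latin-1").decode("unicode_escape").encode("latin-1")
    return raw.decode(encoding)


escaped = r"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
real = b"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"

with tempfile.TemporaryDirectory() as tmp:
    escaped_path = os.path.join(tmp, "escaped.txt")  # like aword.txt
    bytes_path = os.path.join(tmp, "bytes.txt")      # like test.txt above

    # Write the escapes as plain ASCII text, and the real bytes in binary mode.
    with open(escaped_path, "w", encoding="ascii") as f:
        f.write(escaped + "\n")
    with open(bytes_path, "wb") as f:
        f.write(real + b"\n")

    # Real cp1251 bytes: just open the file with the right encoding.
    with open(bytes_path, encoding="cp1251") as f:
        from_bytes = f.read().rstrip("\n")

    # Literal escapes: read as ASCII text, then interpret the escapes.
    with open(escaped_path, encoding="ascii") as f:
        from_escapes = decode_escaped_line(f.read().rstrip("\n"))

print(from_bytes)                  # как+позвонить
print(from_bytes == from_escapes)  # True
```

If you control how aword.txt is produced, writing real bytes (as Jeff's experiment does) is the simpler route; the escape decoding is only needed when the file genuinely contains backslash text.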

HTH,

Jeff
mcjeff.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list
