reading from file

2009-06-11 Thread Sydoruk Yaroslav
Hello all,

In a text file aword.txt, there is a string:
"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc".

Here is the first script:
import chardet

f = open("aword.txt", "r")
for line in f:
    print chardet.detect(line)
    b = line.decode('cp1251')
    print b

_RESULT_
{'confidence': 1.0, 'encoding': 'ascii'}
\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc

Here is the second script:
import chardet

line = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
print chardet.detect(line)
b = line.decode('cp1251')
print b

_RESULT_
{'confidence': 0.98999, 'encoding': 'windows-1251'}
как+позвонить

Why is the string detected as ascii when it is read from the file, 
but as cp1251 when the same string is defined directly in the script? 
How do I solve this problem?


-- 
Only one 0_o
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: reading from file

2009-06-11 Thread Sydoruk Yaroslav
Jeff McNeil  wrote:
> Is the string in your text file literally "\xea\xe0\xea+\xef\xee
> \xe7\xe2\xee\xed\xe8\xf2\xfc" as "plain text?"  My assumption is that
> when you're reading that in, Python is interpreting each byte as an
> ASCII value (and rightfully so) rather than the corresponding '\x'
> escapes.
> 
> As an experiment:
> 
> (t)j...@marvin:~/t$ cat test.py
> import chardet
> 
> s = "\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
> with open('test.txt', 'w') as f:
>     print >>f, s
> 
> print chardet.detect(open('test.txt').read())
> (t)j...@marvin:~/t$ python test.py
> {'confidence': 0.98999, 'encoding': 'windows-1251'}
> (t)j...@marvin:~/t$
> 
> HTH,
> 
> Jeff
> mcjeff.blogspot.com


Thank you for your reply.
You are right: Python reads the data from the file as bytes, and in this 
case every byte in the file is ASCII.
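
A minimal sketch of why the detector reports ascii for the file contents. It is written in Python 3 syntax, which is an assumption (the scripts in this thread are Python 2), and the names escaped_text, raw_bytes and is_ascii are only illustrative. The file holds the escape sequences spelled out as ordinary characters, while the string literal in the second script holds the bytes those escapes stand for:

```python
# What aword.txt contains: a backslash, an 'x' and two hex digits per escape.
escaped_text = rb"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"

# What the string literal in the second script contains: one byte per escape.
raw_bytes = b"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"


def is_ascii(data):
    """Return True if every byte in data is below 0x80."""
    return all(byte < 0x80 for byte in data)


print(len(escaped_text), is_ascii(escaped_text))  # 49 True
print(len(raw_bytes), is_ascii(raw_bytes))        # 13 False
```

The 49 characters in the file are all plain ASCII, so an encoding detector has nothing non-ASCII to work with until the escapes are turned into real bytes.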


I solved the problem by adding line = line.decode('string_escape'):

import chardet

f = open("aword.txt", "r")
for line in f:
    line = line.decode('string_escape')
    print chardet.detect(line)
    b = line.decode('cp1251')
    print b
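
The 'string_escape' codec exists only in Python 2. As a hedged sketch of one way to get the same result in Python 3 (an assumption, not something used in this thread; the helper name decode_escaped_cp1251 is hypothetical), the escapes can be expanded with the 'unicode_escape' codec and mapped back to single bytes through latin-1 before decoding as cp1251:

```python
def decode_escaped_cp1251(escaped):
    """Convert ASCII bytes such as b'\\xea\\xe0' into the text they encode in cp1251."""
    # 'unicode_escape' turns each \xNN escape into the code point U+00NN.
    as_code_points = escaped.decode("unicode_escape")
    # latin-1 maps each code point U+00NN back to the single byte 0xNN.
    raw = as_code_points.encode("latin-1")
    # The resulting bytes are the real cp1251-encoded text.
    return raw.decode("cp1251")


# The same content as aword.txt: escapes written out as plain characters.
line = rb"\xea\xe0\xea+\xef\xee\xe7\xe2\xee\xed\xe8\xf2\xfc"
print(decode_escaped_cp1251(line))
```

This round trip is only valid when the input is pure ASCII with \xNN escapes, as it is here; input that already contains non-ASCII bytes would need different handling.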
-- 
Only one 0_o