Re: Changing filenames from Greeklish => Greek (subprocess complain)

Andreas Perstinger Mon, 10 Jun 2013 03:46:26 -0700

On 10.06.2013 11:59, Νικόλαος Κούρας wrote:

> >>> s = 'α'
> >>> s.encode('utf-8')
> b'\xce\xb1'


> 'b' stands for binary, right?


No, here it stands for bytes:
http://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals

> b'\xce\xb1' = we are looking at a byte in hexadecimal format?


No, b'\xce\xb1' represents a bytes object containing 2 bytes.
Yes, each byte is represented in hexadecimal format.

> If yes, how could we see it in binary and decimal representation?


>>> s = b'\xce\xb1'
>>> s[0]
206
>>> bin(s[0])
'0b11001110'
>>> s[1]
177
>>> bin(s[1])
'0b10110001'

A bytes object is a sequence of bytes (= integer values in the range 0 to 255)
and supports indexing.

http://docs.python.org/3/library/stdtypes.html#bytes
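
As a small sketch (the variable names are only for illustration), iterating
over a bytes object gives you the integers directly, and format() can show
each one in decimal, hexadecimal and binary at once:

```python
data = '\u03b1'.encode('utf-8')  # the Greek letter alpha, b'\xce\xb1'

# Iterating over a bytes object yields plain integers in range(256).
rows = [(byte, format(byte, '02x'), format(byte, '08b')) for byte in data]

for decimal, hexadecimal, binary in rows:
    print(decimal, hexadecimal, binary)
```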

> Since 2^8 = 256, utf-8 should store the first 256 chars of the unicode
> charset using 1 byte.
>
> Also, since 2^16 = 65536, utf-8 should store the first 65536 chars of the
> unicode charset using 2 bytes, and so on.
>
> But I know that this is not the case, and I don't understand why.


Because your method doesn't work.

If you use all 256 possible bit combinations of a byte to represent a valid
character, how do you decide where one character stops in a sequence of bytes?
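
A rough sketch of how UTF-8 answers that question: the high bits of the first
byte of a character say how many bytes belong to it, so only 128 of the 256
values are left for 1-byte characters. The helper name sequence_length is made
up for this example, and it only looks at the bit pattern (it does not reject
every invalid lead byte):

```python
def sequence_length(first_byte):
    """Return the length of a UTF-8 sequence, judged by its first byte."""
    if first_byte < 0x80:            # 0xxxxxxx: ASCII, 1 byte
        return 1
    if first_byte >> 5 == 0b110:     # 110xxxxx: start of a 2-byte sequence
        return 2
    if first_byte >> 4 == 0b1110:    # 1110xxxx: start of a 3-byte sequence
        return 3
    if first_byte >> 3 == 0b11110:   # 11110xxx: start of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte and never starts a character.
    raise ValueError('not a valid UTF-8 lead byte: %#x' % first_byte)

# U+0061 (a), U+03B1 (Greek alpha), U+20AC (euro sign), U+1F600 (an emoji)
samples = ['a', '\u03b1', '\u20ac', '\U0001f600']
lengths = []
for ch in samples:
    encoded = ch.encode('utf-8')
    lengths.append(sequence_length(encoded[0]))
    print('U+%04X' % ord(ch), encoded, lengths[-1], len(encoded))
```

The price for this self-describing layout is that 1 byte only covers code
points up to U+007F, 2 bytes up to U+07FF and 3 bytes up to U+FFFF; everything
above needs 4 bytes.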

> >>> s = 'a'
> >>> s.encode('utf-8')
> b'a'
> utf-8 takes ASCII as it is, as 1 byte. They are the same.


> EBCDIC, ASCII and Unicode are character sets, correct?
>
> iso-8859-1, iso-8859-7, utf-8, utf-16, utf-32 and so on are encoding methods,
> right?

Look at http://www.unicode.org/glossary/ for an explanation of all the terms.
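
To see the difference in Python (just a sketch, using codec names from the
standard library): a character has exactly one code point in Unicode, but each
encoding turns that code point into a different byte sequence, and some
encodings cannot represent it at all.

```python
ch = '\u03b1'  # GREEK SMALL LETTER ALPHA

code_point = ord(ch)  # the character's number in the Unicode character set
print('code point:', code_point, hex(code_point))

# The same code point, serialized by four different encodings.
encoded = {}
for name in ['iso-8859-7', 'utf-8', 'utf-16-be', 'utf-32-be']:
    encoded[name] = ch.encode(name)
    print(name, encoded[name])

# iso-8859-1 contains no Greek letters, so encoding fails there.
try:
    ch.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('iso-8859-1:', exc.reason)
```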


Bye, Andreas
--
http://mail.python.org/mailman/listinfo/python-list
