Re: A few questiosn about encoding

2013-06-25 Thread wxjmfauth
On Sunday, 23 June 2013 18:30:40 UTC+2, Steven D'Aprano wrote: > On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote: > > > > > utf-8: how many bytes to hold an "a" in memory? one byte. > > > > > > flexible string representation: how many bytes to hold an "a" in memory? > > > One byte? N

Re: A few questiosn about encoding

2013-06-23 Thread Steven D'Aprano
On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote: > utf-8: how many bytes to hold an "a" in memory? one byte. > > flexible string representation: how many bytes to hold an "a" in memory? > One byte? No, two. (Funny, it consumes more memory to hold an ascii char > than ascii itself) Incorrect.
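The per-character cost being argued about here can be measured rather than asserted. The following is a minimal sketch, assuming CPython 3.3 or later (where the flexible string representation applies); `bytes_per_char` is a hypothetical helper name, and `sys.getsizeof` figures are implementation details of CPython, not language guarantees.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Extra bytes CPython allocates when one more copy of `ch`
    is appended to a string made only of `ch` (hypothetical helper)."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# ASCII, Latin-1, BMP and astral examples; the fixed per-object
# header cancels out in the subtraction.
for ch in ("a", "\xe9", "\u20ac", "\U0001F600"):
    print(ascii(ch), bytes_per_char(ch))
```

Measuring the difference between two long strings sidesteps the fixed object header, which is what makes a one-character string look larger than one byte.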

Re: A few questiosn about encoding

2013-06-23 Thread wxjmfauth
On Thursday, 20 June 2013 19:17:12 UTC+2, MRAB wrote: > On 20/06/2013 17:37, Chris Angelico wrote: > > > On Fri, Jun 21, 2013 at 2:27 AM, wrote: > > >> And all these coding schemes have something in common, > > >> they work all with a unique set of code points, more > > >> precisely a unique s

Re: A few questiosn about encoding

2013-06-20 Thread Mark Lawrence
On 20/06/2013 17:27, wxjmfa...@gmail.com wrote: On Thursday, 20 June 2013 13:43:28 UTC+2, MRAB wrote: On 20/06/2013 07:26, Steven D'Aprano wrote: On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote: On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: Gah! That's

Re: A few questiosn about encoding

2013-06-20 Thread Jussi Piitulainen
Rick Johnson writes: > On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote: > > On 2013.06.20 08:40, Rick Johnson wrote: > > > > then what is the purpose of a Unicode Braille character set? > > Two dimensional characters can be made into 3 dimensional shapes. > > Yes in the real worl

Re: A few questiosn about encoding

2013-06-20 Thread Chris Angelico
On Fri, Jun 21, 2013 at 3:17 AM, MRAB wrote: > On 20/06/2013 17:37, Chris Angelico wrote: >> >> On Fri, Jun 21, 2013 at 2:27 AM, wrote: >>> >>> And all these coding schemes have something in common, >>> they work all with a unique set of code points, more >>> precisely a unique set of encoded co

Re: A few questiosn about encoding

2013-06-20 Thread MRAB
On 20/06/2013 17:37, Chris Angelico wrote: On Fri, Jun 21, 2013 at 2:27 AM, wrote: And all these coding schemes have something in common, they work all with a unique set of code points, more precisely a unique set of encoded code points (not the set of implemented code points (byte)). Just wh

Re: A few questiosn about encoding

2013-06-20 Thread Andreas Perstinger
Rick Johnson wrote: > > Since we're on the subject of Unicode: > >One of the most humorous aspects of Unicode is that it has >encodings for Braille characters. Hmm, this presents a

Re: A few questiosn about encoding

2013-06-20 Thread Chris Angelico
On Fri, Jun 21, 2013 at 2:27 AM, wrote: > And all these coding schemes have something in common, > they work all with a unique set of code points, more > precisely a unique set of encoded code points (not > the set of implemented code points (byte)). > > Just what the flexible string representati

Re: A few questiosn about encoding

2013-06-20 Thread wxjmfauth
On Thursday, 20 June 2013 13:43:28 UTC+2, MRAB wrote: > On 20/06/2013 07:26, Steven D'Aprano wrote: > > > On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote: > > > > > >> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: > > >> > > >>> Gah! That's twice I've screwed that u

Re: A few questiosn about encoding

2013-06-20 Thread Chris Angelico
On Thu, Jun 20, 2013 at 11:40 PM, Rick Johnson wrote: > Your generalization is analogous to explaining web browsers > as: "software that allows a user to view web pages in the > range www.*" Do you think someone could implement a web > browser from such limited specification? (if that was all > th

Re: A few questiosn about encoding

2013-06-20 Thread Chris Angelico
On Fri, Jun 21, 2013 at 1:12 AM, Rick Johnson wrote: > On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote: >> On 2013.06.20 08:40, Rick Johnson wrote: > >> > then what is the purpose of a Unicode Braille character set? >> Two dimensional characters can be made into 3 dimensional sh

Re: A few questiosn about encoding

2013-06-20 Thread Rick Johnson
On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote: > On 2013.06.20 08:40, Rick Johnson wrote: > > then what is the purpose of a Unicode Braille character set? > Two dimensional characters can be made into 3 dimensional shapes. Yes in the real world. But what about on your compute

Re: A few questiosn about encoding

2013-06-20 Thread Andrew Berg
On 2013.06.20 08:40, Rick Johnson wrote: > One of the most humorous aspects of Unicode is that it has > encodings for Braille characters. Hmm, this presents a > conundrum of sorts. RIDDLE ME THIS?! > > Since Braille is a type of "reading" for the blind by > utilizing the sense of touch (there

Re: A few questiosn about encoding

2013-06-20 Thread Rick Johnson
On Thursday, June 20, 2013 1:26:17 AM UTC-5, Steven D'Aprano wrote: > The *implementation* is easy to explain. It's the names of > the encodings which I get tangled up in. Well, ignoring the fact that your last explanation is still buggy, you have not actually described an "implementation", no,

Re: A few questiosn about encoding

2013-06-20 Thread MRAB
On 20/06/2013 07:26, Steven D'Aprano wrote: On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote: On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: Gah! That's twice I've screwed that up. Sorry about that! Yeah, and your difficulty explaining the Unicode implementation r

Re: A few questiosn about encoding

2013-06-19 Thread Steven D'Aprano
On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote: > On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: > >> Gah! That's twice I've screwed that up. Sorry about that! > > Yeah, and your difficulty explaining the Unicode implementation reminds > me of a passage from the Pyt

Re: A few questiosn about encoding

2013-06-19 Thread Rick Johnson
On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: > Gah! That's twice I've screwed that up. > Sorry about that! Yeah, and your difficulty explaining the Unicode implementation reminds me of a passage from the Python zen: "If the implementation is hard to explain, it's a bad

Re: A few questiosn about encoding

2013-06-17 Thread Antoon Pardon
On 17-06-13 09:08, Cameron Simpson wrote: > On 17Jun2013 08:49, Antoon Pardon wrote: > | On 15-06-13 02:28, Cameron Simpson wrote: > | > On 14Jun2013 15:59, Nikos as SuperHost Support > wrote: > | > | So, a numeral = a string representation of a number. Is this correct? > | > > | > No, a num

Re: A few questiosn about encoding

2013-06-17 Thread Cameron Simpson
On 17Jun2013 08:49, Antoon Pardon wrote: | On 15-06-13 02:28, Cameron Simpson wrote: | > On 14Jun2013 15:59, Nikos as SuperHost Support wrote: | > | So, a numeral = a string representation of a number. Is this correct? | > | > No, a numeral is an individual digit from the string representation

Re: A few questiosn about encoding

2013-06-16 Thread Antoon Pardon
On 15-06-13 02:28, Cameron Simpson wrote: > On 14Jun2013 15:59, Nikos as SuperHost Support wrote: > | So, a numeral = a string representation of a number. Is this correct? > > No, a numeral is an individual digit from the string representation of a > number. > So: 65 requires two numerals: '6'

Re: A few questiosn about encoding

2013-06-16 Thread Chris “Kwpolska” Warrick
On Sat, Jun 15, 2013 at 10:35 PM, Benjamin Schollnick wrote: > Nick, > > The only thing that i didn't understood is this line. > First please tell me what is a byte value > > \x1b is a sequence you find inside strings (and "byte" strings, the > b'...' format). > > > \x1b is a character(ESC) repres

Re: A few questiosn about encoding

2013-06-15 Thread Benjamin Schollnick
Nick, >> The only thing that i didn't understood is this line. >> First please tell me what is a byte value >> >>> \x1b is a sequence you find inside strings (and "byte" strings, the >>> b'...' format). >> >> \x1b is a character(ESC) represented in hex format >> >> b'\x1b' is a byte object that
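The distinction between the `\x1b` escape and the byte it denotes can be checked directly in an interactive session. This is a small sketch, assuming Python 3; the variable names are illustrative only.

```python
# "\x1b" in a str literal is one character: ESC, code point 27.
esc_text = "\x1b"
# b"\x1b" is a bytes object holding one byte whose value is 27.
esc_bytes = b"\x1b"

print(ord(esc_text))      # code point of the character
print(esc_bytes[0])       # indexing bytes yields an int
print(len(esc_bytes))     # one byte long
print(esc_text.encode("ascii") == esc_bytes)
```

The four characters typed in the source (`\`, `x`, `1`, `b`) are only notation; the object holds a single value.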

Re: A few questiosn about encoding

2013-06-15 Thread Nick the Gr33k
On 14/6/2013 4:58 μμ, Nick the Gr33k wrote: On 14/6/2013 1:14 μμ, Cameron Simpson wrote: Normally a character in a b'...' item represents the byte value matching the character's Unicode ordinal value. The only thing that i didn't understood is this line. First please tell me what is a byte val

Re: A few questiosn about encoding

2013-06-15 Thread Joel Goldstick
On Sat, Jun 15, 2013 at 11:14 AM, Nick the Gr33k wrote: > On 15/6/2013 5:59 μμ, Roy Smith wrote: > > And, yes, especially in networking, everybody talks about octets when >> they want to make sure people understand what they mean. >> > > 1 byte = 8 bits > > in networking though since we do not us

Re: A few questiosn about encoding

2013-06-15 Thread Steven D'Aprano
On Sat, 15 Jun 2013 17:49:13 +0300, Nick the Gr33k wrote: > What the difference between a byte and a byte's value? Nothing. -- Steven -- http://mail.python.org/mailman/listinfo/python-list

Re: A few questiosn about encoding

2013-06-15 Thread Nick the Gr33k
On 15/6/2013 5:59 μμ, Roy Smith wrote: And, yes, especially in networking, everybody talks about octets when they want to make sure people understand what they mean. 1 byte = 8 bits in networking though since we do not use encoding schemes with variable lengths like utf-8 is, how do we separ

Re: A few questiosn about encoding

2013-06-15 Thread Roy Smith
In article , Grant Edwards wrote: > There is some ambiguity in the term "byte". It used to mean the > smallest addressable unit of memory (which varied in the past -- at > one point, both 20 and 60 bit "bytes" were common). I would have defined it more like, "some arbitrary collection of adja

Re: A few questiosn about encoding

2013-06-15 Thread Nick the Gr33k
On 15/6/2013 5:44 μμ, Grant Edwards wrote: There is some ambiguity in the term "byte". It used to mean the smallest addressable unit of memory (which varied in the past -- at one point, both 20 and 60 bit "bytes" were common). These days the smallest addressable unit of memory is almost always

Re: A few questiosn about encoding

2013-06-15 Thread Grant Edwards
On 2013-06-15, Denis McMahon wrote: > On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote: > >> On 14/6/2013 1:14 , Cameron Simpson wrote: >>> Normally a character in a b'...' item represents the byte value >>> matching the character's Unicode ordinal value. > >> The only thing that i did

Re: A few questiosn about encoding

2013-06-14 Thread Denis McMahon
On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote: > On 14/6/2013 1:14 μμ, Cameron Simpson wrote: >> Normally a character in a b'...' item represents the byte value >> matching the character's Unicode ordinal value. > The only thing that i didn't understood is this line. > First please tel

Re: A few questiosn about encoding

2013-06-14 Thread Cameron Simpson
On 14Jun2013 16:58, Nikos as SuperHost Support wrote: | On 14/6/2013 1:14 μμ, Cameron Simpson wrote: | >Normally a character in a b'...' item represents the byte value | >matching the character's Unicode ordinal value. | | The only thing that i didn't understood is this line. | First please tell

Re: A few questiosn about encoding

2013-06-14 Thread Cameron Simpson
On 14Jun2013 15:59, Nikos as SuperHost Support wrote: | So, a numeral = a string representation of a number. Is this correct? No, a numeral is an individual digit from the string representation of a number. So: 65 requires two numerals: '6' and '5'. -- Cameron Simpson In life, you should alway

Re: A few questiosn about encoding

2013-06-14 Thread Walter Hurry
On Sat, 15 Jun 2013 03:03:02 +1000, Chris Angelico wrote: > Why do you sell web hosting services when you > have no clue how to provide them? > And why do you continue responding to this timewaster? Please, please just killfile him and let's all move on.

Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread Grant Edwards
On 2013-06-14, Chris Angelico wrote: > On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain wrote: >> The answer is to always make sure that you include the previous poster >> in the reply as a Cc or To. I filter out any email that has the string >> "supp...@superhost.gr" in a header so I would als

Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread Chris Angelico
On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain wrote: > The answer is to always make sure that you include the previous poster > in the reply as a Cc or To. I filter out any email that has the string > "supp...@superhost.gr" in a header so I would also filter out the > replies if people would

Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread D'Arcy J.M. Cain
On Fri, 14 Jun 2013 11:06:55 +0200 Heiko Wundram wrote: > Come on now, this is _so_ obviously trolling, it's not even remotely > funny anymore. Why doesn't killfiling work with the mailing list > version of the python list? :-( A big problem, other than Mr. Support's shenanigans with his email a

Re: A few questiosn about encoding

2013-06-14 Thread Chris Angelico
On Sat, Jun 15, 2013 at 1:26 AM, Nick the Gr33k wrote: > Well, my biggest successes up until now were to build 3 websites utilizing > database saves and retrievals > > in PHP > in Perl > and later in Python > > with absolute ignorance of > > Apache Configuration: > CGI: > Linux: > > with just bas

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 6:21 μμ, Joel Goldstick wrote: let's cut to the chase and start with telling us what you DO know Nick. That would take less typing Well, my biggest successes up until now were to build 3 websites utilizing database saves and retrievals in PHP in Perl and later in Python with abs

Re: A few questiosn about encoding

2013-06-14 Thread Joel Goldstick
let's cut to the chase and start with telling us what you DO know Nick. That would take less typing On Fri, Jun 14, 2013 at 9:58 AM, Nick the Gr33k wrote: > On 14/6/2013 1:14 μμ, Cameron Simpson wrote: > >> Normally a character in a b'...' item represents the byte value >> matching the character

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 1:14 μμ, Cameron Simpson wrote: Normally a character in a b'...' item represents the byte value matching the character's Unicode ordinal value. The only thing that i didn't understood is this line. First please tell me what is a byte value \x1b is a sequence you find inside strin

Re: A few questiosn about encoding

2013-06-14 Thread Antoon Pardon
On 14-06-13 14:59, Nick the Gr33k wrote: > On 14/6/2013 1:50 μμ, Antoon Pardon wrote: >> Python works with numbers, but at the moment >> it has to display such a number it has to produce something >> that is printable. So it will build a string that can be >> used as a notation for that number,

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 1:50 μμ, Antoon Pardon wrote: Python works with numbers, but at the moment it has to display such a number it has to produce something that is printable. So it will build a string that can be used as a notation for that number, a numeral. And that is what will be displayed. so a n

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 1:19 μμ, Cameron Simpson wrote: On 14Jun2013 11:37, Nikos as SuperHost Support wrote: | On 14/6/2013 11:22 πμ, Antoon Pardon wrote: | | >>Python prints numbers: | >No it doesn't, numbers are abstract concepts that can be represented in | >various notations, these notations are strin

Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread rusi
On Jun 14, 3:20 pm, Fábio Santos wrote: > > Come on now, this is _so_ obviously trolling, it's not even remotely > > funny anymore. Why doesn't killfiling work with the mailing list version of > the python list? :-( > > I have skimmed the archives for this month, and I estimate that a third of > t

Re: A few questiosn about encoding

2013-06-14 Thread Antoon Pardon
On 14-06-13 10:37, Nick the Gr33k wrote: > On 14/6/2013 11:22 πμ, Antoon Pardon wrote: > >>> Python prints numbers: >> No it doesn't, numbers are abstract concepts that can be represented in >> various notations, these notations are strings. Those notational strings >> end up being printed. As I s

Re: A few questiosn about encoding

2013-06-14 Thread Cameron Simpson
On 14Jun2013 09:59, Nikos as SuperHost Support wrote: | On 14/6/2013 4:00 πμ, Cameron Simpson wrote: | >On 13Jun2013 17:19, Nikos as SuperHost Support wrote: | >| A code-point and the code-point's ordinal value are associated into | >| a Unicode charset. They have the so called 1:1 mapping. | >|

Re: Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread Fábio Santos
On 14 Jun 2013 10:20, "Heiko Wundram" wrote: > > On 14.06.2013 10:37, Nick the Gr33k wrote: >> >> So everything we see like: >> >> 16474 >> nikos >> abc123 >> >> everything is a string and nothing is a number? not even number 1? > > > Come on now, this is _so_ obviously trolling, it's not even r

Re: A few questiosn about encoding

2013-06-14 Thread Cameron Simpson
On 14Jun2013 11:37, Nikos as SuperHost Support wrote: | On 14/6/2013 11:22 πμ, Antoon Pardon wrote: | | >>Python prints numbers: | >No it doesn't, numbers are abstract concepts that can be represented in | >various notations, these notations are strings. Those notational strings | >end up being pr

Don't feed the troll... (was: Re: A few questiosn about encoding)

2013-06-14 Thread Heiko Wundram
On 14.06.2013 10:37, Nick the Gr33k wrote: So everything we see like: 16474 nikos abc123 everything is a string and nothing is a number? not even number 1? Come on now, this is _so_ obviously trolling, it's not even remotely funny anymore. Why doesn't killfiling work with the mailing list

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 11:22 πμ, Antoon Pardon wrote: Python prints numbers: No it doesn't, numbers are abstract concepts that can be represented in various notations, these notations are strings. Those notational strings end up being printed. As I said before we are so used to using the decimal notation

Re: A few questiosn about encoding

2013-06-14 Thread Antoon Pardon
On 14-06-13 09:49, Nick the Gr33k wrote: > On 14/6/2013 10:36 πμ, Antoon Pardon wrote: >> On 13-06-13 10:08, Νικόλαος Κούρας wrote: >>> >>> Indeed python embraced it in single quoting '0b100000001011010' and >>> not as 0b100000001011010 which in fact makes it a string. >>> >>> But since bin(164

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 10:36 πμ, Antoon Pardon wrote: On 13-06-13 10:08, Νικόλαος Κούρας wrote: On 13/6/2013 10:58 πμ, Chris Angelico wrote: On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote: On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: No! That creates a string from 16474 in base two: '0b1000

Re: A few questiosn about encoding

2013-06-14 Thread Antoon Pardon
On 13-06-13 10:08, Νικόλαος Κούρας wrote: > On 13/6/2013 10:58 πμ, Chris Angelico wrote: >> On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας >> wrote: >>> On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: No! That creates a string from 16474 in base two: '0b100000001011010' >>> >>> I disag

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 9:00 πμ, Zero Piraeus wrote: : On 14 June 2013 01:34, Nick the Gr33k wrote: Why doesn't it work like this? leading 0 = 1 byte flag leading 1 = 2 bytes flag leading 00 = 3 bytes flag leading 01 = 4 bytes flag leading 10 = 5 bytes flag leading 11 = 6 bytes flag Wouldn't it be more

Re: A few questiosn about encoding

2013-06-14 Thread Nick the Gr33k
On 14/6/2013 4:00 πμ, Cameron Simpson wrote: On 13Jun2013 17:19, Nikos as SuperHost Support wrote: | A code-point and the code-point's ordinal value are associated into | a Unicode charset. They have the so called 1:1 mapping. | | So, i was under the impression that by encoding the code-point in

Re: A few questiosn about encoding

2013-06-13 Thread Zero Piraeus
: On 14 June 2013 01:34, Nick the Gr33k wrote: > Why doesn't it work like this? > > leading 0 = 1 byte flag > leading 1 = 2 bytes flag > leading 00 = 3 bytes flag > leading 01 = 4 bytes flag > leading 10 = 5 bytes flag > leading 11 = 6 bytes flag > > Wouldn't it be more logical? Think about it.

Re: A few questiosn about encoding

2013-06-13 Thread Nick the Gr33k
On 14/6/2013 1:46 πμ, Dennis Lee Bieber wrote: On Wed, 12 Jun 2013 09:09:05 +0000 (UTC), Νικόλαος Κούρας declaimed the following: (*) in fact UTF8 also indicates the end of each character Up to a point. The initial byte encodes the length and the top few bits, but the subsequent octets aren

Re: A few questiosn about encoding

2013-06-13 Thread Cameron Simpson
On 13Jun2013 17:19, Nikos as SuperHost Support wrote: | A code-point and the code-point's ordinal value are associated into | a Unicode charset. They have the so called 1:1 mapping. | | So, i was under the impression that by encoding the code-point into | utf-8 was the same as encoding the code-p

Re: A few questiosn about encoding

2013-06-13 Thread Νικόλαος Κούρας
On 13/6/2013 2:49 μμ, Steven D'Aprano wrote: Please confirm these are true statement: A code-point and the code-point's ordinal value are associated into a Unicode charset. They have the so called 1:1 mapping. So, i was under the impression that by encoding the code-point into utf-8 was the

Re: A few questiosn about encoding

2013-06-13 Thread Steven D'Aprano
On Thu, 13 Jun 2013 12:41:41 +0300, Νικόλαος Κούρας wrote: >> In Python 2: > 16474 > typing 16474 in interactive session both in python 2 and 3 gives back > the number 16474 > > while we want the the binary representation of the number 16474 Python does not work that way. Ints *always* displ
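The point about ints and their display can be tried out in a few lines. A minimal sketch, assuming Python 3; the variable names are illustrative only.

```python
n = 16474

# bin() does not return a "binary number"; it returns a str
# holding the base-two notation of the int.
binary_text = bin(n)
print(type(binary_text).__name__, binary_text)

# Parsing that notation back gives the same int, and a binary
# literal in source code is just another way to write it.
print(int(binary_text, 2) == n)
print(0b100000001011010 == n)

# The int itself is always displayed in decimal.
print(repr(n))
```

However the value is written in source code (decimal, `0b...`, `0x...`), the resulting object is the same int; only `bin()`, `hex()`, `format()` and friends produce other notations, and they produce them as strings.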

Re: A few questiosn about encoding

2013-06-13 Thread Nobody
On Thu, 13 Jun 2013 12:01:55 +1000, Chris Angelico wrote: > On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano > wrote: >> The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but >> that's not UTF-8, that's UTF-8-plus-extra-codepoints. > > And a proper UTF-8 decoder will reject "\

Re: A few questiosn about encoding

2013-06-13 Thread Νικόλαος Κούρας
On 13/6/2013 11:20 πμ, Chris Angelico wrote: On Thu, Jun 13, 2013 at 6:08 PM, Νικόλαος Κούρας wrote: On 13/6/2013 10:58 πμ, Chris Angelico wrote: On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote: On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: No! That creates a string from 16474 in

Re: A few questiosn about encoding

2013-06-13 Thread Chris Angelico
On Thu, Jun 13, 2013 at 6:08 PM, Νικόλαος Κούρας wrote: > On 13/6/2013 10:58 πμ, Chris Angelico wrote: >> >> On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας >> wrote: >> >>> On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: No! That creates a string from 16474 in base two: '0b1000

Re: A few questiosn about encoding

2013-06-13 Thread Νικόλαος Κούρας
On 13/6/2013 10:58 πμ, Chris Angelico wrote: On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote: On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: No! That creates a string from 16474 in base two: '0b100000001011010' I disagree here. 16474 is a number in base 10. Doing bin(16474) we get the

Re: A few questiosn about encoding

2013-06-13 Thread Chris Angelico
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote: > On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: >> No! That creates a string from 16474 in base two: >> '0b100000001011010' > > I disagree here. > 16474 is a number in base 10. Doing bin(16474) we get the binary > representation of number 164

Re: A few questiosn about encoding

2013-06-13 Thread Νικόλαος Κούρας
On 13/6/2013 10:11 πμ, Steven D'Aprano wrote: >>> chr(16474) '䁚' Some Chinese symbol. So code-point '䁚' has a Unicode ordinal value of 16474, correct? Correct. where in after encoding this glyph's ordinal value to binary gives us the following bytes: >>> bin(16474).encode('utf-8') b'0

Re: A few questiosn about encoding

2013-06-13 Thread Steven D'Aprano
On Thu, 13 Jun 2013 09:09:19 +0300, Νικόλαος Κούρας wrote: > On 13/6/2013 3:13 πμ, Steven D'Aprano wrote: >> Open an interactive Python session, and run this code: >> >> c = ord(16474) >> len(c.encode('utf-8')) >> >> >> That will tell you how many bytes are used for that example. > This is actual

Re: A few questiosn about encoding

2013-06-12 Thread Chris Angelico
On Thu, Jun 13, 2013 at 4:21 PM, Νικόλαος Κούρας wrote: > How can you be able to tell up to what character utf-8 needs 1 byte or 2 > bytes or 3? You look up Wikipedia, using the handy links that have been put to you MULTIPLE TIMES. ChrisA

Re: A few questiosn about encoding

2013-06-12 Thread jmfauth
UTF-8, Unicode (consortium): 1 to 4 *Unicode Transformation Unit* UTF-8, ISO 10646: 1 to 6 *Unicode Transformation Unit* (still current, unless it was very recently modified) jmf

Re: A few questiosn about encoding

2013-06-12 Thread Νικόλαος Κούρας
On 12/6/2013 11:30 μμ, Nobody wrote: On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote: So, how many bytes does UTF-8 store for codepoints > 127 ? U+0000..U+007F 1 byte U+0080..U+07FF 2 bytes U+0800..U+FFFF 3 bytes >=U+10000 4 bytes 'U' stands for Unicode code-point which

Re: A few questiosn about encoding

2013-06-12 Thread Νικόλαος Κούρας
On 13/6/2013 3:13 πμ, Steven D'Aprano wrote: On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote: So, how many bytes does UTF-8 store for codepoints > 127 ? Two, three or four, depending on the codepoint. The amount of bytes needed by UTF-8 to store a code-point(character), depends

Re: A few questiosn about encoding

2013-06-12 Thread Chris Angelico
On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano wrote: > The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but > that's not UTF-8, that's UTF-8-plus-extra-codepoints. And a proper UTF-8 decoder will reject "\xC0\x80" and "\xed\xa0\x80", even though mathematically they would tr
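Python 3's own strict UTF-8 decoder can be used to see this rejection in practice. A minimal sketch; `is_valid_utf8` is a hypothetical helper name, not a standard-library function.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# b"\xc0\x80" is an overlong two-byte form of U+0000;
# b"\xed\xa0\x80" would be the surrogate U+D800.
for raw in (b"\x00", b"\xc0\x80", b"\xed\xa0\x80"):
    print(raw, is_valid_utf8(raw))
```

Both sequences follow the bit layout of UTF-8, yet the strict codec refuses them: the first because a shorter encoding of the same code point exists, the second because surrogates are not valid scalar values.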

Re: A few questiosn about encoding

2013-06-12 Thread Steven D'Aprano
On Wed, 12 Jun 2013 21:30:23 +0100, Nobody wrote: > The mechanism used by UTF-8 allows sequences of up to 6 bytes, for a > total of 31 bits, but UTF-16 is limited to U+10FFFF (slightly more than > 20 bits). Same with UTF-8 and UTF-32, both of which are limited to U+10FFFF because that is what Un

Re: A few questiosn about encoding

2013-06-12 Thread Steven D'Aprano
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote: > So, how many bytes does UTF-8 store for codepoints > 127 ? Two, three or four, depending on the codepoint. > example for codepoint 256, 1345, 16474 ? You can do this yourself. I have already given you enough information in previous

Re: A few questiosn about encoding

2013-06-12 Thread Nobody
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote: > So, how many bytes does UTF-8 store for codepoints > 127 ? U+0000..U+007F 1 byte U+0080..U+07FF 2 bytes U+0800..U+FFFF 3 bytes >=U+10000 4 bytes So, 1 byte for ASCII, 2 bytes for other Latin characters, Greek, Cyrillic, Arabi
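The byte counts per range can be confirmed from Python itself, including the code points 256, 1345 and 16474 asked about elsewhere in the thread. A minimal sketch, assuming Python 3; `utf8_len` is a hypothetical helper name.

```python
def utf8_len(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    return len(chr(code_point).encode("utf-8"))

# Boundary values of each range, plus the thread's own examples.
samples = [0x41, 0x7F, 0x80, 256, 1345, 0x7FF,
           0x800, 16474, 0xFFFF, 0x10000, 0x10FFFF]

for cp in samples:
    print("U+%04X" % cp, utf8_len(cp))
```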

Re: A few questiosn about encoding

2013-06-12 Thread Ulrich Eckhardt
On 12.06.2013 13:23, Νικόλαος Κούρας wrote: So, how many bytes does UTF-8 store for codepoints > 127 ? What has your research turned up? I personally consider it lazy and disrespectful to get lots of pointers that you could use for further research and ask for more info before you even follo

Re: A few questiosn about encoding

2013-06-12 Thread Dave Angel
On 06/12/2013 05:24 AM, Steven D'Aprano wrote: On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote: Isn't 14 bits way too many to store a character ? No. There are 1114111 possible characters in Unicode. (And in Japan, they sometimes use TRON instead of Unicode, which has even more.) I

Re: A few questiosn about encoding

2013-06-12 Thread Νικόλαος Κούρας
On 12/6/2013 12:24 μμ, Steven D'Aprano wrote: On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote: Isn't 14 bits way too many to store a character ? No. There are 1114111 possible characters in Unicode. (And in Japan, they sometimes use TRON instead of Unicode, which has even more.) If

Re: A few questiosn about encoding

2013-06-12 Thread Steven D'Aprano
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote: > Isn't 14 bits way too many to store a character ? No. There are 1114111 possible characters in Unicode. (And in Japan, they sometimes use TRON instead of Unicode, which has even more.) If you list out all the combinations of 14 bits:

Re: A few questiosn about encoding

2013-06-12 Thread Νικόλαος Κούρας
>> (*) in fact UTF8 also indicates the end of each character > Up to a point. The initial byte encodes the length and the top few > bits, but the subsequent octets aren’t distinguishable as final in > isolation. 0x80-0xBF can all be either medial or final. So, the first high-bits are a directiv
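The role of the high bits can be made concrete by classifying each byte of an encoded character. A minimal sketch, assuming Python 3; `classify_utf8_byte` and its labels are hypothetical names chosen for illustration.

```python
def classify_utf8_byte(byte: int) -> str:
    """Label a single byte by the role its high bits give it in UTF-8."""
    if byte <= 0x7F:
        return "single"        # 0xxxxxxx: a complete ASCII character
    if byte <= 0xBF:
        return "continuation"  # 10xxxxxx: medial or final, not told apart
    if 0xC2 <= byte <= 0xDF:
        return "lead-2"        # 110xxxxx: starts a 2-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return "lead-3"        # 1110xxxx: starts a 3-byte sequence
    if 0xF0 <= byte <= 0xF4:
        return "lead-4"        # 11110xxx: starts a 4-byte sequence
    return "invalid"           # 0xC0, 0xC1, 0xF5..0xFF never appear

encoded = chr(16474).encode("utf-8")
print(encoded, [classify_utf8_byte(b) for b in encoded])
```

A continuation byte carries no marker saying whether it is the last one; the decoder knows where the character ends only because the lead byte announced the length.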

Re: A few questiosn about encoding

2013-06-09 Thread Chris “Kwpolska” Warrick
On Sun, Jun 9, 2013 at 12:44 PM, Νικόλαος Κούρας wrote: > A few questiosn about encoding please: > >>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for >>> values up to 256? > >>Because then how do you tell when you need one byte, and when you need >>two? If you read two bytes,

Re: A few questiosn about encoding

2013-06-09 Thread Nobody
On Sun, 09 Jun 2013 03:44:57 -0700, Νικόλαος Κούρας wrote: >>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for >>> values up to 256? > >>Because then how do you tell when you need one byte, and when you need >>two? If you read two bytes, and see 0x4C 0xFA, does that mean tw
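The ambiguity of the byte pair 0x4C 0xFA can be shown by decoding the same two bytes under different schemes. A minimal sketch, assuming Python 3; the codec choices (Latin-1 and big-endian UTF-16) are picked here for illustration and are not taken from the thread.

```python
data = b"\x4c\xfa"

# One byte per character: two characters, 'L' and U+00FA.
as_latin1 = data.decode("latin-1")

# Two bytes per character: a single character, U+4CFA.
as_utf16 = data.decode("utf-16-be")

print(len(as_latin1), ascii(as_latin1))
print(len(as_utf16), ascii(as_utf16))

# UTF-8 avoids the ambiguity by reserving bit patterns: 0xFA is
# never a legal byte, so this sequence is rejected outright.
try:
    data.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```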

Re: A few questiosn about encoding

2013-06-09 Thread Fábio Santos
On 9 Jun 2013 11:49, "Νικόλαος Κούρας" wrote: > > A few questiosn about encoding please: > > >> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for > >> values up to 256? > > >Because then how do you tell when you need one byte, and when you need > >two? If you read two bytes, and