On Sunday, 23 June 2013 18:30:40 UTC+2, Steven D'Aprano wrote:
> On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote:
> > utf-8: how many bytes to hold an "a" in memory? one byte.
> >
> > flexible string representation: how many bytes to hold an "a" in memory?
> > One byte? N
On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote:
> utf-8: how many bytes to hold an "a" in memory? one byte.
>
> flexible string representation: how many bytes to hold an "a" in memory?
> One byte? No, two. (Funny, it consumes more memory to hold an ascii char
> than ascii itself)
Incorrect.
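The per-character cost can be measured rather than argued about. The following is a minimal sketch, assuming CPython 3.3 or later (where the flexible string representation is in use); `bytes_per_char` is a hypothetical helper name, not part of any library.

```python
import sys

def bytes_per_char(ch):
    """Return the extra memory CPython uses for one more copy of ch."""
    # Both strings are built at run time. Subtracting their sizes cancels
    # the fixed per-object header, leaving only the per-character storage.
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)

print(bytes_per_char("a"))        # an ASCII character
print(bytes_per_char("\u20ac"))   # EURO SIGN, outside Latin-1
```

The fixed overhead reported by `sys.getsizeof` belongs to the string object as a whole, so it should not be counted as the cost of a single character.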
On Thursday, 20 June 2013 19:17:12 UTC+2, MRAB wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
> > On Fri, Jun 21, 2013 at 2:27 AM, wrote:
> >> And all these coding schemes have something in common,
> >> they work all with a unique set of code points, more
> >> precisely a unique s
On 20/06/2013 17:27, wxjmfa...@gmail.com wrote:
On Thursday, 20 June 2013 13:43:28 UTC+2, MRAB wrote:
On 20/06/2013 07:26, Steven D'Aprano wrote:
On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
Gah! That's
Rick Johnson writes:
> On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote:
> > On 2013.06.20 08:40, Rick Johnson wrote:
>
> > > then what is the purpose of a Unicode Braille character set?
> > Two dimensional characters can be made into 3 dimensional shapes.
>
> Yes in the real worl
On Fri, Jun 21, 2013 at 3:17 AM, MRAB wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
>>
>> On Fri, Jun 21, 2013 at 2:27 AM, wrote:
>>>
>>> And all these coding schemes have something in common,
>>> they work all with a unique set of code points, more
>>> precisely a unique set of encoded co
On 20/06/2013 17:37, Chris Angelico wrote:
On Fri, Jun 21, 2013 at 2:27 AM, wrote:
And all these coding schemes have something in common,
they work all with a unique set of code points, more
precisely a unique set of encoded code points (not
the set of implemented code points (byte)).
Just wh
Rick Johnson wrote:
>
> Since we're on the subject of Unicode:
>
>One of the most humorous aspects of Unicode is that it has
>encodings for Braille characters. Hmm, this presents a
On Fri, Jun 21, 2013 at 2:27 AM, wrote:
> And all these coding schemes have something in common,
> they work all with a unique set of code points, more
> precisely a unique set of encoded code points (not
> the set of implemented code points (byte)).
>
> Just what the flexible string representati
On Thursday, 20 June 2013 13:43:28 UTC+2, MRAB wrote:
> On 20/06/2013 07:26, Steven D'Aprano wrote:
> > On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
> >> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
> >>> Gah! That's twice I've screwed that u
On Thu, Jun 20, 2013 at 11:40 PM, Rick Johnson
wrote:
> Your generalization is analogous to explaining web browsers
> as: "software that allows a user to view web pages in the
> range www.*" Do you think someone could implement a web
> browser from such limited specification? (if that was all
> th
On Fri, Jun 21, 2013 at 1:12 AM, Rick Johnson
wrote:
> On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote:
>> On 2013.06.20 08:40, Rick Johnson wrote:
>
>> > then what is the purpose of a Unicode Braille character set?
>> Two dimensional characters can be made into 3 dimensional sh
On Thursday, June 20, 2013 9:04:50 AM UTC-5, Andrew Berg wrote:
> On 2013.06.20 08:40, Rick Johnson wrote:
> > then what is the purpose of a Unicode Braille character set?
> Two dimensional characters can be made into 3 dimensional shapes.
Yes in the real world. But what about on your compute
On 2013.06.20 08:40, Rick Johnson wrote:
> One of the most humorous aspects of Unicode is that it has
> encodings for Braille characters. Hmm, this presents a
> conundrum of sorts. RIDDLE ME THIS?!
>
> Since Braille is a type of "reading" for the blind by
> utilizing the sense of touch (there
On Thursday, June 20, 2013 1:26:17 AM UTC-5, Steven D'Aprano wrote:
> The *implementation* is easy to explain. It's the names of
> the encodings which I get tangled up in.
Well, ignoring the fact that your last explanation is
still buggy, you have not actually described an
"implementation", no,
On 20/06/2013 07:26, Steven D'Aprano wrote:
On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
Gah! That's twice I've screwed that up. Sorry about that!
Yeah, and your difficulty explaining the Unicode implementation r
On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
>
>> Gah! That's twice I've screwed that up. Sorry about that!
>
> Yeah, and your difficulty explaining the Unicode implementation reminds
> me of a passage from the Pyt
On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
> Gah! That's twice I've screwed that up.
> Sorry about that!
Yeah, and your difficulty explaining the Unicode implementation reminds me of a
passage from the Python zen:
"If the implementation is hard to explain, it's a bad
On 17-06-13 09:08, Cameron Simpson wrote:
> On 17Jun2013 08:49, Antoon Pardon wrote:
> | On 15-06-13 02:28, Cameron Simpson wrote:
> | > On 14Jun2013 15:59, Nikos as SuperHost Support wrote:
> | > | So, a numeral = a string representation of a number. Is this correct?
> | >
> | > No, a num
On 17Jun2013 08:49, Antoon Pardon wrote:
| On 15-06-13 02:28, Cameron Simpson wrote:
| > On 14Jun2013 15:59, Nikos as SuperHost Support wrote:
| > | So, a numeral = a string representation of a number. Is this correct?
| >
| > No, a numeral is an individual digit from the string representation
On 15-06-13 02:28, Cameron Simpson wrote:
> On 14Jun2013 15:59, Nikos as SuperHost Support wrote:
> | So, a numeral = a string representation of a number. Is this correct?
>
> No, a numeral is an individual digit from the string representation of a
> number.
> So: 65 requires two numerals: '6'
On Sat, Jun 15, 2013 at 10:35 PM, Benjamin Schollnick wrote:
> Nick,
>
> The only thing that I didn't understand is this line.
> First please tell me what is a byte value
>
> \x1b is a sequence you find inside strings (and "byte" strings, the
> b'...' format).
>
>
> \x1b is a character(ESC) repres
Nick,
>> The only thing that I didn't understand is this line.
>> First please tell me what is a byte value
>>
>>> \x1b is a sequence you find inside strings (and "byte" strings, the
>>> b'...' format).
>>
>> \x1b is a character(ESC) represented in hex format
>>
>> b'\x1b' is a byte object that
On 14/6/2013 4:58 PM, Nick the Gr33k wrote:
On 14/6/2013 1:14 PM, Cameron Simpson wrote:
Normally a character in a b'...' item represents the byte value
matching the character's Unicode ordinal value.
The only thing that I didn't understand is this line.
First please tell me what is a byte val
On Sat, Jun 15, 2013 at 11:14 AM, Nick the Gr33k wrote:
> On 15/6/2013 5:59 PM, Roy Smith wrote:
>
> And, yes, especially in networking, everybody talks about octets when
>> they want to make sure people understand what they mean.
>>
>
> 1 byte = 8 bits
>
> in networking though since we do not us
On Sat, 15 Jun 2013 17:49:13 +0300, Nick the Gr33k wrote:
> What is the difference between a byte and a byte's value?
Nothing.
--
Steven
--
http://mail.python.org/mailman/listinfo/python-list
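In Python 3 terms, a byte stored in a bytes object is simply a small integer, so there is no separate value to look up. A minimal sketch, assuming Python 3 (the variable names are illustrative only):

```python
data = b"\x1b"               # a bytes object holding a single byte

value = data[0]              # indexing bytes yields the byte itself, an int
print(value)                 # 27
print(hex(value))            # 0x1b
print(value == ord("\x1b"))  # True: the ESC character has ordinal 27
```

The `\x1b` escape is only notation in the source code; the object in memory holds the number 27.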
On 15/6/2013 5:59 PM, Roy Smith wrote:
And, yes, especially in networking, everybody talks about octets when
they want to make sure people understand what they mean.
1 byte = 8 bits
in networking though since we do not use encoding schemes with variable
lengths like utf-8 is, how do we separ
In article ,
Grant Edwards wrote:
> There is some ambiguity in the term "byte". It used to mean the
> smallest addressable unit of memory (which varied in the past -- at
> one point, both 20 and 60 bit "bytes" were common).
I would have defined it more like, "some arbitrary collection of
adja
On 15/6/2013 5:44 PM, Grant Edwards wrote:
There is some ambiguity in the term "byte". It used to mean the
smallest addressable unit of memory (which varied in the past -- at
one point, both 20 and 60 bit "bytes" were common). These days the
smallest addressable unit of memory is almost always
On 2013-06-15, Denis McMahon wrote:
> On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote:
>
>> On 14/6/2013 1:14 PM, Cameron Simpson wrote:
>>> Normally a character in a b'...' item represents the byte value
>>> matching the character's Unicode ordinal value.
>
>> The only thing that i did
On Fri, 14 Jun 2013 16:58:20 +0300, Nick the Gr33k wrote:
> On 14/6/2013 1:14 PM, Cameron Simpson wrote:
>> Normally a character in a b'...' item represents the byte value
>> matching the character's Unicode ordinal value.
> The only thing that I didn't understand is this line.
> First please tel
On 14Jun2013 16:58, Nikos as SuperHost Support wrote:
| On 14/6/2013 1:14 PM, Cameron Simpson wrote:
| >Normally a character in a b'...' item represents the byte value
| >matching the character's Unicode ordinal value.
|
| The only thing that I didn't understand is this line.
| First please tell
On 14Jun2013 15:59, Nikos as SuperHost Support wrote:
| So, a numeral = a string representation of a number. Is this correct?
No, a numeral is an individual digit from the string representation of a number.
So: 65 requires two numerals: '6' and '5'.
--
Cameron Simpson
In life, you should alway
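The distinction between a number and the numerals used to write it can be illustrated directly. This is a small sketch, assuming Python 3 and decimal notation:

```python
number = 65                    # an int: the number itself
notation = str(number)         # its decimal string representation
numerals = list(notation)      # the individual digit characters

print(numerals)                                      # ['6', '5']
print(type(number).__name__, type(notation).__name__)  # int str
```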
On Sat, 15 Jun 2013 03:03:02 +1000, Chris Angelico wrote:
> Why do you sell web hosting services when you
> have no clue how to provide them?
>
And why do you continue responding to this timewaster? Please, please
just killfile him and let's all move on.
On 2013-06-14, Chris Angelico wrote:
> On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain wrote:
>> The answer is to always make sure that you include the previous poster
>> in the reply as a Cc or To. I filter out any email that has the string
>> "supp...@superhost.gr" in a header so I would als
On Sat, Jun 15, 2013 at 3:13 AM, D'Arcy J.M. Cain wrote:
> The answer is to always make sure that you include the previous poster
> in the reply as a Cc or To. I filter out any email that has the string
> "supp...@superhost.gr" in a header so I would also filter out the
> replies if people would
On Fri, 14 Jun 2013 11:06:55 +0200
Heiko Wundram wrote:
> Come on now, this is _so_ obviously trolling, it's not even remotely
> funny anymore. Why doesn't killfiling work with the mailing list
> version of the python list? :-(
A big problem, other than Mr. Support's shenanigans with his email
a
On Sat, Jun 15, 2013 at 1:26 AM, Nick the Gr33k wrote:
> Well, my biggest successes up until now were to build 3 websites utilizing
> database saves and retrievals
>
> in PHP
> in Perl
> and later in Python
>
> with absolute ignorance of
>
> Apache Configuration:
> CGI:
> Linux:
>
> with just bas
On 14/6/2013 6:21 PM, Joel Goldstick wrote:
let's cut to the chase and start with telling us what you DO know Nick.
That would take less typing
Well, my biggest successes up until now were to build 3 websites
utilizing database saves and retrievals
in PHP
in Perl
and later in Python
with abs
let's cut to the chase and start with telling us what you DO know Nick.
That would take less typing
On Fri, Jun 14, 2013 at 9:58 AM, Nick the Gr33k wrote:
> On 14/6/2013 1:14 PM, Cameron Simpson wrote:
>
>> Normally a character in a b'...' item represents the byte value
>> matching the character
On 14/6/2013 1:14 PM, Cameron Simpson wrote:
Normally a character in a b'...' item represents the byte value
matching the character's Unicode ordinal value.
The only thing that I didn't understand is this line.
First please tell me what is a byte value
\x1b is a sequence you find inside strin
On 14-06-13 14:59, Nick the Gr33k wrote:
> On 14/6/2013 1:50 PM, Antoon Pardon wrote:
>> Python works with numbers, but at the moment
>> it has to display such a number it has to produce something
>> that is printable. So it will build a string that can be
>> used as a notation for that number,
On 14/6/2013 1:50 PM, Antoon Pardon wrote:
Python works with numbers, but at the moment
it has to display such a number it has to produce something
that is printable. So it will build a string that can be
used as a notation for that number, a numeral. And that
is what will be displayed.
so a n
On 14/6/2013 1:19 PM, Cameron Simpson wrote:
On 14Jun2013 11:37, Nikos as SuperHost Support wrote:
| On 14/6/2013 11:22 AM, Antoon Pardon wrote:
|
| >>Python prints numbers:
| >No it doesn't, numbers are abstract concepts that can be represented in
| >various notations, these notations are strin
On Jun 14, 3:20 pm, Fábio Santos wrote:
> > Come on now, this is _so_ obviously trolling, it's not even remotely
> > funny anymore. Why doesn't killfiling work with the mailing list version of
> > the python list? :-(
>
> I have skimmed the archives for this month, and I estimate that a third of
> t
On 14-06-13 10:37, Nick the Gr33k wrote:
> On 14/6/2013 11:22 AM, Antoon Pardon wrote:
>
>>> Python prints numbers:
>> No it doesn't, numbers are abstract concepts that can be represented in
>> various notations, these notations are strings. Those notational strings
>> end up being printed. As I s
On 14Jun2013 09:59, Nikos as SuperHost Support wrote:
| On 14/6/2013 4:00 AM, Cameron Simpson wrote:
| >On 13Jun2013 17:19, Nikos as SuperHost Support wrote:
| >| A code-point and the code-point's ordinal value are associated into
| >| a Unicode charset. They have the so called 1:1 mapping.
| >|
On 14 Jun 2013 10:20, "Heiko Wundram" wrote:
>
> On 14.06.2013 10:37, Nick the Gr33k wrote:
>>
>> So everything we see like:
>>
>> 16474
>> nikos
>> abc123
>>
>> everything is a string and nothing is a number? not even number 1?
>
>
> Come on now, this is _so_ obviously trolling, it's not even r
On 14Jun2013 11:37, Nikos as SuperHost Support wrote:
| On 14/6/2013 11:22 AM, Antoon Pardon wrote:
|
| >>Python prints numbers:
| >No it doesn't, numbers are abstract concepts that can be represented in
| >various notations, these notations are strings. Those notational strings
| >end up being pr
On 14.06.2013 10:37, Nick the Gr33k wrote:
So everything we see like:
16474
nikos
abc123
everything is a string and nothing is a number? not even number 1?
Come on now, this is _so_ obviously trolling, it's not even remotely
funny anymore. Why doesn't killfiling work with the mailing list
On 14/6/2013 11:22 AM, Antoon Pardon wrote:
Python prints numbers:
No it doesn't, numbers are abstract concepts that can be represented in
various notations, these notations are strings. Those notational strings
end up being printed. As I said before we are so used in using the
decimal notation
On 14-06-13 09:49, Nick the Gr33k wrote:
> On 14/6/2013 10:36 AM, Antoon Pardon wrote:
>> On 13-06-13 10:08, Νικόλαος Κούρας wrote:
>>>
>>> Indeed python embraced it in single quoting '0b100000001011010' and
>>> not as 0b100000001011010 which in fact makes it a string.
>>>
>>> But since bin(164
On 14/6/2013 10:36 AM, Antoon Pardon wrote:
On 13-06-13 10:08, Νικόλαος Κούρας wrote:
On 13/6/2013 10:58 AM, Chris Angelico wrote:
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
No! That creates a string from 16474 in base two:
'0b1000
On 13-06-13 10:08, Νικόλαος Κούρας wrote:
> On 13/6/2013 10:58 AM, Chris Angelico wrote:
>> On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
>>> On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
>>>> No! That creates a string from 16474 in base two:
>>>> '0b100000001011010'
>>>
>>> I disag
On 14/6/2013 9:00 AM, Zero Piraeus wrote:
:
On 14 June 2013 01:34, Nick the Gr33k wrote:
Why doesn't it work like this?
leading 0 = 1 byte flag
leading 1 = 2 bytes flag
leading 00 = 3 bytes flag
leading 01 = 4 bytes flag
leading 10 = 5 bytes flag
leading 11 = 6 bytes flag
Wouldn't it be more
On 14/6/2013 4:00 AM, Cameron Simpson wrote:
On 13Jun2013 17:19, Nikos as SuperHost Support wrote:
| A code-point and the code-point's ordinal value are associated into
| a Unicode charset. They have the so called 1:1 mapping.
|
| So, i was under the impression that by encoding the code-point in
:
On 14 June 2013 01:34, Nick the Gr33k wrote:
> Why doesn't it work like this?
>
> leading 0 = 1 byte flag
> leading 1 = 2 bytes flag
> leading 00 = 3 bytes flag
> leading 01 = 4 bytes flag
> leading 10 = 5 bytes flag
> leading 11 = 6 bytes flag
>
> Wouldn't it be more logical?
Think about it.
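One way to think about it: a decoder reads bits from the left, so no length flag may be the beginning of another flag. The sketch below checks that property for the flags proposed above and for the leading-bit patterns UTF-8 uses. `is_prefix_free` is a hypothetical helper written for this illustration.

```python
def is_prefix_free(codes):
    """Return True if no code is a prefix of a different code."""
    return not any(
        a != b and b.startswith(a)
        for a in codes
        for b in codes
    )

# The flags proposed in the question, written as bit strings.
proposed = ["0", "1", "00", "01", "10", "11"]

# Leading-bit patterns in UTF-8: single byte, continuation byte,
# and lead bytes of 2-, 3- and 4-byte sequences.
utf8 = ["0", "10", "110", "1110", "11110"]

print(is_prefix_free(proposed))  # False
print(is_prefix_free(utf8))      # True
```

With the proposed flags, a decoder that sees a leading `0` cannot tell whether it has read the one-byte flag or the start of `00` or `01`.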
On 14/6/2013 1:46 AM, Dennis Lee Bieber wrote:
On Wed, 12 Jun 2013 09:09:05 +0000 (UTC), Νικόλαος Κούρας
declaimed the following:
(*) in fact UTF8 also indicates the end of each character
Up to a point. The initial byte encodes the length and the top few
bits, but the subsequent octets aren
On 13Jun2013 17:19, Nikos as SuperHost Support wrote:
| A code-point and the code-point's ordinal value are associated into
| a Unicode charset. They have the so called 1:1 mapping.
|
| So, i was under the impression that by encoding the code-point into
| utf-8 was the same as encoding the code-p
On 13/6/2013 2:49 PM, Steven D'Aprano wrote:
Please confirm these are true statement:
A code-point and the code-point's ordinal value are associated into a
Unicode charset. They have the so called 1:1 mapping.
So, i was under the impression that by encoding the code-point into
utf-8 was the
On Thu, 13 Jun 2013 12:41:41 +0300, Νικόλαος Κούρας wrote:
>> In Python 2:
> 16474
> typing 16474 in interactive session both in python 2 and 3 gives back
> the number 16474
>
> while we want the binary representation of the number 16474
Python does not work that way. Ints *always* displ
On Thu, 13 Jun 2013 12:01:55 +1000, Chris Angelico wrote:
> On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano
> wrote:
>> The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but
>> that's not UTF-8, that's UTF-8-plus-extra-codepoints.
>
> And a proper UTF-8 decoder will reject "\
On 13/6/2013 11:20 AM, Chris Angelico wrote:
On Thu, Jun 13, 2013 at 6:08 PM, Νικόλαος Κούρας wrote:
On 13/6/2013 10:58 AM, Chris Angelico wrote:
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
No! That creates a string from 16474 in
On Thu, Jun 13, 2013 at 6:08 PM, Νικόλαος Κούρας wrote:
> On 13/6/2013 10:58 AM, Chris Angelico wrote:
>>
>> On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
>>
>>> On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
>>>> No! That creates a string from 16474 in base two:
>>>> '0b1000
On 13/6/2013 10:58 AM, Chris Angelico wrote:
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
No! That creates a string from 16474 in base two:
'0b100000001011010'
I disagree here.
16474 is a number in base 10. Doing bin(16474) we get the
On Thu, Jun 13, 2013 at 5:42 PM, Νικόλαος Κούρας wrote:
> On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
>> No! That creates a string from 16474 in base two:
>> '0b100000001011010'
>
> I disagree here.
> 16474 is a number in base 10. Doing bin(16474) we get the binary
> representation of number 164
On 13/6/2013 10:11 AM, Steven D'Aprano wrote:
>>> chr(16474)
'䁚'
Some Chinese symbol.
So code-point '䁚' has a Unicode ordinal value of 16474, correct?
Correct.
where in after encoding this glyph's ordinal value to binary gives us
the following bytes:
>>> bin(16474).encode('utf-8')
b'0
On Thu, 13 Jun 2013 09:09:19 +0300, Νικόλαος Κούρας wrote:
> On 13/6/2013 3:13 AM, Steven D'Aprano wrote:
>> Open an interactive Python session, and run this code:
>>
>> c = chr(16474)
>> len(c.encode('utf-8'))
>>
>>
>> That will tell you how many bytes are used for that example.
> This is actual
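The difference between encoding a character and encoding the digits of its ordinal can be seen side by side. A minimal sketch, assuming Python 3; `chr` is used to get the character, since `ord` goes the other way and rejects an integer argument:

```python
cp = 16474

ch = chr(cp)                        # the character with ordinal 16474 (U+405A)
encoded = ch.encode("utf-8")        # the character's UTF-8 bytes
print(len(encoded), encoded)        # 3 b'\xe4\x81\x9a'

digits = bin(cp)                    # a str made of digit characters
print(digits)                       # 0b100000001011010
print(len(digits.encode("utf-8")))  # 17: one byte per digit character
```

Encoding the result of `bin()` therefore yields the ASCII bytes of the text "0b1000...", which is unrelated to how UTF-8 stores the character itself.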
On Thu, Jun 13, 2013 at 4:21 PM, Νικόλαος Κούρας wrote:
> How can you be able to tell up to what character utf-8 needs 1 byte or 2
> bytes or 3?
You look up Wikipedia, using the handy links that have been put to you
MULTIPLE TIMES.
ChrisA
--
UTF-8, Unicode (consortium): 1 to 4 *Unicode Transformation Units*
UTF-8, ISO 10646: 1 to 6 *Unicode Transformation Units*
(still current, unless very recently modified)
jmf
On 12/6/2013 11:30 PM, Nobody wrote:
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
So, how many bytes does UTF-8 store for codepoints > 127 ?
U+0000..U+007F 1 byte
U+0080..U+07FF 2 bytes
U+0800..U+FFFF 3 bytes
>=U+10000 4 bytes
'U' stands for Unicode code-point which
On 13/6/2013 3:13 AM, Steven D'Aprano wrote:
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
So, how many bytes does UTF-8 store for codepoints > 127 ?
Two, three or four, depending on the codepoint.
The amount of bytes needed by UTF-8 to store a code-point(character),
depends
On Thu, Jun 13, 2013 at 11:40 AM, Steven D'Aprano
wrote:
> The *mechanism* of UTF-8 can go up to 6 bytes (or even 7 perhaps?), but
> that's not UTF-8, that's UTF-8-plus-extra-codepoints.
And a proper UTF-8 decoder will reject "\xC0\x80" and "\xed\xa0\x80",
even though mathematically they would tr
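The two byte sequences mentioned above can be tried against Python's own decoder. A minimal sketch, assuming Python 3 and the default strict error handler:

```python
samples = {
    "overlong NUL": b"\xc0\x80",      # a 2-byte form of U+0000
    "surrogate":    b"\xed\xa0\x80",  # would map to U+D800
}

results = {}
for name, raw in samples.items():
    try:
        raw.decode("utf-8")           # strict error handling is the default
        results[name] = "accepted"
    except UnicodeDecodeError:
        results[name] = "rejected"

print(results)
```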
On Wed, 12 Jun 2013 21:30:23 +0100, Nobody wrote:
> The mechanism used by UTF-8 allows sequences of up to 6 bytes, for a
> total of 31 bits, but UTF-16 is limited to U+10FFFF (slightly more than
> 20 bits).
Same with UTF-8 and UTF-32, both of which are limited to U+10FFFF because
that is what Un
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
> So, how many bytes does UTF-8 store for codepoints > 127 ?
Two, three or four, depending on the codepoint.
> example for codepoint 256, 1345, 16474 ?
You can do this yourself. I have already given you enough information in
previous
On Wed, 12 Jun 2013 14:23:49 +0300, Νικόλαος Κούρας wrote:
> So, how many bytes does UTF-8 store for codepoints > 127 ?
U+0000..U+007F 1 byte
U+0080..U+07FF 2 bytes
U+0800..U+FFFF 3 bytes
>=U+10000 4 bytes
So, 1 byte for ASCII, 2 bytes for other Latin characters, Greek, Cyrillic,
Arabi
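The standard UTF-8 length ranges (up to U+007F, U+07FF, U+FFFF and U+10FFFF) can be written as a small function and cross-checked against the built-in encoder. This is a sketch assuming Python 3; `utf8_length` is a hypothetical helper, and surrogate code points are not treated specially here.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses for a code point, from the range table."""
    if code_point <= 0x7F:
        return 1
    if code_point <= 0x7FF:
        return 2
    if code_point <= 0xFFFF:
        return 3
    if code_point <= 0x10FFFF:
        return 4
    raise ValueError("not a Unicode code point")

for cp in (256, 1345, 16474):
    # Cross-check the table against what the encoder actually produces.
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(cp, utf8_length(cp))
```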
On 12.06.2013 13:23, Νικόλαος Κούρας wrote:
So, how many bytes does UTF-8 store for codepoints > 127 ?
What has your research turned up? I personally consider it lazy and
disrespectful to get lots of pointers that you could use for further
research and ask for more info before you even follo
On 06/12/2013 05:24 AM, Steven D'Aprano wrote:
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:
Isn't 14 bits way too many to store a character ?
No.
There are 1114112 possible code points in Unicode. (And in Japan, they
sometimes use TRON instead of Unicode, which has even more.)
I
On 12/6/2013 12:24 PM, Steven D'Aprano wrote:
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:
Isn't 14 bits way too many to store a character ?
No.
There are 1114112 possible code points in Unicode. (And in Japan, they
sometimes use TRON instead of Unicode, which has even more.)
If
On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:
> Isn't 14 bits way too many to store a character ?
No.
There are 1114112 possible code points in Unicode. (And in Japan, they
sometimes use TRON instead of Unicode, which has even more.)
If you list out all the combinations of 14 bits:
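Counting the combinations, rather than listing them, makes the same point. A small arithmetic sketch in Python (the variable names are illustrative only):

```python
patterns_14_bits = 2 ** 14                 # distinct values 14 bits can hold
code_points = 0x10FFFF + 1                 # U+0000 through U+10FFFF
bits_needed = (0x10FFFF).bit_length()      # bits for the largest code point

print(patterns_14_bits)   # 16384
print(code_points)        # 1114112
print(bits_needed)        # 21
```

Fourteen bits cover only a small fraction of the Unicode code space, so they are too few rather than too many.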
>> (*) in fact UTF8 also indicates the end of each character
> Up to a point. The initial byte encodes the length and the top few
> bits, but the subsequent octets aren’t distinguishable as final in
> isolation. 0x80-0xBF can all be either medial or final.
So, the first high-bits are a directiv
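The role of the high bits can be made concrete by classifying each byte of an encoded string. The following is a rough sketch, assuming Python 3; `classify` is a hypothetical helper that looks only at the leading bits and does not perform full UTF-8 validation.

```python
def classify(byte):
    """Describe the role of one byte in a UTF-8 stream from its high bits."""
    if byte < 0x80:
        return "single"          # 0xxxxxxx
    if byte < 0xC0:
        return "continuation"    # 10xxxxxx
    if byte < 0xE0:
        return "lead-2"          # 110xxxxx
    if byte < 0xF0:
        return "lead-3"          # 1110xxxx
    if byte < 0xF8:
        return "lead-4"          # 11110xxx
    return "invalid"

# 'a', GREEK SMALL LETTER LAMDA (U+03BB) and U+405A
roles = [classify(b) for b in "a\u03bb\u405a".encode("utf-8")]
print(roles)
```

Every continuation byte carries the same `10` marker, which is why a single one, seen in isolation, does not reveal whether it is the last byte of its character.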
On Sun, Jun 9, 2013 at 12:44 PM, Νικόλαος Κούρας wrote:
> A few questions about encoding please:
>
>>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
>>> values up to 256?
>
>>Because then how do you tell when you need one byte, and when you need
>>two? If you read two bytes,
On Sun, 09 Jun 2013 03:44:57 -0700, Νικόλαος Κούρας wrote:
>>> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
>>> values up to 256?
>
>>Because then how do you tell when you need one byte, and when you need
>>two? If you read two bytes, and see 0x4C 0xFA, does that mean tw
On 9 Jun 2013 11:49, "Νικόλαος Κούρας" wrote:
>
> A few questions about encoding please:
>
> >> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
> >> values up to 256?
>
> >Because then how do you tell when you need one byte, and when you need
> >two? If you read two bytes, and
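The ambiguity of the two bytes 0x4C 0xFA can be demonstrated by decoding them under two different fixed-width readings. This sketch assumes Python 3 and uses Latin-1 and UTF-16-BE purely as stand-ins for "one byte per character" and "two bytes per character"; those codec choices are an assumption made for illustration, not something stated in the thread.

```python
raw = bytes([0x4C, 0xFA])

as_two = raw.decode("latin-1")     # read as one byte per character
as_one = raw.decode("utf-16-be")   # read as two bytes per character

print(len(as_two), [hex(ord(c)) for c in as_two])  # 2 ['0x4c', '0xfa']
print(len(as_one), [hex(ord(c)) for c in as_one])  # 1 ['0x4cfa']
```

Without a marker in the bytes themselves, a reader cannot tell which interpretation was intended, which is the problem UTF-8's lead-byte bits solve.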