On Jan 19, 11:33 pm, Terry Reedy wrote:
> On 1/19/2011 1:02 PM, Tim Harig wrote:
>
> > Right, but I only have to do that once. After that, I can directly address
> > any piece of the stream that I choose. If I leave the information as a
> > simple UTF-8 stream, I would have to walk the stream ag
On 1/19/2011 1:02 PM, Tim Harig wrote:
Right, but I only have to do that once. After that, I can directly address
any piece of the stream that I choose. If I leave the information as a
simple UTF-8 stream, I would have to walk the stream again, I would have to
walk through the first byte o
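To make the cost Tim describes concrete, here is a minimal Python sketch that finds the n-th code point in a raw UTF-8 byte string. It relies only on the rule that continuation bytes have the form 10xxxxxx. The helper names `utf8_offset_of` and `utf8_code_point_at` are hypothetical, and the scan is linear in the position requested.

```python
def utf8_offset_of(data: bytes, index: int) -> int:
    """Byte offset at which code point number `index` starts in UTF-8 `data`.

    There is no shortcut: every byte up to the target has to be inspected,
    because only lead bytes (anything not of the form 10xxxxxx) start a
    new code point.
    """
    count = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte: a new code point starts here
            count += 1
            if count == index:
                return offset
    raise IndexError("code point index out of range")


def utf8_code_point_at(data: bytes, index: int) -> str:
    """Decode the single code point at position `index`."""
    start = utf8_offset_of(data, index)
    end = start + 1
    # Extend over the continuation bytes that belong to this code point.
    while end < len(data) and (data[end] & 0xC0) == 0x80:
        end += 1
    return data[start:end].decode("utf-8")


data = "naïve €".encode("utf-8")
print(utf8_offset_of(data, 3))      # 4, because "ï" takes two bytes
print(utf8_code_point_at(data, 6))  # €
```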
On Wed, 19 Jan 2011 19:18:49 +0000 (UTC)
Tim Harig wrote:
> On 2011-01-19, Antoine Pitrou wrote:
> > On Wed, 19 Jan 2011 18:02:22 +0000 (UTC)
> > Tim Harig wrote:
> >> Converting to a fixed byte
> >> representation (UTF-32/UCS-4) or separating all of the bytes for each
> >> UTF-8 into 6 byte con
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 18:02:22 +0000 (UTC)
> Tim Harig wrote:
>> Converting to a fixed byte
>> representation (UTF-32/UCS-4) or separating all of the bytes for each
>> UTF-8 into 6 byte containers both make it possible to simply index the
>> letters by a const
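By contrast, with a fixed-width representation such as UTF-32 the byte offset of code point i is simply 4*i, so no scan is needed. A rough sketch, assuming big-endian UTF-32 without a BOM; the helper name `utf32_code_point_at` is made up for illustration.

```python
def utf32_code_point_at(data: bytes, index: int) -> str:
    """Code point `index` of big-endian UTF-32 `data`, found by arithmetic."""
    if index < 0:
        raise IndexError("code point index out of range")
    start = 4 * index  # every code point is exactly four bytes wide
    chunk = data[start:start + 4]
    if len(chunk) != 4:
        raise IndexError("code point index out of range")
    return chunk.decode("utf-32-be")


# "utf-32-be" fixes the byte order and, unlike plain "utf-32", writes no BOM.
data = "naïve €".encode("utf-32-be")
print(len(data))                     # 28
print(utf32_code_point_at(data, 6))  # €
```

The trade-off is memory: every code point costs four bytes, even plain ASCII.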
On Wed, 19 Jan 2011 18:02:22 +0000 (UTC)
Tim Harig wrote:
> On 2011-01-19, Antoine Pitrou wrote:
> > On Wed, 19 Jan 2011 16:03:11 +0000 (UTC)
> > Tim Harig wrote:
> >>
> >> For many operations, it is just much faster and simpler to use a single
> >> character based container as opposed to having t
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 16:03:11 +0000 (UTC)
> Tim Harig wrote:
>>
>> For many operations, it is just much faster and simpler to use a single
>> character based container as opposed to having to process an entire byte
>> stream to determine individual letters from
On Wed, 19 Jan 2011 16:03:11 +0000 (UTC)
Tim Harig wrote:
>
> For many operations, it is just much faster and simpler to use a single
> character based container as opposed to having to process an entire byte
> stream to determine individual letters from the bytes or to having
> adaptive size contai
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 14:00:13 +0000 (UTC)
> Tim Harig wrote:
>> UTF-8 has no apparent endianness if you only store it as a byte stream.
>> It does, however, have a byte order. If you store it using multibytes
>> (six bytes for all UTF-8 possibilities), which is
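A quick way to check the claim being argued here: UTF-8 yields one byte sequence no matter which machine does the encoding, whereas UTF-16 has distinct big- and little-endian forms. A small check with Python's built-in codecs (the sample string is arbitrary):

```python
import sys

text = "é€"  # U+00E9, U+20AC

# UTF-8 defines one byte sequence per code point, independent of the CPU.
utf8 = text.encode("utf-8")

# UTF-16 stores 16-bit units, so two byte orders are possible.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print("host byte order:", sys.byteorder)
print("utf-8    ", utf8.hex())      # c3a9e282ac
print("utf-16-be", utf16_be.hex())  # 00e920ac
print("utf-16-le", utf16_le.hex())  # e900ac20
```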
On 2011-01-19, Adam Skutt wrote:
> On Jan 19, 9:00 am, Tim Harig wrote:
>> That is why I say that byte streams are essentially big endian. It is
>> all a matter of how you look at it.
>
> It is nothing of the sort. Some byte streams are, in fact, little
> endian: when the bytes are combined into
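Adam's point concerns multi-byte units: once bytes are grouped into larger integers, the order of those bytes is a real choice rather than a given. Python's `struct` module makes both layouts explicit (the value 0x12345678 is arbitrary):

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Each stream round-trips only when read back with the order it was written in.
print(struct.unpack("<I", little)[0] == value)  # True
```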
On Jan 19, 9:00 am, Tim Harig wrote:
>
> So, you can always assume big-endian order and things will work out correctly
> while you cannot always make the same assumption as little endian
> without potential issues. The same holds true for any byte stream data.
You need to spend some serious time pro
On Wed, 19 Jan 2011 14:00:13 +0000 (UTC)
Tim Harig wrote:
>
> - Q: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If
> - yes, then can I still assume the remaining UTF-8 bytes are in big-endian
> ^^
> - or
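For the FAQ question quoted above: Python exposes the UTF-8 form of the BOM as `codecs.BOM_UTF8` and provides a `utf-8-sig` codec that writes it when encoding and strips it when decoding. Because those three bytes come out the same on every machine, a BOM in UTF-8 can only act as an encoding signature, not as a byte-order indicator. A short illustration:

```python
import codecs

# U+FEFF encoded as UTF-8 is always these three bytes, on any machine.
print(codecs.BOM_UTF8.hex())  # efbbbf
print("\ufeff".encode("utf-8") == codecs.BOM_UTF8)  # True

# "utf-8-sig" writes the BOM when encoding and strips it when decoding.
encoded = "abc".encode("utf-8-sig")
print(encoded.hex())                # efbbbf616263
print(encoded.decode("utf-8-sig"))  # abc

# Plain "utf-8" keeps the BOM as an ordinary U+FEFF character.
print(encoded.decode("utf-8") == "\ufeffabc")  # True
```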
Considering your post contained no information or evidence for your
negations, I shouldn't even bother responding. I will bite once.
Hopefully next time your arguments will contain some pith.
On 2011-01-19, Antoine Pitrou wrote:
> On Wed, 19 Jan 2011 11:34:53 +0000 (UTC)
> Tim Harig wrote:
>> Th
On Wed, 19 Jan 2011 11:34:53 +0000 (UTC)
Tim Harig wrote:
> That is why the FAQ I linked to
> says yes to the fact that you can consider UTF-8 to always be in big-endian
> order.
It certainly doesn't. Read better.
> Essentially all byte based data is big-endian.
This is pure nonsense.
--
htt
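One way to see why "assume big-endian" is not safe in general: UTF-16 data written little-endian and read back as big-endian decodes without any error, but into entirely different characters. A minimal sketch (the sample text is arbitrary):

```python
text = "Hi"
le = text.encode("utf-16-le")  # bytes 48 00 69 00

# Reading little-endian data as big-endian silently swaps each 16-bit unit.
wrong = le.decode("utf-16-be")
print([hex(ord(c)) for c in wrong])  # ['0x4800', '0x6900']

# Only the byte order actually used gives the text back.
print(le.decode("utf-16-le"))        # Hi
```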
On 2011-01-19, Tim Roberts wrote:
> Tim Harig wrote:
>>On 2011-01-17, carlo wrote:
>>
>>> 2- If that were true, can you point me to some documentation about the
>>> math that, as Mark says, demonstrates this?
>>
>>It is true because UTF-8 is essentially an 8 bit encoding that resorts
>>to the ne
Tim Harig wrote:
>On 2011-01-17, carlo wrote:
>
>> 2- If that were true, can you point me to some documentation about the
>> math that, as Mark says, demonstrates this?
>
>It is true because UTF-8 is essentially an 8 bit encoding that resorts
>to the next bit once it exhausts the addressable spac
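For carlo's second question, the "math" is essentially the UTF-8 bit layout itself. Below is a hand-written encoder for a single code point, a sketch for illustration only: the name `encode_utf8` is hypothetical, real code should use `str.encode`, and it follows the four-byte limit of RFC 3629 rather than the original six-byte design. Each byte is emitted in a fixed sequence, lead byte first, so there is no multi-byte word whose internal byte order could differ between machines.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point by hand to show the UTF-8 bit layout (sketch)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:       # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare against Python's own encoder for one code point of each length.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    manual = encode_utf8(ord(ch))
    assert manual == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {manual.hex()}")
```

This should print 41, c3a9, e282ac and f09d849e for the four samples.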
On Jan 17, 2:19 pm, carlo wrote:
> Hi,
> recently I had to *seriously* study Unicode and encodings for one
> project in Python, but I was left with a couple of doubts that arose after
> reading the unicode chapter of Dive into Python 3 book by Mark
> Pilgrim.
>
> 1- Mark says:
> "Also (and you’ll have to t
On 17 Jan, 23:34, Antoine Pitrou wrote:
> On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
>
> carlo wrote:
> > Is it true UTF-8 does not have any "big-endian/little-endian" issue
> > because of its encoding method?
>
> Yes.
>
> > And if it is true, why does Mark (as
> > everyone does) write about UTF-8 wit
On Mon, 17 Jan 2011 14:19:13 -0800 (PST)
carlo wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method?
Yes.
> And if it is true, why does Mark (as
> everyone does) write about UTF-8 with and without a BOM some chapters
> later? What would be the
On 2011-01-17, carlo wrote:
> Is it true UTF-8 does not have any "big-endian/little-endian" issue
> because of its encoding method? And if it is true, why does Mark (as
> everyone does) write about UTF-8 with and without a BOM some chapters
> later? What would be the BOM purpose then?
Yes, it is true.
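On "what would be the BOM purpose then?": for UTF-16 and UTF-32 the BOM tells a reader which byte order was used, while for UTF-8 it only signals which encoding is in use. A sketch of BOM sniffing with the constants in Python's `codecs` module; the function name `sniff_bom` is hypothetical, and a UTF-16-LE file whose first character is U+0000 would be misread as UTF-32-LE.

```python
import codecs

# Longer BOMs are checked first: BOM_UTF32_LE begins with BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) for a leading BOM, or (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


data = codecs.BOM_UTF16_LE + "héllo".encode("utf-16-le")
name, skip = sniff_bom(data)
print(name, data[skip:].decode(name))  # utf-16-le héllo
```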
On 17.01.2011 23:19, carlo wrote:
Is it true UTF-8 does not have any "big-endian/little-endian" issue
because of its encoding method? And if it is true, why does Mark (as
everyone does) write about UTF-8 with and without a BOM some chapters
later? What would be the BOM purpose then?
Can't answer yo
Hi,
recently I had to *seriously* study Unicode and encodings for one
project in Python, but I was left with a couple of doubts that arose after
reading the unicode chapter of Dive into Python 3 book by Mark
Pilgrim.
1- Mark says:
"Also (and you’ll have to trust me on this, because I’m not going to
show yo