On 31/03/2013 08:35, jmfauth wrote:
--
Neil Hodgson:
"The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string."
Serious developers/typographers/users know that you can not compose
a text i
On Sun, 31 Mar 2013 00:35:23 -0700, jmfauth wrote:
> This is not really the problem. "Serious users" may notice sooner or
> later, Python and Unicode are walking in opposite directions
> (technically and in spirit).
>
timeit.repeat("'a' * 1000 + 'ẞ'")
> [1.1088995672090292, 1.08422666132619
--
Neil Hodgson:
"The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string."
Serious developers/typographers/users know that you can not compose
a text in French with "latin-1". This is now a
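
The size claim quoted above can be checked directly. The following is a minimal sketch, assuming CPython 3.3 or later (where PEP 393 applies); the sample text and the variable names are made up for illustration, and the exact byte counts vary between versions and platforms.

```python
import sys

# Hypothetical sample: 10000 Latin-1 characters standing in for a French text.
latin1_text = "\u00e9" * 10000          # 'é' repeated

# Appending a single character above U+00FF (here U+2211, N-ARY SUMMATION)
# makes CPython store every character of the new string in 2 bytes.
mixed_text = latin1_text + "\u2211"

latin1_size = sys.getsizeof(latin1_text)
mixed_size = sys.getsizeof(mixed_text)
growth_ratio = mixed_size / latin1_size

print(latin1_size, mixed_size, round(growth_ratio, 2))
```

The ratio comes out close to 2 because the fixed object header is small next to 10000 characters. This only measures storage; it says nothing about whether the result is correct, which is a separate question in the thread.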
On 03/29/2013 02:26 PM, ru...@yahoo.com wrote:
On 03/28/2013 02:31 PM, Ethan Furman wrote:
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
For someone who delights in pointing out the logical errors of
others you are often remarkably sloppy in your
On 03/28/2013 02:31 PM, Ethan Furman wrote:
> On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
>> On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
>> For someone who delights in pointing out the logical errors of
>> others you are often remarkably sloppy in your own logic.
>>
>> Of course language can
On 3/28/2013 10:37 PM, Steven D'Aprano wrote:
Under what circumstances will a string be created from a wchar_t string?
How, and why, would such a string be created? Why would Python still
support strings containing surrogates when it now has a nice, shiny,
surrogate-free flexible representation?
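
As background to the question above, here is a small sketch of how a Python 3 str can still hold surrogate code points even though non-BMP characters no longer need surrogate pairs. It uses only standard codec error handlers ("surrogateescape" and "surrogatepass"); it is an illustration, not the answer given in the thread.

```python
# A character outside the BMP is one code point in Python 3.3+, not a pair.
astral = "\U0001F600"
astral_length = len(astral)

# Undecodable bytes can be smuggled into a str as lone surrogates
# with the "surrogateescape" error handler, and recovered unchanged.
raw = b"caf\xff"
decoded = raw.decode("utf-8", "surrogateescape")
round_tripped = decoded.encode("utf-8", "surrogateescape")

# A lone surrogate is a legal str element but is rejected by strict UTF-8.
lone_surrogate = "\ud800"
try:
    lone_surrogate.encode("utf-8")
    strict_encoding_allowed = True
except UnicodeEncodeError:
    strict_encoding_allowed = False

# "surrogatepass" encodes it anyway, using the 3-byte pattern.
forced = lone_surrogate.encode("utf-8", "surrogatepass")

print(astral_length, repr(decoded), round_tripped == raw,
      strict_encoding_allowed, forced)
```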
On 2013-03-29, Ethan Furman wrote:
> On 03/29/2013 07:52 AM, Grant Edwards wrote:
>> On 2013-03-28, Ethan Furman wrote:
>>
>>> I cannot speak for the borg mind, but for myself a troll is anyone
>>> who continually posts rants (such as RR & XL) or who continuously
>>> hijacks threads to talk about
On 03/29/2013 07:52 AM, Grant Edwards wrote:
On 2013-03-28, Ethan Furman wrote:
I cannot speak for the borg mind, but for myself a troll is anyone
who continually posts rants (such as RR & XL) or who continuously
hijacks threads to talk about their pet peeve (such as jmf).
Assuming jmf actua
On 2013-03-28, Ethan Furman wrote:
> I cannot speak for the borg mind, but for myself a troll is anyone
> who continually posts rants (such as RR & XL) or who continuously
> hijacks threads to talk about their pet peeve (such as jmf).
Assuming jmf actually does care deeply and genuinely about Un
On Fri, Mar 29, 2013 at 12:11 AM, Ian Kelly wrote:
> From the PEP:
>
> """
> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
> representation. It is thus identical to the existing
> _PyUnicode_AsString, which is removed. The function will compute the
> utf8 representation when firs
On Thu, Mar 28, 2013 at 8:37 PM, Steven D'Aprano
wrote:
>>> I also wonder why the implementation bothers keeping a UTF-8
>>> representation. That sounds like premature optimization to me. Surely
>>> you only need it when writing to a file with UTF-8 encoding? For most
>>> strings, that will never
Chris Angelico:
But both this and your example of case conversion are, fundamentally,
iterating over the string. What if you aren't doing that? What if you
want to parse and process?
Parsing is also normally a scanning operation. If you want to
process pieces of the string based on the par
On Fri, Mar 29, 2013 at 2:34 PM, Neil Hodgson wrote:
> It doesn't horrify me - I've been working this way for over 10 years and
> it seems completely natural. You can wrap access in iterators that hide the
> byte offsets if you like. This then ensures that all operations on those
> iterators ar
On 03/28/2013 08:34 PM, Neil Hodgson wrote:
Steven D'Aprano:
Any string method that takes a starting offset requires the method to
walk the string byte-by-byte. I've even seen languages put responsibility
for dealing with that onto the programmer: the "start offset" is given in
*bytes*, not cha
MRAB:
Implementing the regex module (http://pypi.python.org/pypi/regex) would
have been more difficult if the internal representation had been UTF-8,
because of the need to decode, and the implementation would also have
been slower for that reason.
One way to build regex support for UTF-8 i
Steven D'Aprano:
Some string operations need to inspect every character, e.g. str.upper().
Even for them, the increased complexity of a variable-width encoding
costs. It's not sufficient to walk the string inspecting a fixed 1, 2 or
4 bytes per character. You have to walk the string grabbing 1 b
On Fri, Mar 29, 2013 at 1:37 PM, Steven D'Aprano
wrote:
> Under what circumstances will a string be created from a wchar_t string?
> How, and why, would such a string be created? Why would Python still
> support strings containing surrogates when it now has a nice, shiny,
> surrogate-free flexible
On Fri, 29 Mar 2013 11:54:41 +1100, Chris Angelico wrote:
> On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
> wrote:
>> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
>> strings. It's only strings in the SMPs that could need surrogate pairs,
>> and they don't need them in
On 29/03/2013 00:54, Chris Angelico wrote:
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
wrote:
ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
strings. It's only strings in the SMPs that could need surrogate pairs,
and they don't need them in Python's implementation s
On Fri, Mar 29, 2013 at 12:03 PM, Mark Lawrence wrote:
> On 29/03/2013 00:54, Chris Angelico wrote:
>> Minor nitpick, btw:
>>>
>>> (in which cast wstr_length differs form length)
>>
>> Should be "in which case" and "from". Who has the power to correct
>> typos in PEPs?
>
> Sneak it in here? http:/
On 29/03/2013 00:54, Chris Angelico wrote:
Minor nitpick, btw:
(in which cast wstr_length differs form length)
Should be "in which case" and "from". Who has the power to correct
typos in PEPs?
ChrisA
Sneak it in here? http://bugs.python.org/issue13604
--
If you're using GoogleCrap™ please
On Fri, Mar 29, 2013 at 11:39 AM, Steven D'Aprano
wrote:
> ASCII and Latin-1 strings obviously do not have them. Nor do BMP-only
> strings. It's only strings in the SMPs that could need surrogate pairs,
> and they don't need them in Python's implementation since it's a full 32-
> bit implementatio
On Thu, 28 Mar 2013 12:54:20 -0700, rurpy wrote:
> Even if you personally would prefer someone to respond by calling you a
> liar, your personal preferences do not form a basis for desirable
> posting behavior here.
Whereas yours apparently are.
Thanks for the feedback, I'll take it under advise
On Thu, 28 Mar 2013 10:11:59 -0600, Ian Kelly wrote:
> On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico
> wrote:
>> PEP393 strings have two optimizations, or kinda three:
>>
>> 1a) ASCII-only strings
>> 1b) Latin1-only strings
>> 2) BMP-only strings
>> 3) Everything else
>>
>> Options 1a and 1b ar
On 28/03/2013 23:53, Dennis Lee Bieber wrote:
On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman
declaimed the following in gmane.comp.python.general:
At some point we have to stop being gentle / polite / politically correct and
call a shovel a shovel... er, spade.
Call it an Instrum
On Fri, Mar 29, 2013 at 10:53 AM, Dennis Lee Bieber
wrote:
> On Wed, 27 Mar 2013 23:12:21 -0700, Ethan Furman
> declaimed the following in gmane.comp.python.general:
>
>>
>> At some point we have to stop being gentle / polite / politically correct
>> and call a shovel a shovel... er, spade.
>
>
On 3/28/2013 4:26 PM, jmfauth wrote:
Please provide references for your assertions. I have read the unicode
standard, parts more than once, and your assertions contradict my memory.
Unicode does not stipulate, one has to cover the whole range.
I believe it does. As I remember, the recognize
On Thursday, March 28, 2013 at 11:40:17 AM UTC+8, Chris Angelico wrote:
> On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman wrote:
> > Has anybody else thought that [jmf's] last few responses are starting to
> > sound bot'ish?
>
> Yes, I did wonder. It's like he and Dihedral have been trading
> accounts
On Thu, Mar 28, 2013 at 2:11 PM, jmfauth wrote:
> On 28 mar, 21:29, Benjamin Kaplan wrote:
>> On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wrote:
>> > On 28 mar, 17:33, Ian Kelly wrote:
>> >> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
>> >> > The flexible string representation takes the prob
On 28/03/2013 21:11, jmfauth wrote:
On 28 mar, 21:29, Benjamin Kaplan wrote:
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wrote:
> On 28 mar, 17:33, Ian Kelly wrote:
>> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
>> > The flexible string representation takes the problem from the
>> > other
On Fri, Mar 29, 2013 at 7:26 AM, jmfauth wrote:
> The wide build (I never used) is in my mind as correct as
> the narrow build. It "just" covers a different range in unicode
> (the whole range).
Actually it does; it covers all of the Unicode range, by using
(effectively) UTF-16. Characters that c
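
To make the UTF-16 point concrete, here is a short sketch that lists the 16-bit code units Python's own "utf-16-le" codec produces for a string. The helper name `utf16_code_units` is hypothetical; it shows the encoding only and is not a reconstruction of how any particular build stored strings internally.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    # Each code unit is one little-endian unsigned 16-bit value.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

bmp_units = utf16_code_units("\u20ac")         # EURO SIGN, inside the BMP
astral_units = utf16_code_units("\U0001F600")  # outside the BMP

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

A BMP character takes one code unit, while a character above U+FFFF takes two (a high and a low surrogate), which is why length and indexing by code unit disagree with length and indexing by character for such strings.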
On 28 mar, 22:11, jmfauth wrote:
> On 28 mar, 21:29, Benjamin Kaplan wrote:
>
> > On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wrote:
> > > On 28 mar, 17:33, Ian Kelly wrote:
> > >> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
> > >> > The flexible string representation takes t
On 28 mar, 21:29, Benjamin Kaplan wrote:
> On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wrote:
> > On 28 mar, 17:33, Ian Kelly wrote:
> >> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
> >> > The flexible string representation takes the problem from the
> >> > other side, it attempts to work wit
On 03/28/2013 12:54 PM, ru...@yahoo.com wrote:
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
For someone who delights in pointing out the logical errors
of others you are often remarkably sloppy in your own logic.
Of course language can be both
On Thu, Mar 28, 2013 at 10:48 AM, jmfauth wrote:
> On 28 mar, 17:33, Ian Kelly wrote:
>> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
>> > The flexible string representation takes the problem from the
>> > other side, it attempts to work with the characters by using
>> > their representations
On 28 mar, 18:55, Chris Angelico wrote:
> On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wrote:
> > If Python had implemented Unicode correctly, there would
> > be no difference in using an "a", "é", "€" or any character,
> > what the narrow builds did.
>
> I'm not following your grammar perfectly here,
In article
,
Chris Angelico wrote:
> I'd rather this list have some vinegar than it devolve into
> uselessness. Or, worse, if there's a hard-and-fast rule about
> courtesy, devolve into aspartame... everyone's courteous in words but
> hates each other underneath. Or am I taking the analogy too f
On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
> On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
>> More seriously I've never seen anyone -- cause or person -- aided by
>> the use of excessively strong language.
>
> Of course not. By definition, if it helps, it wasn't *excessively* strong
> langua
On Fri, Mar 29, 2013 at 4:48 AM, jmfauth wrote:
> If Python had implemented Unicode correctly, there would
> be no difference in using an "a", "é", "€" or any character,
> what the narrow builds did.
I'm not following your grammar perfectly here, but if Python were
implementing Unicode correctly,
On 28 mar, 17:33, Ian Kelly wrote:
> On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
> > The flexible string representation takes the problem from the
> > other side, it attempts to work with the characters by using
> > their representations and it (can only) fails...
>
> This is false. As I've
On Fri, Mar 29, 2013 at 3:55 AM, jmfauth wrote:
> Assume you have a set of integers {0...9} and an operator,
> let say, the addition.
>
> Idea.
> Just divide this set in two chunks, {0...4} and {5...9}
> and work hardly to optimize the addition of 2 operands in
> the sets {0...4}.
>
> The problems
Chris,
Your problem with int/long, the start of this thread, is
very interesting.
This is not a demonstration, a proof, rather an illustration.
Assume you have a set of integers {0...9} and an operator,
let say, the addition.
Idea.
Just divide this set in two chunks, {0...4} and {5...9}
and work
On Thu, Mar 28, 2013 at 7:34 AM, jmfauth wrote:
> The flexible string representation takes the problem from the
> other side, it attempts to work with the characters by using
> their representations and it (can only) fails...
This is false. As I've pointed out to you before, the FSR does not
div
On Fri, Mar 29, 2013 at 3:01 AM, Terry Reedy wrote:
> On 3/28/2013 10:38 AM, Chris Angelico wrote:
>
>> PEP393 strings have two optimizations, or kinda three:
>>
>> 1a) ASCII-only strings
>> 1b) Latin1-only strings
>> 2) BMP-only strings
>> 3) Everything else
>>
>> Options 1a and 1b are almost ide
On Thu, Mar 28, 2013 at 8:38 AM, Chris Angelico wrote:
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there'
On 3/28/2013 10:38 AM, Chris Angelico wrote:
PEP393 strings have two optimizations, or kinda three:
1a) ASCII-only strings
1b) Latin1-only strings
2) BMP-only strings
3) Everything else
Options 1a and 1b are almost identical - I'm not sure what the detail
is, but there's something flagging tho
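
The four categories listed above can be observed from Python itself. This is a minimal sketch, assuming CPython 3.3 or later; `bytes_per_char` is a hypothetical helper that estimates per-character storage by comparing the sizes of two strings of different lengths, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate the storage cost per character for a string made of ch."""
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n

widths = {
    "ASCII 'a'": bytes_per_char("a"),
    "Latin-1 U+00E9": bytes_per_char("\u00e9"),
    "BMP U+20AC": bytes_per_char("\u20ac"),
    "non-BMP U+1F600": bytes_per_char("\U0001F600"),
}

for label, width in widths.items():
    print(label, width)
```

ASCII and Latin-1 strings both use one byte per character, BMP strings use two, and everything else uses four. The difference between the ASCII and Latin-1 cases is not visible in this measurement.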
On Thu, Mar 28, 2013 at 7:01 AM, Steven D'Aprano
wrote:
> Any string method that takes a starting offset requires the method to
> walk the string byte-by-byte. I've even seen languages put responsibility
> for dealing with that onto the programmer: the "start offset" is given in
> *bytes*, not cha
On 28 mar, 16:14, jmfauth wrote:
> On 28 mar, 15:38, Chris Angelico wrote:
>
> > On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wrote:
> > > This flexible string representation is so absurd that not only
> > > "it" does not know you can not write Western European Languages
> > > with l
On Fri, Mar 29, 2013 at 2:14 AM, jmfauth wrote:
> As long as you are attempting to divide a set of characters in
> chunks and try to handle them separately, it will never work.
Okay. Let's look at integers. To properly represent the Python 3 'int'
type (or the Python 2 'long'), we need to be able
On 28 mar, 15:38, Chris Angelico wrote:
> On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wrote:
> > This flexible string representation is so absurd that not only
> > "it" does not know you can not write Western European Languages
> > with latin-1, "it" penalizes you by just attempting to optimize
> >
On Fri, Mar 29, 2013 at 1:51 AM, MRAB wrote:
> On 28/03/2013 12:11, Neil Hodgson wrote:
>>
>> Ian Foote:
>>
>>> Specifically, indexing a variable-length encoding like utf-8 is not
>>> as efficient as indexing a fixed-length encoding.
>>
>>
>> Many common string operations do not require indexing b
On 28/03/2013 12:11, Neil Hodgson wrote:
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not
as efficient as indexing a fixed-length encoding.
Many common string operations do not require indexing by character
which reduces the impact of this inefficiency. UTF-8 see
On Fri, Mar 29, 2013 at 1:12 AM, jmfauth wrote:
> This flexible string representation is so absurd that not only
> "it" does not know you can not write Western European Languages
> with latin-1, "it" penalizes you by just attempting to optimize
> latin-1. Shown in my multiple examples.
PEP393 str
On 28 mar, 14:01, Steven D'Aprano wrote:
> On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
> > Ian Foote:
>
>
> > One benefit of
> > UTF-8 over Python's flexible representation is that it is, on average,
> > more compact over a wide set of samples.
>
> Sure. And over a different set of sam
On 28 mar, 11:30, Chris Angelico wrote:
> On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wrote:
-
> You really REALLY need to sort out in your head the difference between
> correctness and performance. I still haven't seen one single piece of
> evidence from you that Python 3.3 fails on any point
On Thu, 28 Mar 2013 23:11:55 +1100, Neil Hodgson wrote:
> Ian Foote:
>
>> Specifically, indexing a variable-length encoding like utf-8 is not as
>> efficient as indexing a fixed-length encoding.
>
> Many common string operations do not require indexing by character
> which reduces the impact
On 28/03/2013 03:18, Ethan Furman wrote:
I wouldn't call it unproductive -- a half-dozen amusing posts followed
because of Mark's initial post, and they were a great relief from the
tedium and (dare I say it?) idiocy of jmf's posts.
--
~Ethan~
Thanks for those words. They're a tonic as I've
Ian Foote:
Specifically, indexing a variable-length encoding like utf-8 is not as
efficient as indexing a fixed-length encoding.
Many common string operations do not require indexing by character
which reduces the impact of this inefficiency. UTF-8 seems like a
reasonable choice for an in
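
The indexing cost under discussion can be sketched as follows. The function `utf8_offset_of` is a hypothetical helper, written only to show why finding the N-th character in UTF-8 bytes needs a scan from the start, whereas a fixed-width representation can jump straight to it. It assumes the input is valid UTF-8.

```python
def utf8_offset_of(data, char_index):
    """Return the byte offset where code point number char_index begins."""
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts
        # a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError("character index out of range")

# Characters of 1, 2, 3 and 4 bytes, followed by 'z'.
sample = "a\u00e9\u20ac\U0001F600z"
encoded = sample.encode("utf-8")

offset_of_z = utf8_offset_of(encoded, 4)
print(offset_of_z, encoded[offset_of_z:offset_of_z + 1])
```

The loop visits every byte before the target, so the cost grows with the index. Operations that only scan forward, such as searching or iterating, do not pay this cost, which is the point made about common string operations not needing indexing by character.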
On Thu, Mar 28, 2013 at 8:03 PM, jmfauth wrote:
> Example of a good Unicode understanding.
> If you wish 1) to preserve memory, 2) to cover the whole range
> of Unicode, 3) to keep maximum performance while preserving the
> good work Unicode.org has done (normalization, sorting), there
> is only on
On Thu, Mar 28, 2013 at 4:20 PM, Steven D'Aprano
wrote:
> On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
>
>> In particular "You are a liar" is as bad as "You are an idiot" The same
>> statement can be made non-abusively thus: "... is not true because ..."
>
> I accept that criticism, even if I d
On 28 March 2013 09:03, jmfauth wrote:
>
> The problem is elsewhere. Nobody understand the examples
> I gave on this list, because nobody understand Unicode.
> These examples are not random examples, they are well
> thought.
There are many people here and among the Python devs who understand
unic
On 28/03/13 09:03, jmfauth wrote:
The problem is elsewhere. Nobody understand the examples
I gave on this list, because nobody understand Unicode.
These examples are not random examples, they are well
thought.
If you were understanding the coding of the characters,
Unicode and what this flexible
On 28 mar, 07:12, Ethan Furman wrote:
> On 03/27/2013 08:49 PM, rusi wrote:
>
> > In particular "You are a liar" is as bad as "You are an idiot"
> > The same statement can be made non-abusively thus: "... is not true
> > because ..."
>
> I don't agree. With all the posts and micro benchmarks and
On Wed, 27 Mar 2013 22:42:18 -0700, rusi wrote:
> More seriously I've never seen anyone -- cause or person -- aided by
> the use of excessively strong language.
Of course not. By definition, if it helps, it wasn't *excessively* strong
language.
> IOW I repeat my support for Ned's request: Ad h
On 03/27/2013 08:49 PM, rusi wrote:
In particular "You are a liar" is as bad as "You are an idiot"
The same statement can be made non-abusively thus: "... is not true
because ..."
I don't agree. With all the posts and micro benchmarks and other drivel that jmf has inflicted on us, I find it /v
On Mar 28, 10:20 am, Steven D'Aprano wrote:
> On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
> > On Mar 28, 8:18 am, Ethan Furman wrote:
>
> >> So long as Mark doesn't start cussing and swearing I'm not going to get
> >> worked up about it. I find jmf's posts far more aggravating.
>
> > I suppo
On Wed, 27 Mar 2013 20:49:20 -0700, rusi wrote:
> On Mar 28, 8:18 am, Ethan Furman wrote:
>>
>> So long as Mark doesn't start cussing and swearing I'm not going to get
>> worked up about it. I find jmf's posts far more aggravating.
>
> I support Ned's original gentle reminder -- Please be civil
On Mar 28, 8:18 am, Ethan Furman wrote:
>
> So long as Mark doesn't start cussing and swearing I'm not going to get
> worked up about it. I
> find jmf's posts far more aggravating.
I support Ned's original gentle reminder -- Please be civil
irrespective of surrounding nonsensical behavior.
In
On Thu, Mar 28, 2013 at 2:18 PM, Ethan Furman wrote:
> Has anybody else thought that [jmf's] last few responses are starting to sound
> bot'ish?
Yes, I did wonder. It's like he and Dihedral have been trading
accounts sometimes. Hey, Dihedral, I hear there's a discussion of
Unicode and PEP 393 and
On 03/27/2013 06:47 PM, Steven D'Aprano wrote:
On Wed, 27 Mar 2013 11:51:07 +, Mark Lawrence defending an
unproductive post flaming a troll:
I wouldn't call it unproductive -- a half-dozen amusing posts followed because of Mark's initial post, and they were a
great relief from the tedium a