Re: Unicode 7

2014-05-02 Thread Terry Reedy
On 5/2/2014 9:15 PM, Chris Angelico wrote: (My reading of PEP 3131 is that NFKC is used; is that what's implemented, or was that a temporary measure and/or something for Py2 to consider?) The 3.4 docs say "The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with
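PEP 3131's rule, that identifiers are compared after NFKC normalization, can be checked with the standard library. This is a minimal sketch. The two sample spellings are illustrative choices and do not come from the thread's own code:

```python
import unicodedata

ligature = "\ufb01ne"                      # U+FB01 LATIN SMALL LIGATURE FI, then "ne"
fullwidth = "\uff46\uff49\uff4e\uff45"     # full-width f, i, n, e

for name in (ligature, fullwidth):
    # Each spelling is a valid identifier on its own...
    assert name.isidentifier()
    # ...and each collapses to the plain ASCII spelling under NFKC.
    print(repr(name), "->", unicodedata.normalize("NFKC", name))
```

Under Python 3, both non-ASCII spellings normalize to the ASCII `fine`. The parser applies the same normalization to identifiers.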

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Sat, May 3, 2014 at 12:02 PM, Steven D'Aprano wrote: > If you know your victim is reading source code in Ariel font, "rn" and > "m" are virtually indistinguishable except at very large sizes. I kinda like the idea of naming it after a bratty teenager who rebels against her father and runs away

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Sat, 03 May 2014 02:02:32 +, Steven D'Aprano wrote: > On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote: > >> I am confused about the tone however: You think this >> > (fine, fine) = (1,2) # and no issue about it >> >> is fine? > > > It's no worse than any other obfuscated varia

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Fri, 02 May 2014 17:58:51 -0700, Rustom Mody wrote: > I am confused about the tone however: You think this > (fine, fine) = (1,2) # and no issue about it > > is fine? It's no worse than any other obfuscated variable name: MOOSE, MO0SE, M0OSE = 1, 2, 3 xl, x1 = 1, 2 If you know your vi
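Steven's MOOSE/MO0SE/M0OSE and xl/x1 examples are pure ASCII, so no normalization rule can help with them. Catching them would take a "looks alike" comparison. The rough sketch below assumes a tiny hand-written confusables table. That table is hypothetical: real tooling would use the much larger Unicode TR39 confusables data. The names `skeleton` and `confusable_groups` are also made up for illustration:

```python
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately tiny table of look-alikes. Each source string
# is replaced by a canonical stand-in before names are compared.
CONFUSABLES = {"0": "O", "1": "l", "I": "l", "rn": "m"}


def skeleton(name):
    """Reduce an identifier to a crude 'what it looks like' key."""
    key = unicodedata.normalize("NFKC", name)  # fold ligatures etc. first
    for src, dst in CONFUSABLES.items():
        key = key.replace(src, dst)
    return key


def confusable_groups(names):
    """Group names that share a skeleton; return only groups of two or more."""
    groups = defaultdict(list)
    for name in names:
        groups[skeleton(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(confusable_groups(["MOOSE", "MO0SE", "M0OSE", "xl", "x1", "burn", "bum"]))
```

The table is the weak point. It is hand-picked and font-dependent, and it flags pairs such as `burn`/`bum` that may be perfectly legitimate. That suggests a check like this works better as a warning than as an error.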

Re: Unicode 7

2014-05-02 Thread Rustom Mody
On Saturday, May 3, 2014 7:24:08 AM UTC+5:30, Chris Angelico wrote: > On Sat, May 3, 2014 at 11:42 AM, Rustom Mody wrote: > > Two identifiers that to some programmers > > - can look the same > > - and not to others > > - and that the language treats as different > > is not fine (or fine) to me. > T

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Sat, May 3, 2014 at 11:42 AM, Rustom Mody wrote: > Two identifiers that to some programmers > - can look the same > - and not to others > - and that the language treats as different > > is not fine (or fine) to me. The language treats them as the same, though. ChrisA -- https://mail.python.or
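You can confirm Chris's point by executing the assignment and inspecting what gets bound. This small sketch uses `exec` with a throwaway namespace:

```python
namespace = {}
# The first target is spelled with U+FB01, the second in plain ASCII.
exec("(\ufb01ne, fine) = (1, 2)", namespace)

# Both targets were normalized to the same identifier. The second
# assignment therefore overwrote the first, and only one name exists.
bound = sorted(k for k in namespace if k != "__builtins__")
print(bound, namespace["fine"])
```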

Re: Unicode 7

2014-05-02 Thread Rustom Mody
On Saturday, May 3, 2014 6:48:21 AM UTC+5:30, Ned Batchelder wrote: > On 5/2/14 8:58 PM, Rustom Mody wrote: > > On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote: > >> Rustom Mody wrote: > >>> Just noticed a small thing in which python does a bit better than haskell: > >>> $ ghci > >>>

Re: Unicode 7

2014-05-02 Thread Ned Batchelder
On 5/2/14 8:58 PM, Rustom Mody wrote: On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote: Rustom Mody wrote: Just noticed a small thing in which python does a bit better than haskell: $ ghci let (fine, fine) = (1,2) Prelude> (fine, fine) (1,2) In case its not apparent, the fi in the

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Sat, May 3, 2014 at 10:58 AM, Rustom Mody wrote: > You think this > (fine, fine) = (1,2) # and no issue about it > > is fine? Not sure which part you're objecting to. Are you saying that this should be an error: >>> a, a = 1, 2 # simple ASCII identifier used twice or that Python should t
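Python accepts `a, a = 1, 2` silently, and the later value wins. If you wanted a warning for it, a linter could look for repeated names inside a single unpacking target. Below is a minimal sketch. `duplicate_targets` is a hypothetical helper, not an existing tool. It relies on `ast.parse` seeing identifiers after NFKC normalization, so it catches the ligature spelling as well:

```python
import ast


def duplicate_targets(source):
    """Return names bound more than once within a single tuple/list target."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if isinstance(target, (ast.Tuple, ast.List)):
                # Only plain names are checked; starred, attribute and
                # subscript targets are ignored in this sketch.
                names = [elt.id for elt in target.elts if isinstance(elt, ast.Name)]
                duplicates.extend(sorted({n for n in names if names.count(n) > 1}))
    return duplicates


print(duplicate_targets("a, a = 1, 2"))
print(duplicate_targets("(\ufb01ne, fine) = (1, 2)"))
print(duplicate_targets("a, b = 1, 2"))
```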

Re: Unicode 7

2014-05-02 Thread Rustom Mody
On Friday, May 2, 2014 11:37:02 PM UTC+5:30, Peter Otten wrote: > Rustom Mody wrote: > > Just noticed a small thing in which python does a bit better than haskell: > > $ ghci > > let (fine, fine) = (1,2) > > Prelude> (fine, fine) > > (1,2) > > In case its not apparent, the fi in the first fine is a

Re: Unicode 7

2014-05-02 Thread Roy Smith
In article , Ben Finney wrote: > The non-breaking space (“ ” U+00A0) is frequently used in text to keep > conceptually inseparable text such as “100 km” from automatic word > breaks https://en.wikipedia.org/wiki/Non-breaking_space>. Which, by the way, argparse doesn't honor... http:/

Re: Unicode 7

2014-05-02 Thread Ben Finney
Marko Rauhamaa writes: > That reminds me: " " [U+00A0 NON-BREAKING SPACE] is often used between > numbers and units, for example. The non-breaking space (“ ” U+00A0) is frequently used in text to keep conceptually inseparable text such as “100 km” from automatic word breaks https://en.wikipedia.
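Whether a given tool respects U+00A0 is worth checking rather than assuming. In Python 3, for instance, `str.split()` and the regex `\s` class both treat it as whitespace, because it is in Unicode category Zs. Only ASCII-mode matching leaves it alone. A short sketch:

```python
import re
import unicodedata

NBSP = "\u00a0"
text = f"100{NBSP}km"   # meant to stay together as one unit

print(unicodedata.name(NBSP), unicodedata.category(NBSP))
print(NBSP.isspace())
print(text.split())                               # splits at the NBSP
print(re.split(r"\s+", text))                     # so does Unicode-mode \s
print(re.split(r"\s+", text, flags=re.ASCII))     # ASCII-only \s keeps it intact
```

Code that must keep such pairs together has to opt out explicitly. One way is to split only on `" "`; another is to use `re.ASCII`.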

Re: Unicode 7

2014-05-02 Thread Peter Otten
Rustom Mody wrote: > Just noticed a small thing in which python does a bit better than haskell: > $ ghci > let (fine, fine) = (1,2) > Prelude> (fine, fine) > (1,2) > Prelude> > > In case its not apparent, the fi in the first fine is a ligature. > > Python just barfs: Not Python 3: Python 3.3.2+

Re: Unicode 7

2014-05-02 Thread Ned Batchelder
On 5/2/14 12:50 PM, Rustom Mody wrote: Just noticed a small thing in which python does a bit better than haskell: $ ghci let (fine, fine) = (1,2) Prelude> (fine, fine) (1,2) Prelude> In case its not apparent, the fi in the first fine is a ligature. Python just barfs: >>>fine = 1 File "", lin

Re: Unicode 7

2014-05-02 Thread Michael Torrie
On 05/02/2014 10:50 AM, Rustom Mody wrote: > Python just barfs: > fine = 1 > File "", line 1 > fine = 1 > ^ > SyntaxError: invalid syntax > > The point of that example is to show that unicode gives all kind of > "Aaah! Gotcha!!" opportunities that just dont exist in the old wor

Re: Unicode 7

2014-05-02 Thread MRAB
On 2014-05-02 09:08, Steven D'Aprano wrote: On Thu, 01 May 2014 21:42:21 -0700, Rustom Mody wrote: Whats the best cure for headache? Cut off the head o_O I don't think so. Whats the best cure for Unicode? Ascii Unicode is not a problem to be solved. The inability to write standard h

Re: Unicode 7

2014-05-02 Thread Rustom Mody
On Friday, May 2, 2014 5:25:37 PM UTC+5:30, Steven D'Aprano wrote: > On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote: > > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote: > >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote: > >> > - Worst of all what we > >> > *dont*

Re: Unicode 7

2014-05-02 Thread MRAB
On 2014-05-02 03:39, Ben Finney wrote: Rustom Mody writes: Yes, the headaches go a little further back than Unicode. Okay, so can you change your article to reflect the fact that the headaches both pre-date Unicode, and are made much easier by Unicode? There is a certain large old book...

Re: Unicode 7

2014-05-02 Thread Tim Chase
On 2014-05-02 19:08, Chris Angelico wrote: > This is another area where Unicode has given us "a great improvement > over the old method of giving satisfaction". Back in the 1990s on > OS/2, DOS, and Windows, a missing glyph might be (a) blank, (b) a > simple square with no information, or (c) copie

Re: Unicode 7

2014-05-02 Thread Marko Rauhamaa
Steven D'Aprano : > And you've never been bitten by an invisible control character in > ASCII text? You've lived a sheltered life! That reminds me: " " (nonbreakable space) is often used between numbers and units, for example. Marko -- https://mail.python.org/mailman/listinfo/python-list
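Invisible or space-like characters are hard to spot by eye but easy to find by category. Here is a rough sketch of a scanner. The choice of which categories count as "suspicious" is an assumption made for this example. It flags Cc and Cf characters other than tab, newline and CR, plus any Zs character other than the ordinary space:

```python
import unicodedata


def suspicious_chars(text):
    """Yield (index, code point, name) for characters that are invisible
    or that look like an ordinary space but aren't one."""
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        invisible = cat in ("Cc", "Cf") and ch not in "\t\n\r"
        odd_space = cat == "Zs" and ch != " "
        if invisible or odd_space:
            # Control characters have no Unicode name, hence the default.
            yield i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


sample = "a\u00a0b\u200bc\x07"   # NBSP, zero-width space, BEL
for hit in suspicious_chars(sample):
    print(hit)
```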

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Fri, 02 May 2014 03:39:34 -0700, Rustom Mody wrote: > On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote: >> On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote: >> > - Worst of all what we >> > *dont* see -- how many others dont see what we see? > >> Again, this a deficiency

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Fri, 02 May 2014 19:01:44 +1000, Chris Angelico wrote: > On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano > wrote: >> ... even *Americans* cannot represent all their common characters in >> ASCII, let alone specialised characters from mathematics, science, the >> printing industry, and law. >

Re: Unicode 7

2014-05-02 Thread Rustom Mody
On Friday, May 2, 2014 2:15:41 PM UTC+5:30, Steven D'Aprano wrote: > On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote: > > - Worst of all what we > > *dont* see -- how many others dont see what we see? > Again, this a deficiency of the font. There are very few code points in > Unicode which

Re: Unicode 7

2014-05-02 Thread Marko Rauhamaa
Ben Finney : >> Aside: What additional characters does law use that aren't in ASCII? >> Section § and paragraph ¶ are used frequently, but you already >> mentioned the printing industry. Are there other symbols? > > ASCII does not contain “©” (U+00A9 COPYRIGHT SIGN) nor “®” (U+00AE > REGISTERED SI
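The symbols named here sit just above the ASCII range, in Latin-1. This quick sketch lists their code points and names using `unicodedata`. The helper name `describe` is illustrative only:

```python
import unicodedata

LEGAL_SYMBOLS = "\u00a7\u00b6\u00a9\u00ae"   # section, pilcrow, copyright, registered


def describe(ch):
    """Code point, Unicode name and ASCII-ness of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ascii={ch.isascii()}"


for ch in LEGAL_SYMBOLS:
    print(describe(ch))

try:
    LEGAL_SYMBOLS.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot encode them:", exc.reason)
```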

Re: Unicode 7

2014-05-02 Thread Jussi Piitulainen
Chris Angelico writes: > (common with dingbats fonts). With Unicode, the standard is to show > a little box *with the hex digits in it*. Granted, those boxes are a > LOT more readable for BMP characters than SMP (unless your text is > huge, six digits in the space of one character will make them p
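The readability problem with those boxes comes down to digit count. This tiny sketch assumes the box simply shows the code point in uppercase hex; renderers differ in the details. `missing_glyph_label` is a made-up name:

```python
import sys


def missing_glyph_label(ch):
    """Hex digits a fallback box might show for this character.

    BMP characters (up to U+FFFF) need 4 hex digits; supplementary-plane
    characters need 5 or 6, which is why those boxes get cramped.
    """
    return f"{ord(ch):04X}"


print(missing_glyph_label("\u20ac"))          # BMP: EURO SIGN
print(missing_glyph_label("\U0001F600"))      # SMP
print(missing_glyph_label(chr(sys.maxunicode)))  # highest code point
```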

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Fri, May 2, 2014 at 7:16 PM, Ben Finney wrote: > Chris Angelico writes: > >> On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano >> wrote: >> > ... even *Americans* cannot represent all their common characters in >> > ASCII, let alone specialised characters from mathematics, science, >> > the pri

Re: Unicode 7

2014-05-02 Thread Ben Finney
Chris Angelico writes: > On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano > wrote: > > ... even *Americans* cannot represent all their common characters in > > ASCII, let alone specialised characters from mathematics, science, > > the printing industry, and law. > > Aside: What additional charact

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Fri, May 2, 2014 at 6:45 PM, Steven D'Aprano wrote: >> - unicode 'number-boxes' (what are these called?) > > They are missing character glyphs, and they have nothing to do with > Unicode. They are due to deficiencies in the text font you are using. > > Admittedly with Unicode's 0x10 possibl

Re: Unicode 7

2014-05-02 Thread Chris Angelico
On Fri, May 2, 2014 at 6:08 PM, Steven D'Aprano wrote: > ... even *Americans* cannot represent all their common characters in > ASCII, let alone specialised characters from mathematics, science, the > printing industry, and law. Aside: What additional characters does law use that aren't in ASCII?

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Thu, 01 May 2014 19:02:48 -0700, Rustom Mody wrote: > I dont know how one causally connects the 'headaches' but Ive seen - > mojibake Mojibake is certainly more common with multiple encodings, but the solution to that is Unicode, not ASCII. In fact, in your blog post you even link to a post
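Mojibake is what you get when bytes written in one encoding are read back in another. This minimal sketch reproduces the classic case of UTF-8 read as Latin-1, and in this lucky case undoes it. The sample word is arbitrary:

```python
original = "caf\u00e9"                          # "café"
raw = original.encode("utf-8")                  # b'caf\xc3\xa9'
garbled = raw.decode("latin-1")                 # each UTF-8 byte becomes one char
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)
print(repaired == original)
```

The repair works only because Latin-1 maps every byte value to a character, so no information was lost. When the wrong decode drops or replaces bytes, this round trip cannot recover the original.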

Re: Unicode 7

2014-05-02 Thread Steven D'Aprano
On Thu, 01 May 2014 21:42:21 -0700, Rustom Mody wrote: > Whats the best cure for headache? > > Cut off the head o_O I don't think so. > Whats the best cure for Unicode? > > Ascii Unicode is not a problem to be solved. The inability to write standard human text in ASCII is a problem, e.g.

Re: Unicode 7

2014-05-01 Thread Terry Reedy
On 5/1/2014 10:29 PM, Rustom Mody wrote: Here is an instance of someone who would like a certain optimization to be dis-able-able https://mail.python.org/pipermail/python-list/2014-February/667169.html To the best of my knowledge its nothing to do with unicode or with jmf. Right. Ned has an

Re: Unicode 7

2014-05-01 Thread Chris Angelico
On Fri, May 2, 2014 at 2:42 PM, Rustom Mody wrote: > Unicode consortium's going from old BMP to current (6.0) SMPs to > who-knows-what > in the future is similar. Unicode 1.0: "Let's make a single universal character set that can represent all the world's scripts. We'll define 65536 codepoints t
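The 65536 figure is why UTF-16 needed surrogate pairs once Unicode grew past the BMP. Here is a sketch of the standard UTF-16 split, cross-checked against Python's own codec:

```python
def utf16_surrogates(cp):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    v = cp - 0x10000                 # 20 bits remain
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


high, low = utf16_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")

# Cross-check against the codec.
assert "\U0001F600".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
# BMP plus 16 supplementary planes of 65536 code points each.
assert 0x10FFFF + 1 == 17 * 0x10000
```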

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 9:46:36 AM UTC+5:30, Terry Reedy wrote: > On 5/1/2014 7:33 PM, MRAB wrote: > > On 2014-05-01 23:38, Terry Reedy wrote: > >> On 5/1/2014 2:04 PM, Rustom Mody wrote: > > Since its Unicode-troll time, here's my contribution > > http://blog.languager.org/2014/04/unicode-a

Re: Unicode 7

2014-05-01 Thread Terry Reedy
On 5/1/2014 7:33 PM, MRAB wrote: On 2014-05-01 23:38, Terry Reedy wrote: On 5/1/2014 2:04 PM, Rustom Mody wrote: Since its Unicode-troll time, here's my contribution http://blog.languager.org/2014/04/unicode-and-unix-assumption.html I will not comment on the Unix-assumption part, but I think

Re: Unicode 7

2014-05-01 Thread Steven D'Aprano
On Thu, 01 May 2014 18:38:35 -0400, Terry Reedy wrote: > "strange beasties like python's FSR" > > Have you really let yourself be poisoned by JMF's bizarre rants? The FSR > is an *internal optimization* that benefits most unicode operations that > people actually perform. It uses UTF-32 by defaul
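The FSR (PEP 393) stores 1, 2 or 4 bytes per character, depending on the largest code point in the string. This CPython-specific sketch measures that with `sys.getsizeof`. It infers the per-character cost from a size difference, so the fixed object header cancels out:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate CPython's per-character storage for a string made of `ch`."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


for sample in ("a", "\u0fce", "\U0001F600"):
    print(f"U+{ord(sample):04X}: {bytes_per_char(sample)} byte(s) per char")
```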

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 8:31:56 AM UTC+5:30, Chris Angelico wrote: > On Fri, May 2, 2014 at 12:29 PM, Rustom Mody wrote: > > Here is an instance of someone who would like a certain optimization to be > > dis-able-able > > https://mail.python.org/pipermail/python-list/2014-February/667169.html > > To

Re: Unicode 7

2014-05-01 Thread Chris Angelico
On Fri, May 2, 2014 at 12:29 PM, Rustom Mody wrote: > Here is an instance of someone who would like a certain optimization to be > dis-able-able > > https://mail.python.org/pipermail/python-list/2014-February/667169.html > > To the best of my knowledge its nothing to do with unicode or with jmf.

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 8:09:44 AM UTC+5:30, Ben Finney wrote: > Rustom Mody writes: > > Yes, the headaches go a little further back than Unicode. > Okay, so can you change your article to reflect the fact that the > headaches both pre-date Unicode, and are made much easier by Unicode? Predate:

Re: Unicode 7

2014-05-01 Thread Ben Finney
Rustom Mody writes: > Yes, the headaches go a little further back than Unicode. Okay, so can you change your article to reflect the fact that the headaches both pre-date Unicode, and are made much easier by Unicode? > There is a certain large old book... Ah yes, the neo-Sumerian story “Enmerka

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 7:59:55 AM UTC+5:30, Rustom Mody wrote: > "Why should I pay more for a EURO sign than a $ sign?" A unicode 'headache' there: I typed the Euro sign (trying again € ) not EURO Somebody -- I guess its GG in overhelpful mode -- converted it And made my post: Content-Type: text
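The "pay more for a € than a $" complaint is about encoded size, and that depends entirely on the encoding. This small sketch compares a few encodings; the list is an arbitrary choice:

```python
def encoded_lengths(ch, encodings=("utf-8", "utf-16-le", "cp1252")):
    """Bytes needed for `ch` in each encoding (None if it can't be encoded)."""
    sizes = {}
    for enc in encodings:
        try:
            sizes[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None
    return sizes


for ch in ("$", "\u20ac"):
    print(ch, encoded_lengths(ch, ("utf-8", "utf-16-le", "cp1252", "latin-1")))
```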

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 4:08:35 AM UTC+5:30, Terry Reedy wrote: > On 5/1/2014 2:04 PM, Rustom Mody wrote: > >>> Since its Unicode-troll time, here's my contribution > >>> http://blog.languager.org/2014/04/unicode-and-unix-assumption.html > I will not comment on the Unix-assumption part, but I thin

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Friday, May 2, 2014 5:03:21 AM UTC+5:30, MRAB wrote: > On 2014-05-01 23:38, Terry Reedy wrote: > > On 5/1/2014 2:04 PM, Rustom Mody wrote: > Since its Unicode-troll time, here's my contribution > http://blog.languager.org/2014/04/unicode-and-unix-assumption.html > > I will not comment

Re: Unicode 7

2014-05-01 Thread MRAB
On 2014-05-01 23:38, Terry Reedy wrote: On 5/1/2014 2:04 PM, Rustom Mody wrote: Since its Unicode-troll time, here's my contribution http://blog.languager.org/2014/04/unicode-and-unix-assumption.html I will not comment on the Unix-assumption part, but I think you go wrong with this: "Unicode

Re: Unicode 7

2014-05-01 Thread Terry Reedy
On 5/1/2014 2:04 PM, Rustom Mody wrote: Since its Unicode-troll time, here's my contribution http://blog.languager.org/2014/04/unicode-and-unix-assumption.html I will not comment on the Unix-assumption part, but I think you go wrong with this: "Unicode is a Headache". The major headache is t

Re: Unicode 7

2014-05-01 Thread Rustom Mody
On Thursday, May 1, 2014 10:30:43 AM UTC+5:30, Steven D'Aprano wrote: > On Tue, 29 Apr 2014 21:53:22 -0700, Rustom Mody wrote: > > On Tuesday, April 29, 2014 11:29:23 PM UTC+5:30, Tim Chase wrote: > >> While I dislike feeding the troll, what I see here is: > > Since its Unicode-troll time, here's

Re: Unicode 7

2014-04-30 Thread wxjmfauth
Le mercredi 30 avril 2014 20:48:48 UTC+2, Tim Chase a écrit : > On 2014-04-30 00:06, wxjmfa...@gmail.com wrote: > > > @ Time Chase > > > > > > I'm perfectly aware about what I'm doing. > > > > Apparently, you're quite adept at appending superfluous characters to > > sensible strings...did y

Re: Unicode 7

2014-04-30 Thread Steven D'Aprano
On Tue, 29 Apr 2014 21:53:22 -0700, Rustom Mody wrote: > On Tuesday, April 29, 2014 11:29:23 PM UTC+5:30, Tim Chase wrote: >> While I dislike feeding the troll, what I see here is: > > > > Since its Unicode-troll time, here's my contribution > http://blog.languager.org/2014/04/unicode-and-unix-

Re: Unicode 7

2014-04-30 Thread Tim Chase
On 2014-04-30 00:06, wxjmfa...@gmail.com wrote: > @ Time Chase > > I'm perfectly aware about what I'm doing. Apparently, you're quite adept at appending superfluous characters to sensible strings...did you benchmark your email composition, too? ;-) -tkc (aka "Tim", not "Time")

Re: Unicode 7

2014-04-30 Thread wxjmfauth
@ Time Chase I'm perfectly aware about what I'm doing. @ MRAB "...Although the third example is the fastest, it's also the wrong way to handle Unicode: ..." Maybe that's exactly the opposite. It illustrates very well, the quality of coding schemes endorsed by Unicode.org. I deliberately choose

Re: Unicode 7

2014-04-29 Thread Rustom Mody
On Tuesday, April 29, 2014 11:29:23 PM UTC+5:30, Tim Chase wrote: > While I dislike feeding the troll, what I see here is: Since its Unicode-troll time, here's my contribution http://blog.languager.org/2014/04/unicode-and-unix-assumption.html :-) More seriously, since Ive quoted some esteemed

Re: Unicode 7

2014-04-29 Thread MRAB
On 2014-04-29 18:37, wxjmfa...@gmail.com wrote: Let see how Python is ready for the next Unicode version (Unicode 7.0.0.Beta). timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'") [1.4027834829454946, 1.38714224331963, 1.3822586635296261] timeit.repeat("(x*1000 + y)[:-1]", setup="x

Re: Unicode 7

2014-04-29 Thread Tim Chase
On 2014-04-29 10:37, wxjmfa...@gmail.com wrote: > >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = 'z'") > [1.4027834829454946, 1.38714224331963, 1.3822586635296261] > >>> timeit.repeat("(x*1000 + y)[:-1]", setup="x = 'abc'; y = > >>> '\u0fce'") > [5.462776291480395, 5.4479432055423
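For anyone who wants to rerun the comparison, here is the benchmark from the original post as a script. The smaller `number` keeps runs short and is my own choice; the `timeit` default is 1,000,000. Absolute timings will differ by machine and Python version:

```python
import timeit


def bench(y, repeat=3, number=20_000):
    """Time (x*1000 + y)[:-1] with x = 'abc' and the given last character."""
    return timeit.repeat(
        "(x*1000 + y)[:-1]",
        setup=f"x = 'abc'; y = {y!r}",
        repeat=repeat,
        number=number,
    )


for y in ("z", "\u0fce"):
    # Under PEP 393 a non-Latin-1 y forces the intermediate string to use
    # 2 bytes per character, and converting between widths adds copying.
    print(f"U+{ord(y):04X}", min(bench(y)))
```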