Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread Neil Cerutti
On 2013-12-04, wxjmfa...@gmail.com wrote: > Yon intuitively pointed a very important feature of "unicode". > However, it is not necessary, this is exactly what unicode does > (when used properly). Unicode only provides character sets. It's not a natural language parsing facility. -- Neil Cerutt

Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread Mark Lawrence
On 04/12/2013 13:52, wxjmfa...@gmail.com wrote: [snip all the double spaced stuff] Yon intuitively pointed a very important feature of "unicode". However, it is not necessary, this is exactly what unicode does (when used properly). jmf Presumably using unicode correctly prevents messages b

Re: Python Unicode handling wins again -- mostly

2013-12-04 Thread wxjmfauth
Le mardi 3 décembre 2013 15:26:45 UTC+1, Ethan Furman a écrit : > On 12/02/2013 12:38 PM, Ethan Furman wrote: > > > On 11/29/2013 04:44 PM, Steven D'Aprano wrote: > > >> > > >> Out of the nine tests, Python 3.3 passes six, with three tests being > > >> failures or dubious. If you believe that t

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread wxjmfauth
Le mardi 3 décembre 2013 06:06:26 UTC+1, Steven D'Aprano a écrit : > On Mon, 02 Dec 2013 16:14:13 -0500, Ned Batchelder wrote: > > > > > On 12/2/13 3:38 PM, Ethan Furman wrote: > > >> On 11/29/2013 04:44 PM, Steven D'Aprano wrote: > > >>> > > >>> Out of the nine tests, Python 3.3 passes six,

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread Ethan Furman
On 12/02/2013 12:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the r

Re: Python Unicode handling wins again -- mostly

2013-12-03 Thread Neil Cerutti
On 2013-12-02, Ethan Furman wrote: > On 11/29/2013 04:44 PM, Steven D'Aprano wrote: >> Out of the nine tests, Python 3.3 passes six, with three tests >> being failures or dubious. If you believe that the native >> string type should operate on code-points, then you'll think >> that Python does the

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-03 Thread Mark Lawrence
On 03/12/2013 01:38, Roy Smith wrote: In article , Mark Lawrence wrote: My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Py

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-03 Thread Mark Lawrence
On 03/12/2013 04:32, Grant Edwards wrote: On 2013-12-03, Roy Smith wrote: "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Python 3 the default version and having everybody be cool with unicode." I'm cool with Unicode as long as

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread joe
How would a grapheme library work? Basic cluster combination, or would implementing other algorithms (line break, normalizing to a "canonical" form) be necessary? How do people use grapheme clusters in non-rendering situations? Or perhaps here's a better question: does anyone know any non-l
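One rough answer to the "basic cluster combination" question: attach every combining mark to the character before it. The sketch below is a simplified approximation. The function names are hypothetical, and it is not the full Unicode Text Segmentation algorithm (UAX #29), which also handles Hangul jamo, emoji sequences, regional indicators, CR LF and more.

```python
import unicodedata

def naive_graphemes(text):
    """Group each base character with the combining marks that follow it.

    Simplified approximation: only characters with a non-zero canonical
    combining class (unicodedata.combining) are attached, so marks with
    class 0 (many Indic vowel signs, for example) start their own cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

def reverse_graphemes(text):
    """Reverse by (approximate) cluster instead of by code point."""
    return "".join(reversed(naive_graphemes(text)))

print(naive_graphemes("noe\u0308l"))          # ['n', 'o', 'ë', 'l'] with 'ë' as e + U+0308
print(ascii(reverse_graphemes("noe\u0308l")))  # 'le\u0308on'
```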

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Steven D'Aprano
On Tue, 03 Dec 2013 04:32:13 +, Grant Edwards wrote: > On 2013-12-03, Roy Smith wrote: > >> "I believe that Pythonistas should commit themselves to achieving the >> goal, before this decade is out, of making Python 3 the default version >> and having everybody be cool with unicode." > > I'm

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Steven D'Aprano
On Mon, 02 Dec 2013 16:14:13 -0500, Ned Batchelder wrote: > On 12/2/13 3:38 PM, Ethan Furman wrote: >> On 11/29/2013 04:44 PM, Steven D'Aprano wrote: >>> >>> Out of the nine tests, Python 3.3 passes six, with three tests being >>> failures or dubious. If you believe that the native string type sho

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ethan Furman
On 12/02/2013 07:22 PM, Terry Reedy wrote: On 12/2/2013 4:25 PM, Ethan Furman wrote: jmf is certainly a troll No, he is a person who discovered a minor performance regression in the FSR, which we fixed. Unfortunately, he then continued for a year with a strange troll-like anti-FSR crusade. Bu

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Grant Edwards
On 2013-12-03, Roy Smith wrote: > "I believe that Pythonistas should commit themselves to achieving the > goal, before this decade is out, of making Python 3 the default version > and having everybody be cool with unicode." I'm cool with Unicode as long as it "just works" without me ever havin

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Terry Reedy
On 12/2/2013 4:25 PM, Ethan Furman wrote: jmf is certainly a troll No, he is a person who discovered a minor performance regression in the FSR, which we fixed. Unfortunately, he then continued for a year with a strange troll-like anti-FSR crusade. But his posts in the Unicode handling thread

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Roy Smith
In article , Mark Lawrence wrote: > My fellow Pythonistas, ask not what our language can do for you, ask > what you can do for our language. "I believe that Pythonistas should commit themselves to achieving the goal, before this decade is out, of making Python 3 the default version and havin

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 12/02/2013 02:32 PM, Mark Lawrence wrote: ... the other being a pot smoking hippy who ... Please trim your posts. You comment a lot on people sending double-spaced google posts -- not trimming is nearly as bad. The above is a good example of unnecessary name calling. I value your good p

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ben Finney
Ned Batchelder writes: > This is where my knowledge about Unicode gets fuzzy. Isn't it the > case that some grapheme clusters (or whatever the right word is) can't > be normalized down to a single code point? Characters can accept many > accents, for example. That's true, but doesn't affect th

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 5:32 PM, Mark Lawrence wrote: On 02/12/2013 22:24, Ned Batchelder wrote: On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I c

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 22:24, Ned Batchelder wrote: On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal att

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 4:44 PM, Ned Batchelder wrote: On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of t

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ned Batchelder
On 12/2/13 4:25 PM, Ethan Furman wrote: On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of

Re: Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Mark Lawrence
On 02/12/2013 21:25, Ethan Furman wrote: On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation o

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 12/02/2013 01:23 PM, Chris Angelico wrote: On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote: This is where my knowledge about Unicode gets fuzzy. Isn't it the case that some grapheme clusters (or whatever the right word is) can't be normalized down to a single code point? Characters ca

Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

2013-12-02 Thread Ethan Furman
On 12/02/2013 12:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* ap

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 3:45 PM, Mark Lawrence wrote: On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread MRAB
On 02/12/2013 21:14, Ned Batchelder wrote: On 12/2/13 3:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points,

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Chris Angelico
On Tue, Dec 3, 2013 at 8:14 AM, Ned Batchelder wrote: > This is where my knowledge about Unicode gets fuzzy. Isn't it the case that > some grapheme clusters (or whatever the right word is) can't be normalized > down to a single code point? Characters can accept many accents, for > example. You
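To illustrate the point about normalization: NFC composes a base character and an accent into one code point only when Unicode defines a precomposed character for that pair. A minimal check (the helper name is hypothetical):

```python
import unicodedata

def nfc_length(text):
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))

# e + COMBINING ACUTE ACCENT has a precomposed form (U+00E9), so NFC gives 1.
print(nfc_length("e\u0301"))  # 1
# x + COMBINING ACUTE ACCENT has no precomposed form, so 2 code points remain.
print(nfc_length("x\u0301"))  # 2
```

So a user-perceived character can remain several code points even after normalization, which is why grapheme-aware operations cannot simply rely on NFC.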

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 3:38 PM, Ethan Furman wrote: On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the right

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ethan Furman
On 11/29/2013 04:44 PM, Steven D'Aprano wrote: Out of the nine tests, Python 3.3 passes six, with three tests being failures or dubious. If you believe that the native string type should operate on code-points, then you'll think that Python does the right thing. I think Python is doing it corr

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 20:26, Terry Reedy wrote: On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply to python-list. Please stop. The attack

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Terry Reedy
On 12/2/2013 10:45 AM, Mark Lawrence wrote: the worst loser in the world Mark, I consider your continual direct personal attacks on other posters to be a violation of the PSF Code of Conduct, which *does* apply to python-list. Please stop. -- Terry Jan Reedy, one of multiple list moderator

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 10:45 AM, Mark Lawrence wrote: On 02/12/2013 15:22, Ned Batchelder wrote: On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Chris Angelico
On Tue, Dec 3, 2013 at 2:45 AM, Mark Lawrence wrote: > He's quite deliberately dragged it up by using p.s. Without doubt he's the > worst loser in the world and I'm *NOT* stopping getting at him. I find his > behaviour, continuously and groundlessly insulting the Python core > developers, quite

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 15:22, Ned Batchelder wrote: On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Ned Batchelder
On 12/2/13 9:46 AM, Mark Lawrence wrote: On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly do not understand the repeated requests *NO

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread Mark Lawrence
On 02/12/2013 12:39, wxjmfa...@gmail.com wrote: My English is far too be perfect, I think I understood it correctly. PS I did not even speak about the FSR. 1) Your English is far from perfect as you clearly do not understand the repeated requests *NOT* to send us double spaced crap via goog

Re: Python Unicode handling wins again -- mostly

2013-12-02 Thread wxjmfauth
Le dimanche 1 décembre 2013 21:54:48 UTC+1, Tim Delaney a écrit : > On 2 December 2013 07:15, wrote: > > > 0.11.13 02:44, Steven D'Aprano написав(ла): > > > > (2) If you reverse that string, does it give "lëon"? The implication of > > > this question is that strings should operate on graphem

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 22:50, Ethan Furman wrote: On 12/01/2013 02:06 PM, Mark Lawrence wrote: I don't remember him [jmf] ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I flagged up here http://bugs.python.org/is

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Ethan Furman
On 12/01/2013 02:06 PM, Mark Lawrence wrote: I don't remember him [jmf] ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I flagged up here http://bugs.python.org/issue16061. It was fixed by Serhiy Storch

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 22:29, Tim Delaney wrote: On 2 December 2013 09:06, Mark Lawrence mailto:breamore...@yahoo.co.uk>> wrote: I don't remember him ever having a valid point, so FTR can we have a reference please. I do remember Steven D'Aprano showing that there was a regression which I fl

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Tim Delaney
On 2 December 2013 09:06, Mark Lawrence wrote: > I don't remember him ever having a valid point, so FTR can we have a > reference please. I do remember Steven D'Aprano showing that there was a > regression which I flagged up here http://bugs.python.org/issue16061. It > was fixed by Serhiy Storc

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Mark Lawrence
On 01/12/2013 20:54, Tim Delaney wrote: On 2 December 2013 07:15, mailto:wxjmfa...@gmail.com>> wrote: 0.11.13 02:44, Steven D'Aprano написав(ла): > (2) If you reverse that string, does it give "lëon"? The implication of > this question is that strings should operate on grapheme

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Tim Delaney
On 2 December 2013 07:15, wrote: > 0.11.13 02:44, Steven D'Aprano написав(ла): > > (2) If you reverse that string, does it give "lëon"? The implication of > > this question is that strings should operate on grapheme clusters rather > > than code points. ... > > > > BTW, a grapheme cluster *is* a

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread wxjmfauth
0.11.13 02:44, Steven D'Aprano написав(ла): > (2) If you reverse that string, does it give "lëon"? The implication of > this question is that strings should operate on grapheme clusters rather > than code points. ... > BTW, a grapheme cluster *is* a code points cluster. jmf -- https://mail.pyth

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread Serhiy Storchaka
30.11.13 02:44, Steven D'Aprano написав(ла): (2) If you reverse that string, does it give "lëon"? The implication of this question is that strings should operate on grapheme clusters rather than code points. Python fails this test: py> print("noe\u0308l"[::-1]) leon >>> print(unicodedata.norma
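The reversal test above can be reproduced as follows. Composing to NFC first fixes this particular word because "ë" has a precomposed code point (U+00EB); it is not a general fix for strings whose clusters have no precomposed form.

```python
import unicodedata

word = "noe\u0308l"  # "noël" spelled with a COMBINING DIAERESIS (U+0308)

# Plain code-point reversal moves the diaeresis onto the "l".
naive = word[::-1]
# NFC turns e + U+0308 into the single code point U+00EB first,
# so reversing then gives the expected "lëon".
composed = unicodedata.normalize("NFC", word)[::-1]

print(ascii(naive))     # 'l\u0308eon'
print(ascii(composed))  # 'l\xebon'
```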

Re: Python Unicode handling wins again -- mostly

2013-12-01 Thread wxjmfauth
Le dimanche 1 décembre 2013 00:07:36 UTC+1, Ned Batchelder a écrit : > On 11/30/13 5:37 PM, Gregory Ewing wrote: > > > wxjmfa...@gmail.com wrote: > > >> And do you know the origin of this typographical feature? > > >> Because, mechanically, the dot of the "i" broke too often. > > >> > > >> In

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Chris Angelico
On Sun, Dec 1, 2013 at 12:27 PM, Roy Smith wrote: >> http://www.theregister.co.uk/2010/11/26/bofh_2010_episode_18/ >> >> ChrisA > > What means "PFY"? The only thing I can think of is "Poor F---ing > Yankee" :-) In the context of the BOFH, it stands for Pimply-Faced Youth and means BOFH's assista

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Roy Smith
In article , Chris Angelico wrote: > On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano > wrote: > > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > > > >> On 2013-12-01 00:22, Steven D'Aprano wrote: > >>> * KELVIN SIGN versus LATIN CAPITAL LETTER A > >> > >> I should hope so ;-) > > > > >

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Chris Angelico
On Sun, Dec 1, 2013 at 11:54 AM, Steven D'Aprano wrote: > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > >> On 2013-12-01 00:22, Steven D'Aprano wrote: >>> * KELVIN SIGN versus LATIN CAPITAL LETTER A >> >> I should hope so ;-) > > > I blame my keyboard, where letters A and K are practicall

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Tim Chase
On 2013-12-01 00:54, Steven D'Aprano wrote: > On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > > > On 2013-12-01 00:22, Steven D'Aprano wrote: > >> * KELVIN SIGN versus LATIN CAPITAL LETTER A > > > > I should hope so ;-) > > > I blame my keyboard, where letters A and K are practical

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Steven D'Aprano
On Sat, 30 Nov 2013 18:52:48 -0600, Tim Chase wrote: > On 2013-12-01 00:22, Steven D'Aprano wrote: >> * KELVIN SIGN versus LATIN CAPITAL LETTER A > > I should hope so ;-) I blame my keyboard, where letters A and K are practically right next to each other, only seven letters apart. An easy typo
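The comparison intended here is KELVIN SIGN (U+212A) versus LATIN CAPITAL LETTER K. The two are distinct code points, but the Kelvin sign has a canonical (singleton) decomposition to "K", and it lowercases to an ordinary "k":

```python
import unicodedata

kelvin = "\u212a"  # KELVIN SIGN
print(unicodedata.name(kelvin))                     # KELVIN SIGN
print(kelvin == "K")                                # False: different code points
print(unicodedata.normalize("NFC", kelvin) == "K")  # True: canonical equivalent
print(kelvin.lower() == "k")                        # True
```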

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Tim Chase
On 2013-12-01 00:22, Steven D'Aprano wrote: > * KELVIN SIGN versus LATIN CAPITAL LETTER A I should hope so ;-) -tkc

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Steven D'Aprano
On Sun, 01 Dec 2013 11:37:30 +1300, Gregory Ewing wrote: > Which makes it even sillier to have an 'ffi' character in this day and > age, when you can simply space the characters so that they overlap. It's in Unicode to support legacy character sets that included it[1]. There are a bunch of simil
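Legacy characters such as the "ffi" ligature carry a compatibility decomposition rather than a canonical one. A small sketch of what that means in practice: NFC leaves the ligature alone, while NFKC folds it into plain letters.

```python
import unicodedata

lig = "\ufb03"  # LATIN SMALL LIGATURE FFI
print(unicodedata.name(lig))                     # LATIN SMALL LIGATURE FFI
print(unicodedata.normalize("NFC", lig) == lig)  # True: no canonical decomposition
print(unicodedata.normalize("NFKC", lig))        # ffi
```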

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Ned Batchelder
On 11/30/13 5:37 PM, Gregory Ewing wrote: wxjmfa...@gmail.com wrote: And do you know the origin of this typographical feature? Because, mechanically, the dot of the "i" broke too often. In my opinion, a very plausible explanation. It doesn't sound very plausible to me, because there are a lot

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Gregory Ewing
Steven D'Aprano wrote: On Sat, 30 Nov 2013 00:37:17 -0500, Roy Smith wrote: So, who am I to argue with the people who decided that I needed to be able to type a "PILE OF POO" character. Blame the Japanese for that. Apparently some of the biggest users of Unicode are the various Japanese mobi

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Gregory Ewing
wxjmfa...@gmail.com wrote: And do you know the origin of this typographical feature? Because, mechanically, the dot of the "i" broke too often. In my opinion, a very plausible explanation. It doesn't sound very plausible to me, because there are a lot more stand-alone 'i's in English text than

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread wxjmfauth
Le samedi 30 novembre 2013 03:08:49 UTC+1, Roy Smith a écrit : > > > > The whole idea of ligatures like fi is purely typographic. The crossbar > > on the "f" (at least in some fonts) runs into the dot on the "i". > > Likewise, the top curl on an "f" run into the serif on top of the "l" >

Re: Python Unicode handling wins again -- mostly

2013-11-30 Thread Mark Lawrence
On 30/11/2013 02:08, Roy Smith wrote: In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: (8) What's the uppercase of "baffle" spelled with an ffl ligature? Like most other languages, Python 3.2 fails: py> 'baffle'.upper() 'BAfflE' but Python 3.3 passe

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 00:37:17 -0500, Roy Smith wrote: > So, who am I to argue with the people who decided that I needed to be > able to type a "PILE OF POO" character. Blame the Japanese for that. Apparently some of the biggest users of Unicode are the various Japanese mobile phone manufacturers,

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 23:00:27 -0700, Ian Kelly wrote: > On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith wrote: >> I was speaking specifically of "ligatures like fi" (or, if you prefer, >> "ligatures like ﬁ". By which I mean those things printers invented >> because some letter combinations look funny

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Sat, 30 Nov 2013 02:05:59 -0300, Zero Piraeus wrote: > (I happen to think the presence of ligatures in Unicode is insane, but > my dictator-of-the-world certificate appears to have gotten lost in the > post, so fixing that will have to wait). You're probably right, but we live in an insane wor

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Ian Kelly
On Fri, Nov 29, 2013 at 10:37 PM, Roy Smith wrote: > I was speaking specifically of "ligatures like fi" (or, if you prefer, > "ligatures like ﬁ". By which I mean those things printers invented > because some letter combinations look funny when typeset as two distinct > letters. I think the encod

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > > The whole idea of ligatures like fi is purely typographic. > > In English, that's correct. I'm not sure if we can generalise that to all > languages that have ligatures. It also partly depends on how y

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Gene Heskett
On Saturday 30 November 2013 00:23:22 Zero Piraeus did opine: > On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote: > > On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > > > The whole idea of ligatures like fi is purely typographic. > > > > In English, that's correct. I'm not su

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Zero Piraeus
: On Sat, Nov 30, 2013 at 04:21:49AM +, Steven D'Aprano wrote: > On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > > The whole idea of ligatures like fi is purely typographic. > > In English, that's correct. I'm not sure if we can generalise that to > all languages that have ligatures. I

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529967dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > You edited my text to remove the ligature? That's... unfortunate. It was un-ligated by the time it reached me.

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Steven D'Aprano
On Fri, 29 Nov 2013 21:08:49 -0500, Roy Smith wrote: > In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, > Steven D'Aprano wrote: > >> (8) What's the uppercase of "baffle" spelled with an ffl ligature? >> >> Like most other languages, Python 3.2 fails: >> >> py> 'baffle'.upper

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Dave Angel
On Fri, 29 Nov 2013 21:28:47 -0500, Roy Smith wrote: In article , Chris Angelico wrote: > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > > I would certainly expect, x.lower() == x.upper().lower(), to be True for > > all values of x over the set of valid unicode codepoints. Having >

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article , Chris Angelico wrote: > On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > > I would certainly expect, x.lower() == x.upper().lower(), to be True for > > all values of x over the set of valid unicode codepoints. Having > > u"\uFB04".upper() ==> "FFL" breaks that. I would also ex

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Chris Angelico
On Sat, Nov 30, 2013 at 1:08 PM, Roy Smith wrote: > I would certainly expect, x.lower() == x.upper().lower(), to be True for > all values of x over the set of valid unicode codepoints. Having > u"\uFB04".upper() ==> "FFL" breaks that. I would also expect len(x) == > len(x.upper()) to be True. T
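One way to see how often the two expectations quoted above fail is to scan every code point. The helper below is hypothetical; it scans the whole code space, so it takes a moment to run.

```python
import sys

def case_roundtrip_failures(limit=sys.maxunicode):
    """Yield characters where upper() changes the length, or where
    x.lower() == x.upper().lower() does not hold."""
    for cp in range(limit + 1):
        ch = chr(cp)
        up = ch.upper()
        if len(up) != 1 or ch.lower() != up.lower():
            yield ch

failures = list(case_roundtrip_failures())
print(len(failures))  # how many code points break at least one expectation
print([ascii(c) for c in failures[:5]])
```

Characters such as "ß" (which uppercases to "SS") and the ffl ligature U+FB04 show up in the list, as do letters like the Greek final sigma, whose uppercase lowercases to a different letter.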

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Roy Smith
In article <529934dc$0$29993$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > (8) What's the uppercase of "baffle" spelled with an ffl ligature? > > Like most other languages, Python 3.2 fails: > > py> 'baffle'.upper() > 'BAfflE' > > but Python 3.3 passes: > > py> 'baffle'.upper

Re: Python Unicode handling wins again -- mostly

2013-11-29 Thread Mark Lawrence
On 30/11/2013 00:44, Steven D'Aprano wrote: (5) What is the length of "😸😾"? Both characters U+1F638 (GRINNING CAT FACE WITH SMILING EYES) and U+1F63E (POUTING CAT FACE) are outside the Basic Multilingual Plane, which means they require more than two bytes each. Most programming languages using
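A quick way to see the difference between code points and encoded units for these two characters. Since Python 3.3 (PEP 393) `len()` counts code points on every build; a UTF-16 representation needs a surrogate pair for each of them.

```python
s = "\U0001F638\U0001F63E"  # GRINNING CAT FACE WITH SMILING EYES, POUTING CAT FACE

print(len(s))                           # 2 code points
print(len(s.encode("utf-16-le")) // 2)  # 4 UTF-16 code units (two surrogate pairs)
print(len(s.encode("utf-8")))           # 8 bytes in UTF-8
```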

Re: Python unicode utf-8 characters and MySQL unicode utf-8 characters

2011-01-18 Thread Grzegorz Śliwiński
On 18 Sty, 18:15, Kushal Kumaran wrote: > 2011/1/18 Grzegorz Śliwiński : > > > Hello, > > Recently I tried to insert some unicode object in utf-8 encoding into > > MySQL using MySQLdb, and got MySQL warnings on characters like: > > 𐎲𐎠𐎥𐎠 i found somewhere in my data. I can't even read them. MySQL >

Re: Python unicode utf-8 characters and MySQL unicode utf-8 characters

2011-01-18 Thread Kushal Kumaran
2011/1/18 Grzegorz Śliwiński : > Hello, > Recently I tried to insert some unicode object in utf-8 encoding into > MySQL using MySQLdb, and got MySQL warnings on characters like: > 𐎲𐎠𐎥𐎠 i found somewhere in my data. I can't even read them. MySQL > seems to cut the whole string after that characters
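The characters quoted above appear to come from the Old Persian cuneiform block (U+103A0 to U+103DF), which lies outside the Basic Multilingual Plane, so each one takes four bytes in UTF-8. Assuming the column or connection uses MySQL's `utf8` character set, which stores at most three bytes per character, such characters cannot be stored; `utf8mb4` is the MySQL character set meant for them. A minimal sketch for finding or replacing them before insertion (both helper names are hypothetical):

```python
def astral_chars(text):
    """Return the characters that need four bytes in UTF-8 (outside the BMP)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

def strip_astral(text, replacement="\ufffd"):
    """Replace non-BMP characters, e.g. before writing to a 3-byte-only column."""
    return "".join(replacement if ord(ch) > 0xFFFF else ch for ch in text)

sample = "abc \U000103B2\U000103A0"  # two code points from the Old Persian block
print([hex(ord(c)) for c in astral_chars(sample)])  # ['0x103b2', '0x103a0']
print(len("\U000103B2".encode("utf-8")))            # 4
```

Replacing data is lossy; switching the storage to a four-byte-capable character set is the alternative that keeps the text intact.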

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Alf P. Steinbach
* Mark Tolonen: "Terry Reedy" wrote in message news:hnjkuo$n1...@dough.gmane.org... On 3/14/2010 4:40 PM, Guillermo wrote: Adding the byte that some call a 'utf-8 bom' makes the file an invalid utf-8 file. Not true. From http://unicode.org/faq/utf_bom.html: Q: When a BOM is used, is it o

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Mark Tolonen
"Terry Reedy" wrote in message news:hnjkuo$n1...@dough.gmane.org... On 3/14/2010 4:40 PM, Guillermo wrote: Adding the byte that some call a 'utf-8 bom' makes the file an invalid utf-8 file. Not true. From http://unicode.org/faq/utf_bom.html: Q: When a BOM is used, is it only in 16-bit Uni

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Guillermo
> 2) My script gets output from a Popen call (to execute a Powershell > script [new Windows shell language] from Python; it does make sense!). > I suppose changing the Windows codepage for a single Popen call isn't > straightforward/possible? Nevermind. I'm able to change Windows' codepage to 6500

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Neil Hodgson
Guillermo: > 2) My script gets output from a Popen call (to execute a Powershell > script [new Windows shell language] from Python; it does make sense!). > I suppose changing the Windows codepage for a single Popen call isn't > straightforward/possible? You could try SetConsoleOutputCP and Set
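An alternative to changing the console code page for a single Popen call is to capture the child's raw bytes and decode them explicitly. The sketch below uses a Python child process as a stand-in for the PowerShell script and assumes the child writes UTF-8; what encoding a real PowerShell script emits depends on its own output settings.

```python
import subprocess
import sys

# Stand-in child process: writes "mañana" to stdout as raw UTF-8 bytes.
child_code = 'import sys; sys.stdout.buffer.write("ma\\u00f1ana".encode("utf-8"))'

result = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,  # capture bytes instead of printing to the console
    check=True,
)
# Decode with the encoding the child is known to use, not the console code page.
text = result.stdout.decode("utf-8")
print(ascii(text))  # 'ma\xf1ana'
```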

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Guillermo
>    The console is commonly using Code Page 437 which is most compatible > with old DOS programs since it can display line drawing characters. You > can change the code page to UTF-8 with > chcp 65001 That's another issue in my actual script. A twofold problem, actually: 1) For me chcp gives 850

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Neil Hodgson
Guillermo: > Is this an enforced convention under Windows, then? My head's aching > after so much pulling at my hair, but I have the feeling that the > problem only arises when text travels through the dos console... The console is commonly using Code Page 437 which is most compatible with old
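To see why text breaks when it passes through the console: the same "ñ" is the single byte 0xA4 in code pages 437 and 850 but two bytes (0xC3 0xB1) in UTF-8, so UTF-8 output shown under an OEM code page turns into mojibake. A small demonstration:

```python
text = "ma\u00f1ana"  # "mañana"

for cp in ("cp437", "cp850", "utf-8"):
    print(cp, text.encode(cp))

# UTF-8 bytes interpreted as cp850 by the console give garbled characters.
garbled = text.encode("utf-8").decode("cp850")
print(garbled)
# The damage is reversible as long as no bytes were lost along the way.
print(garbled.encode("cp850").decode("utf-8") == text)  # True
```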

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Terry Reedy
On 3/14/2010 4:40 PM, Guillermo wrote: Hi, I would appreciate if someone could point out what am I doing wrong here. Basically, I need to save a string containing non-ascii characters to a file encoded in utf-8. If I stay in python, everything seems to work fine, but the moment I try to read t

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Joaquin Abian
On 14 mar, 22:22, Guillermo wrote: > >    That is what happens: the file now starts with a BOM \xEF\xBB\xBF as > > you can see with a hex editor. > > Is this an enforced convention under Windows, then? My head's aching > after so much pulling at my hair, but I have the feeling that the > problem o

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Guillermo
>    That is what happens: the file now starts with a BOM \xEF\xBB\xBF as > you can see with a hex editor. Is this an enforced convention under Windows, then? My head's aching after so much pulling at my hair, but I have the feeling that the problem only arises when text travels through the dos co
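Python's `utf-8-sig` codec writes the UTF-8 BOM (the bytes EF BB BF) on output and strips it on input, while plain `utf-8` keeps it as the character U+FEFF. A minimal sketch using a temporary file:

```python
import codecs
import os
import tempfile

text = "ma\u00f1ana"
path = os.path.join(tempfile.mkdtemp(), "m.txt")

# "utf-8-sig" writes the BOM EF BB BF before the encoded text.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write(text)

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3] == codecs.BOM_UTF8)  # True

# Reading with "utf-8-sig" strips the BOM; plain "utf-8" keeps it as U+FEFF.
with open(path, encoding="utf-8-sig") as f:
    print(f.read() == text)  # True
with open(path, encoding="utf-8") as f:
    print(f.read().startswith("\ufeff"))  # True
```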

Re: Python unicode and Windows cmd.exe

2010-03-14 Thread Neil Hodgson
Guillermo: > I then open the file m.txt with notepad, and I see "mañana" normally. > I save (again, no actual modifications), go back to the dos prompt, do > type m.txt and this time it works! I get "mañana". When notepad opens > the file, the encoding is already UTF-8, so short of a UTF-8 bom bei

Re: Python Unicode to String conversion

2007-09-17 Thread Gabriel Genellina
En Mon, 17 Sep 2007 01:33:14 -0300, Richard Levasseur <[EMAIL PROTECTED]> escribió: > When dealing with unicode, i've run into situations where I have > multiple encodings in the same string, usually latin1 and utf8 > (latin1 != ascii, and latin1 != utf8, and they don't play nice > together). So
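For byte strings that mix UTF-8 and Latin-1, one heuristic is to decode as UTF-8 and fall back to Latin-1 only for the bytes that are not valid UTF-8. The sketch below does this with a custom codec error handler; the handler name `latin1-fallback` is hypothetical. It can guess wrong, because some Latin-1 byte sequences also happen to be valid UTF-8.

```python
import codecs

def _latin1_fallback(err):
    """Error handler: decode the offending bytes as Latin-1 and carry on."""
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end

codecs.register_error("latin1-fallback", _latin1_fallback)

mixed = "caf\u00e9 ".encode("utf-8") + "caf\u00e9".encode("latin-1")
print(mixed)                                     # b'caf\xc3\xa9 caf\xe9'
print(mixed.decode("utf-8", "latin1-fallback"))  # café café
```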

Re: Python Unicode to String conversion

2007-09-16 Thread Richard Levasseur
> On 1 sep, 09:17, iapain <[EMAIL PROTECTED]> wrote: > > > First make sure your DB encoding is UTF-8, not latin1 > It took me days to figure out what was going on when dealing with unicode, ascii, latin1, utf8, decode errors, etc., so I'm just chiming in to echo something similar to iapain's comment

Re: Python Unicode to String conversion

2007-09-16 Thread thijs . braem
Sorry for answering so late. Thanks a million! This code snippet helped me solve the problem. I think I will be using SQLAlchemy for these sorts of things from now on, though; it seems to take care of these things itself, on top of being one hell of a handy ORM of course :) thijs On 1 sep, 0

Re: Python Unicode to String conversion

2007-09-01 Thread iapain
First make sure your DB encoding is UTF-8, not latin1 > The error I keep having is something like this: > ERROR: invalid byte sequence for encoding "UTF8": 0xe02063 then try this: def smart_str(s, encoding='utf-8', errors='strict'): """ Returns a bytestring version of 's', e
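The quoted `smart_str` is cut off above. The following is a hypothetical Python 3 analogue of the idea its docstring describes (return a bytestring version of a value in a given encoding). It is not Django's implementation, and the name `to_bytestring` is made up for illustration. It assumes incoming bytes are already UTF-8:

```python
import codecs


def to_bytestring(s, encoding="utf-8", errors="strict"):
    """Return a bytes version of s, encoded with `encoding`.

    Hypothetical sketch; assumes any bytes passed in are UTF-8.
    """
    if isinstance(s, bytes):
        if codecs.lookup(encoding).name == "utf-8":
            return s  # already in the target encoding
        # Re-encode UTF-8 bytes into the requested encoding.
        return s.decode("utf-8", errors).encode(encoding, errors)
    if not isinstance(s, str):
        s = str(s)  # numbers and other objects use their str() form
    return s.encode(encoding, errors)


print(to_bytestring("mañana"))
print(to_bytestring(b"ma\xc3\xb1ana", "latin-1"))
print(to_bytestring(42))
```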

Re: Python Unicode to String conversion

2007-08-31 Thread Lawrence D'Oliveiro
In message <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote: > The error I keep having is something like this: > ERROR: invalid byte sequence for encoding "UTF8": 0xe02063 It would be useful to see an actual code snippet, traceback listing, etc. -- http://mail.python.org/mailman/listinfo
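The bytes named in that error can be examined directly. One plausible reading (an assumption, since no code was posted) is that Latin-1 text was sent to a UTF-8 database: E0 20 63 is "à c" in Latin-1, but E0 starts a three-byte UTF-8 sequence and 0x20 is not a valid continuation byte:

```python
raw = bytes([0xE0, 0x20, 0x63])  # the bytes reported in the error message

as_latin1 = raw.decode("latin-1")  # 'à c'

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Re-encoding the decoded text as UTF-8 gives bytes a UTF8 database accepts.
fixed = as_latin1.encode("utf-8")

print(repr(as_latin1), utf8_ok, fixed)
```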

Re: Python Unicode to String conversion

2007-08-31 Thread Carsten Haese
On Fri, 2007-08-31 at 15:55 -0700, [EMAIL PROTECTED] wrote: > Hi everyone, > > I'm having quite some trouble trying to convert Unicode to String > (for use in psycopg, which apparently doesn't know how to cope with > unicode strings). > > The error I keep having is something like this: > ERROR:

Re: Python Unicode to String conversion

2007-08-31 Thread John Machin
On Sep 1, 9:56 am, "Chris Mellon" <[EMAIL PROTECTED]> wrote: > On 8/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > > > Hi everyone, > > > I'm having quite some trouble trying to convert Unicode to String > > (for use in psycopg, which apparently doesn't know how to cope with > > unicode str

Re: Python Unicode to String conversion

2007-08-31 Thread John Machin
On Sep 1, 8:55 am, [EMAIL PROTECTED] wrote: > Hi everyone, > > I'm having quite some trouble trying to convert Unicode to String > (for use in psycopg, which apparently doesn't know how to cope with > unicode strings). > > The error I keep having is something like this: > ERREUR: Séquence d'octet

Re: Python Unicode to String conversion

2007-08-31 Thread Chris Mellon
On 8/31/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Hi everyone, > > I'm having quite some trouble trying to convert Unicode to String > (for use in psycopg, which apparently doesn't know how to cope with > unicode strings). > > The error I keep having is something like this: > ERREUR: Séq

Re: Python Unicode to String conversion

2007-08-31 Thread Larry Bates
[EMAIL PROTECTED] wrote: > Hi everyone, > > I'm having quite some trouble trying to convert Unicode to String > (for use in psycopg, which apparently doesn't know how to cope with > unicode strings). > > The error I keep having is something like this: > ERROR: invalid byte sequence for

Re: Python & Unicode decimal interpretation

2005-12-03 Thread Martin v. Löwis
Scott David Daniels wrote: >> >>> int(u"\N{DEVANAGARI DIGIT SEVEN}") >> 7 > > OK, that much I have handled. I am fiddling with direct-to-number > conversions and wondering about cases like >>>> int(u"\N{DEVANAGARI DIGIT SEVEN}" + XXX >+ u"\N{DEVANAGARI DIGIT SEVEN}") int() passe
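A direct-to-number conversion of the kind discussed can be sketched with `unicodedata.decimal`, which returns the decimal value of any character Unicode classifies as a decimal digit and raises `ValueError` otherwise. The helper `digits_to_int` is hypothetical and simply allows digits from any script, including mixed ones; how `int()` itself treats mixed scripts is not assumed here. In Python 3 the `u` prefix is unnecessary:

```python
import unicodedata


def digits_to_int(s: str) -> int:
    """Convert a string of Unicode decimal digits (any script) to an int."""
    if not s:
        raise ValueError("empty string")
    value = 0
    for ch in s:
        # Raises ValueError for characters that are not decimal digits.
        value = value * 10 + unicodedata.decimal(ch)
    return value


seven = "\N{DEVANAGARI DIGIT SEVEN}"

print(int(seven))                       # int() accepts Devanagari digits
print(digits_to_int(seven + "1" + seven))
```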

Re: Python & Unicode decimal interpretation

2005-12-03 Thread Scott David Daniels
Martin v. Löwis wrote: > Scott David Daniels wrote: >> In reading over the source for CPython's PyUnicode_EncodeDecimal, >> I see a dance to handle characters which are neither dec-equiv nor >> in Latin-1. Does anyone know about the intent of such a conversion? > > To support this: > > >>> int(

Re: Python & Unicode decimal interpretation

2005-12-03 Thread Martin v. Löwis
Scott David Daniels wrote: > In reading over the source for CPython's PyUnicode_EncodeDecimal, > I see a dance to handle characters which are neither dec-equiv nor > in Latin-1. Does anyone know about the intent of such a conversion? To support this: >>> int(u"\N{DEVANAGARI DIGIT SEVEN}") 7 >

Re: Python & unicode

2005-01-12 Thread Serge Orlov
Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle. wrote: > Hi! > > Sorry, but I think that, for Russians, English is an *add-on*, > and not a common denominator. You miss the point: programs are not English writings; they are written in computer languages using libra
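This 2005 debate predates Python 3, which (per PEP 3131) accepts many non-ASCII letters in identifiers. A small sketch checking a few candidate names with `str.isidentifier()` and defining a Cyrillic variable; the sample names are arbitrary:

```python
names = ["mañana", "привет", "変数", "naïve_count", "2fast"]

# str.isidentifier() applies Python 3's identifier rules.
valid = {name: name.isidentifier() for name in names}

# A Cyrillic name works as an ordinary variable.
namespace = {}
exec("привет = 'мир'", namespace)

print(valid)
print(namespace["привет"])
```

Whether a project *should* use such names is the style question the thread argues about; the sketch only shows what the language permits.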

Re: Python & unicode

2005-01-12 Thread Marc 'BlackJack' Rintsch
In <[EMAIL PROTECTED]>, Michel Claveau - abstraction méta-galactique non triviale en fuite perpétuelle. wrote: > I understand, but it feels like an attempt at hegemony. Is the English > language really the least common denominator for a Russian who writes in > Cyrillic, or for a non-anglophone Chinese speaker? >

Re: Python & unicode

2005-01-12 Thread Scott David Daniels
[EMAIL PROTECTED] wrote: Scott David Daniels wrote: If you allow non-ASCII characters in symbol names, your source code will be unviewable (and uneditable) for people with ASCII-only terminals, never mind how comprehensible it might otherwise be. So how does one edit non-ASCII string literals at th
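On an ASCII-only terminal, non-ASCII string literals can still be written with escapes: `\uXXXX` by code point or `\N{...}` by Unicode character name. The built-in `ascii()` gives an ASCII-only representation for display. A sketch in Python 3 (the `direct` literal is included only for comparison):

```python
import unicodedata

by_hex = "ma\u00f1ana"
by_name = "ma\N{LATIN SMALL LETTER N WITH TILDE}ana"
direct = "mañana"  # the non-ASCII spelling, for comparison

# ascii() escapes non-ASCII characters, so it is safe on any terminal.
shown = ascii(direct)

print(by_hex == by_name == direct)
print(shown)
print(unicodedata.name("\u00f1"))
```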
