Re: Unicode filenames

2019-12-07 Thread Chris Angelico
On Sun, Dec 8, 2019 at 8:33 AM Bob van der Poel wrote: > Yeah, heard all that before :) But, seriously, I wonder how many short > (less than 100 lines) programs there are out there written in py2 that will > not run in py3. Good thing py2 will still be available to be installed for > many, many ye

Re: Unicode filenames

2019-12-07 Thread Bob van der Poel
On Sat, Dec 7, 2019 at 12:47 PM DL Neil via Python-list < python-list@python.org> wrote: > On 8/12/19 5:50 AM, Bob van der Poel wrote: > > On Sat, Dec 7, 2019 at 4:00 AM Barry Scott > wrote: > >>> On 6 Dec 2019, at 18:17, Bob van der Poel wrote: > >>> > >>> I have some files which came off the n

Re: Unicode filenames

2019-12-07 Thread DL Neil via Python-list
On 8/12/19 5:50 AM, Bob van der Poel wrote: On Sat, Dec 7, 2019 at 4:00 AM Barry Scott wrote: On 6 Dec 2019, at 18:17, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and

Re: Unicode filenames

2019-12-07 Thread Bob van der Poel
On Sat, Dec 7, 2019 at 4:00 AM Barry Scott wrote: > > > > On 6 Dec 2019, at 18:17, Bob van der Poel wrote: > > > > I have some files which came off the net with, I'm assuming, unicode > > characters in the names. I have a very short program which takes the > > filename and puts into an emacs buf

Re: Unicode filenames

2019-12-07 Thread Barry Scott
> On 6 Dec 2019, at 18:17, Bob van der Poel wrote: > > I have some files which came off the net with, I'm assuming, unicode > characters in the names. I have a very short program which takes the > filename and puts into an emacs buffer, and then lets me add information to > that new file (it's

Re: Unicode filenames

2019-12-07 Thread Peter Otten
Bob van der Poel wrote: > I have some files which came off the net with, I'm assuming, unicode > characters in the names. I have a very short program which takes the > filename and puts into an emacs buffer, and then lets me add information > to that new file (it's a poor man's DB). > > Next, I c

Re: Unicode filenames

2019-12-06 Thread Terry Reedy
On 12/6/2019 1:17 PM, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and puts into an emacs buffer, and then lets me add information to that new file (it's a poor man's DB).

Re: Unicode filenames

2019-12-06 Thread DL Neil via Python-list
On 7/12/19 7:17 AM, Bob van der Poel wrote: I have some files which came off the net with, I'm assuming, unicode characters in the names. I have a very short program which takes the filename and puts into an emacs buffer, and then lets me add information to that new file (it's a poor man's DB).
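The filename question in this thread comes down to how Python 3 turns raw filename bytes into `str`. Below is a minimal sketch. It assumes a POSIX system whose filesystem encoding is UTF-8, which is the case where `os.fsdecode()` and `os.fsencode()` use the `surrogateescape` error handler. The helper names and sample filenames are hypothetical:

```python
def decode_filename(raw: bytes) -> str:
    # Mirrors os.fsdecode() on a UTF-8 POSIX system (assumption): bytes that are
    # not valid UTF-8 become lone surrogates U+DC80..U+DCFF instead of raising.
    return raw.decode("utf-8", "surrogateescape")


def encode_filename(name: str) -> bytes:
    # Inverse of decode_filename(): restores the exact original bytes.
    return name.encode("utf-8", "surrogateescape")


def display_name(name: str) -> str:
    # Text that is safe to paste into an editor buffer: lone surrogates become '?'.
    return name.encode("utf-8", "replace").decode("utf-8")


for raw in [b"caf\xc3\xa9.txt", b"caf\xe9.txt"]:  # valid UTF-8, then Latin-1 bytes
    name = decode_filename(raw)
    assert encode_filename(name) == raw  # the round trip is lossless
    print(ascii(name), "->", ascii(display_name(name)))
```

The round-trip assertion is what `surrogateescape` buys you. A name that is not valid UTF-8 can still be reopened by name, while `display_name()` gives something printable.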

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-19 Thread MRAB
On 2019-09-19 09:55, Gregory Ewing wrote: Eli the Bearded wrote: There isn't anything called UCS1. Apparently there is, but it's not a character set, it's a loudspeaker. https://www.bhphotovideo.com/c/product/1205978-REG/yorkville_sound_ucs1_1200w_15_horn_loaded.html The OP might mean Py_UCS
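Some context on `Py_UCS1`: since Python 3.3 (PEP 393), CPython stores each `str` using 1, 2 or 4 bytes per code point. These widths correspond to the C types `Py_UCS1`, `Py_UCS2` and `Py_UCS4`, and CPython picks the width from the largest code point in the string. The function below only models that rule and does not inspect real memory. Its name is hypothetical:

```python
def storage_width(s: str) -> int:
    """Bytes per code point PEP 393 would choose for s (a model, not a measurement)."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1  # Py_UCS1: ASCII and the Latin-1 range
    if widest < 0x10000:
        return 2  # Py_UCS2: the rest of the Basic Multilingual Plane
    return 4      # Py_UCS4: supplementary planes


for sample in ["abc", "caf\u00e9", "\u20ac5", "\U0001f60f"]:
    print(ascii(sample), storage_width(sample))
```

In this sense "UCS1" is an internal storage width, not an encoding you would use for data interchange.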

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-19 Thread Gregory Ewing
Eli the Bearded wrote: There isn't anything called UCS1. Apparently there is, but it's not a character set, it's a loudspeaker. https://www.bhphotovideo.com/c/product/1205978-REG/yorkville_sound_ucs1_1200w_15_horn_loaded.html -- Greg

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-17 Thread Chris Angelico
On Wed, Sep 18, 2019 at 6:51 AM Eli the Bearded <*@eli.users.panix.com> wrote: > > In comp.lang.python, moi wrote: > > I hope, one day, for those who are interested in Unicode, > > they find a book, publication, ... which will explain > > what is UCS1. > > There isn't anything called UCS1. There

Re: Unicode UCS2, UCS4 and ... UCS1

2019-09-17 Thread Eli the Bearded
In comp.lang.python, moi wrote: > I hope, one day, for those who are interested in Unicode, > they find a book, publication, ... which will explain > what is UCS1. There isn't anything called UCS1. There is a UTF-1, but don't use it. UTF-8 is better in every way. https://en.wikipedia.org/wiki/U

Re: unicode mail list archeology

2019-04-20 Thread Luuk
On 20-4-2019 12:47, Luuk wrote: On 20-4-2019 11:26, wxjmfa...@gmail.com wrote: http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0594.html [quoot] > It is simple to make a compacter version of UTF-8 using the base > 256 character codes were possible (comacter for many languages).

Re: unicode mail list archeology

2019-04-20 Thread Luuk
On 20-4-2019 11:26, wxjmfa...@gmail.com wrote: http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML018/0594.html [quoot] > It is simple to make a compacter version of UTF-8 using the base > 256 character codes were possible (comacter for many languages). No. If you think otherwise, you ha
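Here is the UTF-8 side of this made concrete. UTF-8 already uses a single byte for the ASCII range and grows to 2, 3 or 4 bytes only for higher code points. The small check below compares the standard length rule with Python's own encoder. The helper name is our own:

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses for code point cp (surrogates excluded)."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


for ch in ["A", "\u00e9", "\u20ac", "\U0001f60f"]:
    encoded = ch.encode("utf-8")
    assert len(encoded) == utf8_length(ord(ch))  # the rule matches the codec
    print("U+%04X" % ord(ch), len(encoded), encoded)
```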

Re: Unicode [was Re: Cult-like behaviour]

2018-07-17 Thread Tim Chase
On 2018-07-17 08:37, Marko Rauhamaa wrote: > Tim Chase : > > Wait, but now you're talking about vendors. Much of the crux of > > this discussion has been about personal scripts that don't need to > > marshal Unicode strings in and out of various functions/objects. > > In both personal and profes

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Tim Chase : > On 2018-07-16 23:59, Marko Rauhamaa wrote: >> Tim Chase : >> > While the python world has moved its efforts into improving >> > Python3, Python2 hasn't suddenly stopped working. >> >> The sword of Damocles is hanging on its head. Unless a consortium is >> erected to support Python

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
On 2018-07-16 23:59, Marko Rauhamaa wrote: > Tim Chase : > > While the python world has moved its efforts into improving > > Python3, Python2 hasn't suddenly stopped working. > > The sword of Damocles is hanging on its head. Unless a consortium is > erected to support Python2, no vendor will be

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 21:16, Rhodri James wrote: On 16/07/18 20:58, Terry Reedy wrote: On 7/16/2018 1:27 PM, Jim Lee wrote: 90% of the world *is* "beneath my notice" when it comes to programming for myself.   I really don't care if that's not PC enough for you. Had you actually read my words with *in

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread MRAB
On 2018-07-16 21:59, Marko Rauhamaa wrote: Tim Chase : While the python world has moved its efforts into improving Python3, Python2 hasn't suddenly stopped working. The sword of Damocles is hanging on its head. Unless a consortium is erected to support Python2, no vendor will be able to use it

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 6:32 AM, Tim Chase wrote: > On 2018-07-16 18:31, Steven D'Aprano wrote: >> You say that all you want is a switch to turn off Unicode (and >> replace it with what? Kanji strings? Cyrillic? Shift_JS? no of >> course not, I'm being absurd -- replace it with ASCII, what else >>

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Marko Rauhamaa
Tim Chase : > While the python world has moved its efforts into improving Python3, > Python2 hasn't suddenly stopped working. The sword of Damocles is hanging on its head. Unless a consortium is erected to support Python2, no vendor will be able to use it in the medium term. Given the recent even

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Tim Chase
On 2018-07-16 18:31, Steven D'Aprano wrote: > You say that all you want is a switch to turn off Unicode (and > replace it with what? Kanji strings? Cyrillic? Shift_JS? no of > course not, I'm being absurd -- replace it with ASCII, what else > could any right-thinking person want, right?). But we a

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 6:16 AM, Rhodri James wrote: > On 16/07/18 20:58, Terry Reedy wrote: >> >> On 7/16/2018 1:27 PM, Jim Lee wrote: >> >>> 90% of the world *is* "beneath my notice" when it comes to programming >>> for myself. I really don't care if that's not PC enough for you. >>> >>> Had y

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 20:58, Terry Reedy wrote: On 7/16/2018 1:27 PM, Jim Lee wrote: 90% of the world *is* "beneath my notice" when it comes to programming for myself.   I really don't care if that's not PC enough for you. Had you actually read my words with *intent* rather than *reaction*, you would

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Anders Wegge Keller
On Mon, 16 Jul 2018 11:33:46 -0700 Jim Lee wrote: > Go right ahead. I find it surprising that Stephen isn't banned, > considering the fact that he ridicules anyone he doesn't agree with. > But I guess he's one of the 'good 'ol boys', and so exempt from the code > of conduct. Well said!

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Terry Reedy
On 7/16/2018 1:27 PM, Jim Lee wrote: 90% of the world *is* "beneath my notice" when it comes to programming for myself.   I really don't care if that's not PC enough for you. Had you actually read my words with *intent* rather than *reaction*, you would notice that I suggested the *option* of

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Terry Reedy
On 7/16/2018 1:13 PM, Jim Lee wrote: I just think that a language should allow one to bypass Unicode handling easily *when it's not needed*. Both for patching IDLE and for my currently private work, I usually only use Ascii, and no unicode escapes. When I do, it does not matter whether edit

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 18:38, Rhodri James wrote: Actually having an option of turning off Unicode *does* make it harder to use, because you end up coming across programs that have Unicode and surprise you when they misbehave.  And yes I saw that 90% of your programs aren't intended to get out into the wo

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee
On 07/16/18 11:31, Steven D'Aprano wrote: On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote: Had you actually read my words with *intent* rather than *reaction*, you would notice that I suggested the *option* of turning off Unicode. Yes, I know what you wrote, and I read it with intent. Ji

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee
On 07/16/18 10:40, Mark Lawrence wrote: On 16/07/18 18:27, Jim Lee wrote: Obviously, the most vocal representatives of the Python community are too sensitive about their language to enable rational discussion. Please moderators ban this person as he's going down the same line as bartc and s

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 19:31, Steven D'Aprano wrote: I'm simply not seeing the advantage of: from __future__ import no_unicode print("Hello World!") # stand in for any string handling on ASCII Sure this should be "from __past__ import no_unicode"? gd&r -- Rhodri James *-* Kynesim Ltd

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 10:27:18 -0700, Jim Lee wrote: > Had you actually read my words with *intent* rather than *reaction*, you > would notice that I suggested the *option* of turning off Unicode. Yes, I know what you wrote, and I read it with intent. Jim, you seem to be labouring under the misapp
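Python 3 has no `no_unicode` switch, and ASCII-only work doesn't need one, because ASCII text is ordinary `str`. If a script wants to insist on ASCII, a check at the input boundary is enough. A minimal sketch follows. The function name is hypothetical, and the sample name is the one that appears later in this archive:

```python
def require_ascii(text: str) -> str:
    """Return text unchanged, or raise UnicodeEncodeError if it is not pure ASCII."""
    # encode() reports the first non-ASCII character and its position in the error.
    text.encode("ascii")
    return text


print(require_ascii("Hello World!"))
try:
    require_ascii("Lakeisha F\u00e1bi\u00e1n")
except UnicodeEncodeError as exc:
    print("rejected:", exc)
```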

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 18:27, Jim Lee wrote: 90% of the world *is* "beneath my notice" when it comes to programming for myself.   I really don't care if that's not PC enough for you. Had you actually read my words with *intent* rather than *reaction*, you would notice that I suggested the *option* of tur

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 18:13, Jim Lee wrote: I just think that a language should allow one to bypass Unicode handling easily *when it's not needed*. I have no idea what this is meant to mean. I've written loads of code for my own purposes and I've never had to think about Unicode, so why should an

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 18:27, Jim Lee wrote: Obviously, the most vocal representatives of the Python community are too sensitive about their language to enable rational discussion. Please moderators ban this person as he's going down the same line as bartc and similar, it is completely unacceptable, he's

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee
On 07/16/18 03:39, Steven D'Aprano wrote: Good for you. But Python is not a programming language written to satisfy the needs of people like you, and ONLY people like you. It is a language written to satisfy the needs of people from Uzbekistan, and China, and Japan, and India, and Brazil, and

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Jim Lee
On 07/16/18 03:26, Steven D'Aprano wrote: But the thing is, that complexity is *inherent in the domain*. You can try to deal with it without Unicode, and as soon as you have users expecting to use more than one code page, you're doomed. No, I'm not doomed, because there *are* no other users

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Tue, 17 Jul 2018 02:22:59 +1000, Chris Angelico wrote: > On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence > wrote: >> Out of curiosity where does my mum's Welsh come into the equation as I >> believe that it is not recognised by the EU as a language? >> >> > What characters does it use? Mostly L

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 17:22, Chris Angelico wrote: On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence wrote: On 16/07/18 15:17, Dan Sommers wrote: On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: ... people who think that if ISO-8859-7 was good enough for Jesus ... It may have been good enou

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 17:26, Larry Martell wrote: On Mon, Jul 16, 2018 at 12:05 PM, Mark Lawrence wrote: On 16/07/18 15:17, Dan Sommers wrote: On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: ... people who think that if ISO-8859-7 was good enough for Jesus ... It may have been good enou

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 17:05, Mark Lawrence wrote: On 16/07/18 15:17, Dan Sommers wrote: On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: ... people who think that if ISO-8859-7 was good enough for Jesus ... It may have been good enough for his disciples, but Jesus spoke Aramaic. Also, ISO-8

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Rhodri James
On 16/07/18 17:22, Chris Angelico wrote: What characters does it use? Mostly Latin letters? Basic Latin plus U+0174 (LATIN CAPITAL LETTER W WITH CIRCUMFLEX) through to U+0177 (LATIN SMALL LETTER Y WITH CIRCUMFLEX) I think. -- Rhodri James *-* Kynesim Ltd
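The four code points listed above are easy to check with `unicodedata`. They also show why a single 8-bit code page is fragile: none of them is in Latin-1, although ISO-8859-14 (Latin-8, the Celtic code page) includes them. The illustration below uses the Welsh word "dŵr" as an example:

```python
import unicodedata

for cp in range(0x0174, 0x0178):
    print("U+%04X" % cp, unicodedata.name(chr(cp)))

word = "d\u0175r"  # "dŵr"
try:
    word.encode("latin-1")
except UnicodeEncodeError:
    print("not representable in Latin-1")
print(word.encode("iso8859_14"))  # one byte per character in Latin-8
```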

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Larry Martell
On Mon, Jul 16, 2018 at 12:05 PM, Mark Lawrence wrote: > On 16/07/18 15:17, Dan Sommers wrote: >> >> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: >> >>> ... people who think that if ISO-8859-7 was good enough for Jesus ... >> >> >> It may have been good enough for his disciples, but

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Chris Angelico
On Tue, Jul 17, 2018 at 2:05 AM, Mark Lawrence wrote: > On 16/07/18 15:17, Dan Sommers wrote: >> >> On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: >> >>> ... people who think that if ISO-8859-7 was good enough for Jesus ... >> >> >> It may have been good enough for his disciples, but J

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Mark Lawrence
On 16/07/18 15:17, Dan Sommers wrote: On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: ... people who think that if ISO-8859-7 was good enough for Jesus ... It may have been good enough for his disciples, but Jesus spoke Aramaic. Also, ISO-8859-7 doesn't cover ancient polytonic Gre

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Anders Wegge Keller
> The buzzing noise you just heard was the joke whizzing past your head > *wink* I have twins aged four. They also like to yell "I cheated!", whenever they are called out. In general, you need to get rid of that teenage brat persona you practice. The "ranting rick" charade was especially toe-

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Steven D'Aprano
On Mon, 16 Jul 2018 14:17:35 +, Dan Sommers wrote: > On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: > >> ... people who think that if ISO-8859-7 was good enough for Jesus ... > > It may have been good enough for his disciples, but Jesus spoke Aramaic. The buzzing noise you just

Re: Unicode [was Re: Cult-like behaviour]

2018-07-16 Thread Dan Sommers
On Mon, 16 Jul 2018 10:39:49 +, Steven D'Aprano wrote: > ... people who think that if ISO-8859-7 was good enough for Jesus ... It may have been good enough for his disciples, but Jesus spoke Aramaic. Also, ISO-8859-7 doesn't cover ancient polytonic Greek; it only covers modern monotonic Gree
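Dan's point about ISO-8859-7 is easy to demonstrate. The code page includes monotonic Greek letters with the tonos accent, but not the precomposed polytonic letters from the Greek Extended block. Here is a minimal check. The helper name is our own:

```python
import unicodedata


def fits_iso8859_7(ch: str) -> bool:
    """True if ch can be encoded in the ISO-8859-7 (Greek) code page."""
    try:
        ch.encode("iso8859_7")
        return True
    except UnicodeEncodeError:
        return False


monotonic = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS
polytonic = "\u1f00"  # GREEK SMALL LETTER ALPHA WITH PSILI

for ch in (monotonic, polytonic):
    print(unicodedata.name(ch), "->", fits_iso8859_7(ch))
```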

Re: unicode direction control characters

2018-01-02 Thread Random832
On Tue, Jan 2, 2018, at 10:36, Robin Becker wrote: > >> u'\u200e28\u200e/\u200e09\u200e/\u200e1962' > > I guess I'm really wondering whether the BIDI control characters have any > semantic meaning. Most numbers seem to be LTR. > > If I saw u'\u200f12' it seems to imply that the characters should

Re: unicode direction control characters

2018-01-02 Thread Chris Angelico
On Wed, Jan 3, 2018 at 2:36 AM, Robin Becker wrote: > On 02/01/2018 15:18, Chris Angelico wrote: >> >> On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker wrote: >>> >>> I'm seeing some strange characters in web responses eg >>> >>> u'\u200e28\u200e/\u200e09\u200e/\u200e1962' >>> >>> for a date of birth

Re: unicode direction control characters

2018-01-02 Thread Robin Becker
On 02/01/2018 15:18, Chris Angelico wrote: On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker wrote: I'm seeing some strange characters in web responses eg u'\u200e28\u200e/\u200e09\u200e/\u200e1962' for a date of birth. The code \u200e is LEFT-TO-RIGHT MARK according to unicodedata.name. I tried

Re: unicode direction control characters

2018-01-02 Thread Chris Angelico
On Wed, Jan 3, 2018 at 1:30 AM, Robin Becker wrote: > I'm seeing some strange characters in web responses eg > > u'\u200e28\u200e/\u200e09\u200e/\u200e1962' > > for a date of birth. The code \u200e is LEFT-TO-RIGHT MARK according to > unicodedata.name. I tried unicodedata.normalize, but it leaves
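As the thread notes, `unicodedata.normalize()` leaves U+200E in place. That is because it is a formatting character (category `Cf`), not a compatibility variant of anything. One way to clean such values before parsing is to delete the bidi control characters explicitly. The sketch below covers LRM, RLM, ALM and the embedding, override and isolate controls. The helper name is hypothetical, and the `%d/%m/%Y` date format is assumed from the sample value:

```python
import unicodedata
from datetime import datetime

# LRM, RLM, ALM, embeddings/overrides U+202A..U+202E, isolates U+2066..U+2069
BIDI_CONTROLS = {0x200E, 0x200F, 0x061C, *range(0x202A, 0x202F), *range(0x2066, 0x206A)}
_DELETE = dict.fromkeys(BIDI_CONTROLS)  # mapping a code point to None deletes it


def strip_bidi(s: str) -> str:
    return s.translate(_DELETE)


raw = "\u200e28\u200e/\u200e09\u200e/\u200e1962"
print(unicodedata.name("\u200e"), unicodedata.category("\u200e"))
print(unicodedata.normalize("NFKC", raw) == raw)  # normalisation keeps the marks
clean = strip_bidi(raw)
print(clean, datetime.strptime(clean, "%d/%m/%Y").date())
```

Deleting every `Cf` character would be shorter, but it would also remove characters such as ZERO WIDTH JOINER (U+200D). Those matter in some scripts and in emoji sequences, so an explicit list is the more cautious choice.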

Re: Unicode

2017-09-17 Thread leam hall
Matt wrote: Hi Leam- > > Targeting Python 2.6 for deployment on RHEL/CentOS 6 is a perfectly > valid use case, and after the recent discussions in multiple threads > (your "Design: method in class or general function?" and INADA Naoki's > "People choosing Python 3"), I doubt it would be very usefu

Re: Unicode

2017-09-17 Thread Matt Ruffalo
On 2017-09-17 17:27, leam hall wrote: > > Ah! So this works in Py2: >def __str__(self): > name= self.name.encode("utf-8") > > > It completely fails in Py3: > PVT b'Lakeisha F\xc3\xa1bi\xc3\xa1n' 7966A4 [F] Age: 22 > > > Note that moving __str__() to display() gets the same result

Re: Unicode

2017-09-17 Thread leam hall
On Sun, Sep 17, 2017 at 3:27 PM, Peter Otten <__pete...@web.de> wrote: > leam hall wrote: > > > Doesn't seem to work. The failing code takes the strings as is from the > > database. it will occasionally fail when a name comes up that uses > > a non-ascii character. > > Your problem in nuce: the Py

Re: Unicode

2017-09-17 Thread Peter Otten
leam hall wrote: > Doesn't seem to work. The failing code takes the strings as is from the > database. it will occasionally fail when a name comes up that uses > a non-ascii character. Your problem in nuce: the Python 2 __str__() method must not return unicode. >>> class Character: ... def _
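In Python 3 the fix is the reverse of the Python 2 one. There, `__str__` must return `str`, so the `.encode("utf-8")` has to go. Formatting the encoded `bytes` is exactly what produces the `b'Lakeisha F\xc3\xa1bi\xc3\xa1n'` output quoted in this thread. Here is a minimal sketch using a hypothetical cut-down `Character` class:

```python
class Character:
    """Hypothetical cut-down version of the class discussed in the thread."""

    def __init__(self, name: str, age: int):
        self.name = name
        self.age = age

    def __str__(self) -> str:
        # Python 3: return text; print() and the I/O layer handle encoding.
        return "{} Age: {}".format(self.name, self.age)


c = Character("Lakeisha F\u00e1bi\u00e1n", 22)
print(c)

# The bug from the thread: formatting bytes embeds their repr, b'...'
buggy = "{}".format(c.name.encode("utf-8"))
print(buggy)
```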

Re: Unicode

2017-09-17 Thread Chris Angelico
On Mon, Sep 18, 2017 at 2:20 AM, leam hall wrote: > On Sun, Sep 17, 2017 at 9:13 AM, Peter Otten <__pete...@web.de> wrote: > >> Leam Hall wrote: >> >> > On 09/17/2017 08:30 AM, Chris Angelico wrote: >> >> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall wrote: >> >>> Still trying to keep this Py2 and P

Re: Unicode

2017-09-17 Thread leam hall
On Sun, Sep 17, 2017 at 9:13 AM, Peter Otten <__pete...@web.de> wrote: > Leam Hall wrote: > > > On 09/17/2017 08:30 AM, Chris Angelico wrote: > >> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall wrote: > >>> Still trying to keep this Py2 and Py3 compatible. > >>> > >>> The Py2 error is: > >>>

Re: Unicode

2017-09-17 Thread Peter Otten
Leam Hall wrote: > On 09/17/2017 08:30 AM, Chris Angelico wrote: >> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall wrote: >>> Still trying to keep this Py2 and Py3 compatible. >>> >>> The Py2 error is: >>> UnicodeEncodeError: 'ascii' codec can't encode character >>> u'\xf6' in posit

Re: Unicode

2017-09-17 Thread Leam Hall
On 09/17/2017 08:30 AM, Chris Angelico wrote: On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall wrote: Still trying to keep this Py2 and Py3 compatible. The Py2 error is: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 8: ordinal not in range(128) even

Re: Unicode (was: Old Man Yells At Cloud)

2017-09-17 Thread Chris Angelico
On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall wrote: > Still trying to keep this Py2 and Py3 compatible. > > The Py2 error is: > UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' > in position 8: ordinal not in range(128) > > even when the string is manually converted:

Re: Unicode (was: Old Man Yells At Cloud)

2017-09-17 Thread Paul Moore
On 17 September 2017 at 12:38, Leam Hall wrote: > On 09/17/2017 07:25 AM, Steve D'Aprano wrote: >> >> On Sun, 17 Sep 2017 08:03 pm, Leam Hall wrote: >> >>> I'm still trying to figure out how to convert a string to unicode in >>> Python 2. >> >> >> >> A Python 2 string is a string of bytes, so you

Re: Unicode support in Python 2.7.8 - 16 bit

2017-03-07 Thread Steven D'Aprano
On Tue, 07 Mar 2017 14:05:15 -0800, John Nagle wrote: > How do I test if a Python 2.7.8 build was built for 32-bit Unicode? sys.maxunicode will be 1114111 if it is a "wide" (32-bit) build and 65535 if it is a "narrow" (16-bit) build. You can double-check with: unichr(0x10) # will raise V

Re: Unicode support in Python 2.7.8 - 16 bit

2017-03-07 Thread Terry Reedy
On 3/7/2017 5:05 PM, John Nagle wrote: How do I test if a Python 2.7.8 build was built for 32-bit Unicode? (I'm dealing with shared hosting, and I'm stuck with their provided versions.) If I give this to Python 2.7.x: sy = u'\U0001f60f' len(sy) is 1 on a Ubuntu 14.04LTS machine, but 2

Re: Unicode support in Python 2.7.8 - 16 bit

2017-03-07 Thread Chris Angelico
On Wed, Mar 8, 2017 at 9:05 AM, John Nagle wrote: >How do I test if a Python 2.7.8 build was built for 32-bit > Unicode? (I'm dealing with shared hosting, and I'm stuck > with their provided versions.) > > If I give this to Python 2.7.x: > > sy = u'\U0001f60f' > > len(sy) is 1 on a Ubuntu
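On a narrow (UTF-16) build, a code point above U+FFFF is stored as a surrogate pair, which is why `len(u'\U0001f60f')` comes out as 2 there. The arithmetic is the standard UTF-16 one. The sketch below uses a helper name of our own and cross-checks the result against Python 3's UTF-16 codec. Since Python 3.3, every build reports `sys.maxunicode == 0x10FFFF`, so the narrow/wide distinction applies only to Python 2 and Python 3.0 to 3.2:

```python
import sys


def utf16_code_units(cp: int) -> tuple:
    """Code units a UTF-16 (narrow-build) string would hold for code point cp."""
    if cp < 0x10000:
        return (cp,)
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return (high, low)


cp = 0x1F60F
print([hex(u) for u in utf16_code_units(cp)])  # ['0xd83d', '0xde0f']
print(chr(cp).encode("utf-16-be").hex())       # d83dde0f
print(sys.maxunicode == 0x10FFFF, len(chr(cp)))
```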

Re: Unicode script

2016-12-17 Thread MRAB
On 2016-12-16 02:44, MRAB wrote: On 2016-12-15 21:57, Terry Reedy wrote: On 12/15/2016 1:06 PM, MRAB wrote: On 2016-12-15 16:53, Steve D'Aprano wrote: Suppose I have a Unicode character, and I want to determine the script or scripts it belongs to. For example: U+0033 DIGIT THREE "3" belongs

Re: Unicode script

2016-12-15 Thread MRAB
On 2016-12-15 21:57, Terry Reedy wrote: On 12/15/2016 1:06 PM, MRAB wrote: On 2016-12-15 16:53, Steve D'Aprano wrote: Suppose I have a Unicode character, and I want to determine the script or scripts it belongs to. For example: U+0033 DIGIT THREE "3" belongs to the script "COMMON"; U+0061 LAT

Re: Unicode script

2016-12-15 Thread Terry Reedy
On 12/15/2016 1:06 PM, MRAB wrote: On 2016-12-15 16:53, Steve D'Aprano wrote: Suppose I have a Unicode character, and I want to determine the script or scripts it belongs to. For example: U+0033 DIGIT THREE "3" belongs to the script "COMMON"; U+0061 LATIN SMALL LETTER A "a" belongs to the scri

Re: Unicode script

2016-12-15 Thread Terry Reedy
On 12/15/2016 11:53 AM, Steve D'Aprano wrote: Suppose I have a Unicode character, and I want to determine the script or scripts it belongs to. For example: U+0033 DIGIT THREE "3" belongs to the script "COMMON"; U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN"; U+03BE GREEK SMALL LE

Re: Unicode script

2016-12-15 Thread MRAB
On 2016-12-15 16:53, Steve D'Aprano wrote: Suppose I have a Unicode character, and I want to determine the script or scripts it belongs to. For example: U+0033 DIGIT THREE "3" belongs to the script "COMMON"; U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN"; U+03BE GREEK SMALL LETTE

Re: Unicode script

2016-12-15 Thread Joel Goldstick
I think this might be what you want: https://docs.python.org/3/howto/unicode.html#unicode-properties On Thu, Dec 15, 2016 at 11:53 AM, Steve D'Aprano wrote: > Suppose I have a Unicode character, and I want to determine the script or > scripts it belongs to. > > For example: > > U+0033 DIGIT THREE

Re: Unicode script

2016-12-15 Thread eryk sun
On Thu, Dec 15, 2016 at 4:53 PM, Steve D'Aprano wrote: > Suppose I have a Unicode character, and I want to determine the script or > scripts it belongs to. > > For example: > > U+0033 DIGIT THREE "3" belongs to the script "COMMON"; > U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN"; >
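Python's `unicodedata` module exposes properties such as `category()` and `name()`, but not the Script property. A pure-stdlib answer therefore means carrying the data yourself. Alternatively, the third-party `regex` module accepts `\p{Script=Greek}`-style patterns. The minimal sketch below shows the lookup structure using a tiny hand-copied subset of Unicode's Scripts.txt. A real table would be generated from that file, and anything outside this subset reports "UNKNOWN" here:

```python
import bisect

# Tiny subset of Scripts.txt as (first, last, script), sorted by first code point.
_RANGES = [
    (0x0030, 0x0039, "COMMON"),  # DIGIT ZERO..DIGIT NINE
    (0x0041, 0x005A, "LATIN"),   # A..Z
    (0x0061, 0x007A, "LATIN"),   # a..z
    (0x03B1, 0x03C9, "GREEK"),   # alpha..omega
]
_STARTS = [first for first, _, _ in _RANGES]


def script_of(ch: str) -> str:
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1  # last range starting at or before cp
    if i >= 0 and _RANGES[i][0] <= cp <= _RANGES[i][1]:
        return _RANGES[i][2]
    return "UNKNOWN"


for ch in ["3", "a", "\u03be"]:
    print("U+%04X" % ord(ch), script_of(ch))
```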

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-10 Thread Gregory Ewing
Steven D'Aprano : But when you get down to fundamentals, character sets and alphabets have always blurred the line between presentation and meaning. W ("double-u") was, once upon a time, UU And before that, it was VV, because the Romans used V the way we now use U, and didn't have a letter U.

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-10 Thread Gregory Ewing
Ben Bacarisse wrote: The problem with that theory is that 'er/re' (this is e and r in either order) is the 3rd most common pair in English but have been placed together. No, they haven't. The order of the characters in the type basket goes down the slanted columns of keys, so E and R are separa

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Stephen Hansen
On Sat, Apr 9, 2016, at 12:25 PM, Mark Lawrence via Python-list wrote: > Again, where is the relevance to Python in this discussion, as we're on > the main Python mailing list? Please can the moderators take this stuff > out, it is getting beyond the pale. You need to come to grip with the fact

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Mark Lawrence via Python-list
On 09/04/2016 17:08, Rustom Mody wrote: On Saturday, April 9, 2016 at 7:14:05 PM UTC+5:30, Ben Bacarisse wrote: The problem with that theory is that 'er/re' (this is e and r in either order) is the 3rd most common pair in English but have been placed together. ou and et (in either order) are th

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Ben Bacarisse
Rustom Mody writes: > On Saturday, April 9, 2016 at 7:14:05 PM UTC+5:30, Ben Bacarisse wrote: >> The problem with that theory is that 'er/re' (this is e and r in either >> order) is the 3rd most common pair in English but have been placed >> together. ou and et (in either order) are the 15th and

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Rustom Mody
On Saturday, April 9, 2016 at 7:14:05 PM UTC+5:30, Ben Bacarisse wrote: > The problem with that theory is that 'er/re' (this is e and r in either > order) is the 3rd most common pair in English but have been placed > together. ou and et (in either order) are the 15th and 22nd most common > and the

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Ben Bacarisse
Ben Bacarisse writes: > alister writes: > >> >> the design of qwerty was not to "Slow" the typist bu to ensure that the >> hammers for letters commonly used together are spaced widely apart, >> reducing the portion of trier travel arc were the could jam. >> I and E are actually such a pair w

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread Ben Bacarisse
alister writes: > > the design of qwerty was not to "Slow" the typist bu to ensure that the > hammers for letters commonly used together are spaced widely apart, > reducing the portion of trier travel arc were the could jam. > I and E are actually such a pair which is why they are at opposite

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-09 Thread alister
On Fri, 08 Apr 2016 20:20:02 -0400, Dennis Lee Bieber wrote: > On Fri, 8 Apr 2016 11:04:53 -0700 (PDT), Rustom Mody > declaimed the following: > >>Its reasonably likely that all our keyboards start QWERT... >> Doesn't make it a sane design. >> > It was a sane design -- for early mechanical

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Marko Rauhamaa
Steven D'Aprano : > But when you get down to fundamentals, character sets and alphabets have > always blurred the line between presentation and meaning. W ("double-u") > was, once upon a time, UU But as every Finnish-speaker now knows, "w" is only an old-fashioned typographic variant of the glyph

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Steven D'Aprano
On Sat, 9 Apr 2016 03:21 am, Peter Pearson wrote: > On Fri, 08 Apr 2016 16:00:10 +1000, Steven D'Aprano > wrote: >> On Fri, 8 Apr 2016 02:51 am, Peter Pearson wrote: >>> >>> The Unicode consortium was certifiably insane when it went into the >>> typesetting business. >> >> They are not, and neve

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Rustom Mody
Adding link On Friday, April 8, 2016 at 11:48:07 PM UTC+5:30, Rustom Mody wrote: > 5.12 Deprecation > > In the Unicode Standard, the term deprecation is used somewhat differently > than it is in some other standards. Deprecation is used to mean that a > character or other feature is strongly d

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Rustom Mody
On Friday, April 8, 2016 at 11:33:38 PM UTC+5:30, Peter Pearson wrote: > On Sat, 9 Apr 2016 03:50:16 +1000, Chris Angelico wrote: > > On Sat, Apr 9, 2016 at 3:44 AM, Marko Rauhamaa wrote: > [snip] > >> (As for ligatures, I understand that there might be quite a bit of > >> legacy software that ded

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Rustom Mody
On Friday, April 8, 2016 at 11:14:21 PM UTC+5:30, Marko Rauhamaa wrote: > Peter Pearson : > > > On Fri, 08 Apr 2016 16:00:10 +1000, Steven D'Aprano wrote: > >> They are not, and never have been, in the typesetting business. > >> Perhaps characters are not the only things easily confused *wink* >

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Peter Pearson
On Sat, 9 Apr 2016 03:50:16 +1000, Chris Angelico wrote: > On Sat, Apr 9, 2016 at 3:44 AM, Marko Rauhamaa wrote: [snip] >> (As for ligatures, I understand that there might be quite a bit of >> legacy software that dedicated code points and code pages for ligatures. >> Translating that legacy soft

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Rustom Mody
On Friday, April 8, 2016 at 10:24:17 AM UTC+5:30, Chris Angelico wrote: > On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody wrote: > > No I am not clever/criminal enough to know how to write a text that is > > visually > > close to > > print "Hello World" > > but is internally closer to > > rm -rf / >

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Chris Angelico
On Sat, Apr 9, 2016 at 3:44 AM, Marko Rauhamaa wrote: > Unicode heroically and definitively solved the problems ASCII had posed > but introduced a bag of new, trickier problems. > > (As for ligatures, I understand that there might be quite a bit of > legacy software that dedicated code points and

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Marko Rauhamaa
Peter Pearson : > On Fri, 08 Apr 2016 16:00:10 +1000, Steven D'Aprano > wrote: >> They are not, and never have been, in the typesetting business. >> Perhaps characters are not the only things easily confused *wink* > > Defining codepoints that deal with appearance but not with meaning is > going
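Peter's point about code points that encode appearance rather than meaning can be seen with "compatibility" characters. A minimal sketch, assuming only the standard `unicodedata` module: NFC leaves these characters alone, while NFKC folds each presentation variant back to a plain "A".

```python
import unicodedata

# Code points that differ from "A" only in presentation.
variants = ["\uff21", "\U0001d400", "\u24b6"]  # fullwidth, mathematical bold, circled

for ch in variants:
    nfc = unicodedata.normalize("NFC", ch)    # canonical: leaves the variant unchanged
    nfkc = unicodedata.normalize("NFKC", ch)  # compatibility: folds it to "A"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFC {nfc!r}, NFKC {nfkc!r}")
```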

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-08 Thread Peter Pearson
On Fri, 08 Apr 2016 16:00:10 +1000, Steven D'Aprano wrote: > On Fri, 8 Apr 2016 02:51 am, Peter Pearson wrote: >> >> The Unicode consortium was certifiably insane when it went into the >> typesetting business. > > They are not, and never have been, in the typesetting business. Perhaps > character

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 4:00 PM, Steven D'Aprano wrote: > Or for that matter: > > a = akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg() + 100 > b = 100 + akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg() > > How easily can you tell them apart at a glance? Ouch! Can'
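Steven's two long names differ only by a swapped pair of ASCII letters, so no Unicode trickery is needed to fool the eye. A minimal sketch of letting the machine find the spot, using a hypothetical helper `first_difference` and the two identifiers quoted above:

```python
def first_difference(a, b):
    """Hypothetical helper: index of the first differing position, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one string is a prefix of the other
    return None

a = "akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqwe9fhlcjbqvcbhsiauy37wkg"
b = "akjhvciwfdwkejfc2qweoduycwldvqspjcwuhoqew9fhlcjbqvcbhsiauy37wkg"
i = first_difference(a, b)
print(i, a[i - 3:i + 3], b[i - 3:i + 3])
```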

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Steven D'Aprano
On Fri, 8 Apr 2016 02:51 am, Peter Pearson wrote: > Seriously, it's cute how neatly normalisation works when you're > watching closely and using it in the circumstances for which it was > intended, but that hardly proves that these practices won't cause much > trouble when they're used more casual
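For readers who have not watched normalisation work "closely", here is a minimal sketch of the standard case: the same visible word stored as a precomposed character and as a base letter plus a combining accent. The strings compare unequal until both are brought to the same normal form.

```python
import unicodedata

composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)                                 # False: different code points
print(len(composed), len(decomposed))                         # 4 5
print(unicodedata.normalize("NFC", decomposed) == composed)   # True
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
```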

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 2:43 PM, Rustom Mody wrote: > No I am not clever/criminal enough to know how to write a text that is > visually > close to > print "Hello World" > but is internally closer to > rm -rf / > > For me this: > >>> Α = 1 A = 2 Α + 1 == A > True > > > is cure enoug

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Rustom Mody
On Friday, April 8, 2016 at 10:13:16 AM UTC+5:30, Rustom Mody wrote: > No I am not clever/criminal enough to know how to write a text that is > visually > close to > print "Hello World" > but is internally closer to > rm -rf / > > For me this: > >>> Α = 1 > >>> A = 2 > >>> Α + 1 == A > True >
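The interactive session above works because the first name is GREEK CAPITAL LETTER ALPHA, not Latin "A". A minimal sketch that reproduces it with `exec` and shows that the NFKC normalisation Python applies to identifiers does not merge the two letters:

```python
import unicodedata

GREEK_ALPHA = "\u0391"  # Α, GREEK CAPITAL LETTER ALPHA
LATIN_A = "A"           # A, LATIN CAPITAL LETTER A

namespace = {}
exec(f"{GREEK_ALPHA} = 1\nA = 2", namespace)

# Two distinct variables that look identical on screen.
print(namespace[GREEK_ALPHA] + 1 == namespace[LATIN_A])        # True
# NFKC does not map the Greek letter to the Latin one.
print(unicodedata.normalize("NFKC", GREEK_ALPHA) == LATIN_A)   # False
print(unicodedata.name(GREEK_ALPHA), "/", unicodedata.name(LATIN_A))
```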

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Rustom Mody
On Thursday, April 7, 2016 at 10:22:18 PM UTC+5:30, Peter Pearson wrote: > On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote: > > On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote: > >> Rustom Mody wrote: > > > >>> So here are some examples to illustrate what I am saying: > >>>

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Chris Angelico
On Fri, Apr 8, 2016 at 2:51 AM, Peter Pearson wrote: > The pile-of-poo character was just frosting on > the cake. > > (Sorry to leave you with that image.) No. You're not even a little bit sorry. You're an evil, evil man. And funny. ChrisA who knows that its codepoint is 1F4A9 without looking i
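Chris's codepoint can be checked from Python. A minimal sketch that also shows why characters outside the Basic Multilingual Plane matter: a Python 3 `str` counts one code point, while both UTF-8 and UTF-16 need four bytes (UTF-16 uses a surrogate pair).

```python
import unicodedata

poo = "\U0001F4A9"
print(f"U+{ord(poo):04X}", unicodedata.name(poo))  # U+1F4A9 PILE OF POO
print(len(poo))                       # 1 code point in a Python 3 str
print(len(poo.encode("utf-8")))       # 4 bytes in UTF-8
print(len(poo.encode("utf-16-le")))   # 4 bytes in UTF-16: a surrogate pair
```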

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-07 Thread Peter Pearson
On Thu, 07 Apr 2016 11:37:50 +1000, Steven D'Aprano wrote: > On Thu, 7 Apr 2016 05:56 am, Thomas 'PointedEars' Lahn wrote: >> Rustom Mody wrote: > >>> So here are some examples to illustrate what I am saying: >>> >>> Example 1 -- Ligatures: >>> >>> Python3 gets it right >> flag = 1 >> flag
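The ligature in the quoted example appears to have been flattened to plain "fl" in this archive snippet. A minimal sketch of what "Python3 gets it right" refers to: Python 3 converts identifiers to NFKC while parsing (PEP 3131), so a name spelled with LATIN SMALL LIGATURE FL binds the ordinary name `flag`.

```python
import unicodedata

LIGATURE_FL = "\ufb02"  # ﬂ, LATIN SMALL LIGATURE FL

# NFKC replaces the ligature with the two letters "f" + "l".
print(unicodedata.normalize("NFKC", LIGATURE_FL + "ag"))  # flag

# Identifiers are NFKC-normalised at parse time, so this binds "flag".
namespace = {}
exec(LIGATURE_FL + "ag = 1", namespace)
print("flag" in namespace, (LIGATURE_FL + "ag") in namespace)  # True False
```

Note that this only applies to identifiers; string literals and data read at runtime are not normalised automatically.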

Re: Unicode normalisation [was Re: [beginner] What's wrong?]

2016-04-06 Thread Marko Rauhamaa
Steven D'Aprano : > So even in English, capitalisation can make a semantic difference. It can even make a pronunciation difference: polish vs Polish. Marko -- https://mail.python.org/mailman/listinfo/python-list
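Marko's "polish" vs "Polish" example also shows the cost of case-insensitive matching. A minimal sketch: once both words are case-folded, the distinction he points out is gone.

```python
words = ["polish", "Polish"]

print(words[0] == words[1])                        # False: different words
print(words[0].casefold() == words[1].casefold())  # True: the distinction is lost
```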

Re: Unicode failure

2015-12-07 Thread Oscar Benjamin
On Sun, 6 Dec 2015 at 23:11 Quivis wrote: > On Fri, 04 Dec 2015 13:07:38 -0500, D'Arcy J.M. Cain wrote: > > > I thought that going to Python 3.4 would solve my Unicode issues but it > > seems I still don't understand this stuff. Here is my script. > > > > #! /usr/bin/python3 # -*- coding: UTF-8
