Re: A few questions about encoding

2013-06-20 Thread wxjmfauth
On Thursday, June 20, 2013 13:43:28 UTC+2, MRAB wrote: > On 20/06/2013 07:26, Steven D'Aprano wrote: > > > On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote: > > > > > >> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote: > > >> > > >>> Gah! That's twice I've screwed that u

Re: A few questions about encoding

2013-06-23 Thread wxjmfauth
On Thursday, June 20, 2013 19:17:12 UTC+2, MRAB wrote: > On 20/06/2013 17:37, Chris Angelico wrote: > > > On Fri, Jun 21, 2013 at 2:27 AM, wrote: > > >> And all these coding schemes have something in common, > > >> they all work with a unique set of code points, more > > >> precisely a unique s

Re: A few questions about encoding

2013-06-25 Thread wxjmfauth
On Sunday, June 23, 2013 18:30:40 UTC+2, Steven D'Aprano wrote: > On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote: > > > > > utf-8: how many bytes to hold an "a" in memory? one byte. > > > > > > flexible string representation: how m
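For readers who want to check the byte counts being argued about: a minimal sketch, assuming CPython 3.3 or later, where PEP 393 stores every character of a string with the width (1, 2 or 4 bytes) of its widest character, while UTF-8 uses 1 to 4 bytes per code point. The helper names are illustrative, and `fsr_width` infers the width from `sys.getsizeof`, which is a CPython implementation detail.

```python
import sys

def utf8_bytes(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single code point."""
    return len(ch.encode("utf-8"))

def fsr_width(ch: str) -> int:
    """Bytes per character CPython uses for a string made only of `ch`.

    Adding one character grows the object by exactly one storage unit, so
    the size difference of two freshly built strings reveals the width.
    CPython-specific: other implementations may report different sizes.
    """
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)

for ch in ["a", "é", "€", "\U0001d11e"]:
    print(f"U+{ord(ch):04X}  utf-8: {utf8_bytes(ch)} byte(s)  "
          f"FSR: {fsr_width(ch)} byte(s) per char")
```

For 'a', 'é', '€' and U+1D11E this reports UTF-8 sizes of 1, 2, 3 and 4 bytes against per-character widths of 1, 1, 2 and 4.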

Re: io module and pdf question

2013-06-26 Thread wxjmfauth
On Tuesday, June 25, 2013 06:18:44 UTC+2, jyou...@kc.rr.com wrote: > Would like to get your opinion on this. Currently to get the metadata out of > a pdf file, I loop through the guts of the file. I know it's not the > greatest idea to do this, but I'm trying to avoid extra modules, etc. > > >

Re: hex dump w/ or w/out utf-8 chars

2013-07-09 Thread wxjmfauth
On Tuesday, July 9, 2013 09:00:02 UTC+2, Steven D'Aprano wrote: > On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote: > > > > > Not using python 3, for me (a programmer which was present at the > > > beginning of computer science, badly interacting with many languages > > > from assembl

Re: hex dump w/ or w/out utf-8 chars

2013-07-10 Thread wxjmfauth
For those who are interested: the official proposal for the encoding of LATIN CAPITAL LETTER SHARP S in ISO/IEC 10646, submitted by DIN (the German Institute for Standardization), is available on the web as a PDF with the rationale. I do not remember where I got it, probably from a Germ
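Since this thread is about hex dumps, here is a small sketch that dumps U+1E9E LATIN CAPITAL LETTER SHARP S in two encodings and shows Python 3's case mapping for it. The `hex_dump` helper is an illustrative name. Note that 'ß'.upper() still gives 'SS' under Unicode's default full case mapping; U+1E9E appears only when used explicitly.

```python
def hex_dump(text: str, encoding: str = "utf-8") -> str:
    """Space-separated hex bytes of `text` encoded with `encoding`."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))

cap = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

print(hex_dump(cap))                  # e1 ba 9e
print(hex_dump(cap, "utf-16-be"))     # 1e 9e
print(cap.lower(), "\u00df".upper())  # ß SS
```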

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
On Monday, July 8, 2013 19:52:17 UTC+2, Chris Angelico wrote: > On Tue, Jul 9, 2013 at 3:31 AM, wrote: > > > Unfortunately (as probably I told you before) I will never pass to > > > Python 3... Guido should not always listen only to gurus like him... > > > I don't like Python as before...s

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
On Thursday, July 11, 2013 20:42:26 UTC+2, wxjm...@gmail.com wrote: > On Thursday, July 11, 2013 15:32:00 UTC+2, Chris Angelico wrote: > > > On Thu, Jul 11, 2013 at 11:18 PM, wrote: > > > > > > > Just to stick with this funny character ẞ, a ucs-2 char > > > > > > > in the Flexible String

Re: hex dump w/ or w/out utf-8 chars

2013-07-11 Thread wxjmfauth
On Thursday, July 11, 2013 15:32:00 UTC+2, Chris Angelico wrote: > On Thu, Jul 11, 2013 at 11:18 PM, wrote: > > > Just to stick with this funny character ẞ, a ucs-2 char > > > in the Flexible String Representation nomenclature. > > > > > > It seems to me that, when one needs more than ten by

Re: RE Module Performance

2013-07-12 Thread wxjmfauth
On Friday, July 12, 2013 01:44:05 UTC+2, Devyn Collier Johnson wrote: > I recently saw an email in this mailing list about the RE module being > > made slower. I no longer have that email. However, I have viewed the > > source for the RE module, but I did not see any code that would slow >

Re: hex dump w/ or w/out utf-8 chars

2013-07-12 Thread wxjmfauth
On Friday, July 12, 2013 05:18:44 UTC+2, Steven D'Aprano wrote: > On Thu, 11 Jul 2013 11:42:26 -0700, wxjmfauth wrote: > > > Now all your strings will be just as heavy, every single variable name > > and attribute name will use four times as much memory. Happy n

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread wxjmfauth
On Friday, July 12, 2013 04:16:21 UTC+2, Chris Angelico wrote: > On Fri, Jul 12, 2013 at 4:42 AM, wrote: > > > BTW, since > > > when does a serious coding scheme need an external marker? > > > > > > > All of them. > > > > Content-type: text/plain; charset=UTF-8 > > > > ChrisA --
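The charset point can be shown in a few lines: a byte string carries no record of its own encoding, so the same bytes decode to different text depending on which charset the reader is told, or guesses.

```python
raw = b"caf\xc3\xa9"  # "café" encoded as UTF-8

print(raw.decode("utf-8"))    # café
print(raw.decode("latin-1"))  # cafÃ©  (mojibake: wrong charset assumed)
```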

Re: hex dump w/ or w/out utf-8 chars

2013-07-13 Thread wxjmfauth
On Saturday, July 13, 2013 11:49:10 UTC+2, Steven D'Aprano wrote: > On Sat, 13 Jul 2013 00:56:52 -0700, wxjmfauth wrote: > > > > > You are confusing the knowledge of a coding scheme and the intrinsic > > > information a "coding scheme" *may* have, i

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth
On Saturday, July 13, 2013 21:02:24 UTC+2, Dave Angel wrote: > On 07/13/2013 10:37 AM, wxjmfa...@gmail.com wrote: > > > > > > Fortunately for us, Python (in version 3.3 and later) and Pike did it > > right. Some day the others may decide to do similarly. > > > --- Possible but

Re: hex dump w/ or w/out utf-8 chars

2013-07-14 Thread wxjmfauth
On Sunday, July 14, 2013 12:44:12 UTC+2, Steven D'Aprano wrote: > On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote: > > > > > For a very simple reason, the latin-1 block: considered and accepted > > > today as being a Unicode design mistake. > >

Re: help on python regular expression named group

2013-07-16 Thread wxjmfauth
On Tuesday, July 16, 2013 08:55:58 UTC+2, Mohan L wrote: > Dear All, > > > > Here is my script : > > > > #!/usr/bin/python > > > import re > > > > > # A string. > logs = "date=2012-11-28 time=21:14:59" > > > > # Match with named groups. > m = > re.match("(?P(date=(?P[^\s]+))\s+(ti
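The group names in the quoted pattern were lost in the archive, so the original regex cannot be recovered here. As a sketch only, this is how named groups extract fields from the quoted log line; the names `date` and `time` are hypothetical choices, not the original poster's.

```python
import re

logs = "date=2012-11-28 time=21:14:59"

# (?P<name>...) captures a group that can later be retrieved by name.
pattern = re.compile(r"date=(?P<date>\S+)\s+time=(?P<time>\S+)")
m = pattern.match(logs)
if m:
    print(m.group("date"))  # 2012-11-28
    print(m.groupdict())    # {'date': '2012-11-28', 'time': '21:14:59'}
```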

Re: help on python regular expression named group

2013-07-17 Thread wxjmfauth
On Wednesday, July 17, 2013 09:46:46 UTC+2, Joshua Landau wrote: > On 17 July 2013 07:15, wrote: > > > Not sure I'm correct. I took your precise string to > > > refresh my memory. > > > > I'm glad to see you doing something else, but I don't think you > > understood his problem. Note tha

Re: Stack Overflow moderator “animuson”

2013-07-19 Thread wxjmfauth
On Wednesday, July 10, 2013 11:00:23 UTC+2, Steven D'Aprano wrote: > On Wed, 10 Jul 2013 07:55:05 +, Mats Peterson wrote: > > > > > A moderator who calls himself “animuson” on Stack Overflow doesn’t want > > > to face the truth. He has deleted all my postings regarding Python > > > regu

Re: hex dump w/ or w/out utf-8 chars

2013-07-24 Thread wxjmfauth
I cannot find the thread where a Python core dev spoke about French, so I'm posting it here. This stupid Flexible String Representation splits Unicode into chunks, and one of these chunks is latin-1 (iso-8859-1). If we consider that latin-1 is unusable for 17 (seventeen) European languages based on th

Re: RE Module Performance

2013-07-24 Thread wxjmfauth
On Saturday, July 13, 2013 01:13:47 UTC+2, Michael Torrie wrote: > On 07/12/2013 09:59 AM, Joshua Landau wrote: > > > If you're interested, the basic of it is that strings now use a > > > variable number of bytes to encode their values depending on whether > > > values outside of the ASCII ran

Re: RE Module Performance

2013-07-25 Thread wxjmfauth
On Wednesday, July 24, 2013 16:47:36 UTC+2, Michael Torrie wrote: > On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: > > > Sorry, you are not understanding Unicode. What is a Unicode > > > Transformation Format (UTF), what is the goal of a UTF and > > > why it is important for an implementa

Re: RE Module Performance

2013-07-25 Thread wxjmfauth
On Thursday, July 25, 2013 12:14:46 UTC+2, Chris Angelico wrote: > On Thu, Jul 25, 2013 at 7:27 PM, wrote: > > > A coding scheme works with a unique set of characters (the repertoire), > > > and the implementation (the programming) works with a unique set > > > of encoded code points. The cri

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Thursday, July 25, 2013 22:45:38 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano > > wrote: > > > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: > > > > > >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano > > >> wrote: > > >>> On Thu, 25 Jul 2013 14:36

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, July 26, 2013 05:09:34 UTC+2, Michael Torrie wrote: > On 07/25/2013 11:18 AM, Steven D'Aprano wrote: > > > JMF has explained that it is impossible, impossible I say!, to write an > > > editor using a flexible string representation. Since Emacs uses such a > > > flexible string

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, July 26, 2013 05:20:45 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano > > wrote: > > > UTF-8 uses a flexible representation on a character-by-character basis. > > > When parsing UTF-8, one needs to look at EVERY character to decide how > > > many bytes yo
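To make the parsing argument concrete: if strings were stored as UTF-8, finding the n-th character would need a scan from the start of the buffer, because only lead bytes mark character boundaries. A minimal sketch assuming well-formed UTF-8 input; `utf8_offset` is an illustrative name, not a standard-library function. CPython 3.3+ instead keeps fixed-width units per string (PEP 393), which is what keeps `s[i]` O(1).

```python
def utf8_offset(data: bytes, index: int) -> int:
    """Byte offset of the code point at position `index` in UTF-8 `data`.

    Continuation bytes look like 0b10xxxxxx; every other byte starts a new
    code point. Counting those lead bytes from the start is O(n).
    """
    count = -1
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # lead byte: ASCII or start of a sequence
            count += 1
            if count == index:
                return offset
    raise IndexError("code point index out of range")

sample = "aé€\U0001d11eb".encode("utf-8")  # 1 + 2 + 3 + 4 + 1 bytes
print([utf8_offset(sample, i) for i in range(5)])  # [0, 1, 3, 6, 10]
```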

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, July 26, 2013 05:20:45 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano > > wrote: > > > UTF-8 uses a flexible representation on a character-by-character basis. > > > When parsing UTF-8, one needs to look at EVERY character to decide how > > > many bytes yo

Re: RE Module Performance

2013-07-27 Thread wxjmfauth
On Saturday, July 27, 2013 04:05:03 UTC+2, Michael Torrie wrote: > On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote: > > sys.getsizeof('––') - sys.getsizeof('–') > > > > > > I have already explained / commented this. > > > > Maybe it got lost in translation, but I don't understand yo

Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, July 28, 2013 05:53:22 UTC+2, Ian wrote: > On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > > Back to utf. utfs are not only elements of a unique set of encoded > > > code points. They have an interesting feature. Each "utf chunk" > > > holds intrinsically the character (in fact th

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, July 28, 2013 17:52:47 UTC+2, Michael Torrie wrote: > On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: > > > Good point. FSR, nice tool for those who wish to teach > > > Unicode. It is not every day that one has such an opportunity. > > > > I had a long e-mail composed, but deci

Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, July 28, 2013 21:04:56 UTC+2, MRAB wrote: > On 28/07/2013 19:13, wxjmfa...@gmail.com wrote: > > > On Sunday, July 28, 2013 05:53:22 UTC+2, Ian wrote: > > >> On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > >> > > >> > Back to utf. utfs are not only elements of a unique set

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Sunday, July 28, 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote: > > > > > Do not forget that à la "FSR" mechanism for a non-ascii user is > > > *irrelevant*. > > > > You have

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, July 29, 2013 13:57:47 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 12:43 PM, wrote: > > > On Sunday, July 28, 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > > > 3.2 > > timeit.timeit("r = dir(list)") > > > 22.300465007102908 > > > > > > 3.3 > > timei

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Sunday, July 28, 2013 19:36:00 UTC+2, Terry Reedy wrote: > On 7/28/2013 11:52 AM, Michael Torrie wrote: > > > > > > 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that > > > slicing a string would be very very slow, > > > > Not necessarily so. See below. > > >

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, July 29, 2013 13:57:47 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 12:43 PM, wrote: > > > On Sunday, July 28, 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > > > 3.2 > > timeit.timeit("r = dir(list)") > > > 22.300465007102908 > > > > > > 3.3 > > timei

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, July 29, 2013 16:49:34 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 3:20 PM, wrote: > > >>c:\python32\pythonw -u "timitmod.py" > > > 15.258061416225663 > > >>Exit code: 0 > > >>c:\Python33\pythonw -u "timitmod.py" > > > 17.052203122286194 > > >>Exit code: 0 > > >

Re: RE Module Performance

2013-07-30 Thread wxjmfauth
On Sunday, July 28, 2013 05:53:22 UTC+2, Ian wrote: > On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > > Back to utf. utfs are not only elements of a unique set of encoded > > > code points. They have an interesting feature. Each "utf chunk" > > > holds intrinsically the character (in fact th

Re: RE Module Performance

2013-07-30 Thread wxjmfauth
Mutable, immutable, copying + xxx, buffering, O(n). Yes, but conceptually the re-encoding happens sometime, somewhere. The internal "ucs-2" will never automagically be transformed into "ucs-4" (e.g.).
>>> timeit.timeit("'a'*1 +'€'")
7.087220684719967
>>> timeit.timeit("'a'*1 +'z'")
1.568521

Re: RE Module Performance

2013-07-31 Thread wxjmfauth
FSR:
===
The 'a' in 'a€' and 'a\U0001d11e':
>>> ['{:#010b}'.format(c) for c in 'a€'.encode('utf-16-be')]
['0b00000000', '0b01100001', '0b00100000', '0b10101100']
>>> ['{:#010b}'.format(c) for c in 'a\U0001d11e'.encode('utf-32-be')]
['0b00000000', '0b00000000', '0b00000000', '0b01100001', '0b00

Py330b1, un café crème sucré

2012-06-27 Thread wxjmfauth
# -*- coding: cp1252 -*-
# café.py
import sys
print(sys.version)
sys.path.append('d:\\crème')
import crème
import sucré
s = ' '.join(['un', 'café', crème.tag, sucré.tag])
print(s)
input(':')
#--
# .\sucré.py:
# -*- coding: cp1252 -*-
#tag = 'sucré'
#--
# d:\crème\crème.py
# -*- coding
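The script relies on two Python 3 features: a PEP 263 coding cookie declaring the source encoding, and PEP 3131 non-ASCII identifiers such as the module names `crème` and `sucré`. This sketch uses the standard library's `tokenize.detect_encoding` to read the cookie under the same PEP 263 rules; the sample source text is made up for illustration.

```python
import io
import tokenize

source = "# -*- coding: cp1252 -*-\ns = 'café'\n".encode("cp1252")

# detect_encoding reads at most two lines, looking for a PEP 263 cookie.
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)                  # cp1252
print(source.decode(encoding))

# PEP 3131: non-ASCII letters are allowed in identifiers (and module names).
print("crème".isidentifier(), "sucré".isidentifier())  # True True
```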

Re: Why has python3 been created as a separate language where there is still python2.7?

2012-06-28 Thread wxjmfauth
On Thursday, June 28, 2012 7:47:24 AM UTC+2, Stefan Behnel wrote: > Serhiy Storchaka, 28.06.2012 07:36: > > On 28.06.12 00:14, Terry Reedy wrote: > >> Another prediction: people who code Python without reading the manual, > >> at least not for new features, will learn about 'u' somehow (such as by

Re: Python 2.6 StreamReader.readline()

2012-07-25 Thread wxjmfauth
On Wednesday, July 25, 2012 11:02:01 AM UTC+2, Walter Dörwald wrote: > On 25.07.12 08:09, Ulrich Eckhardt wrote: > > > Am 24.07.2012 17:01, schrieb cpppw...@gmail.com: > >> reader = codecs.getreader(encoding) > >> lines = [] > >> with open(filename, 'rb') as f: > >> lines
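For comparison in Python 3, a minimal sketch reading the same lines through a `codecs` StreamReader and through the `io` text layer that `open(..., encoding=...)` builds. An in-memory `io.BytesIO` stands in for the file opened with `'rb'` in the quoted code; UTF-16 is an arbitrary choice of encoding.

```python
import codecs
import io

# encode("utf-16") prepends a BOM and uses the platform's byte order.
data = "first line: café\nsecond line: crème\n".encode("utf-16")

# codecs style: wrap a binary stream in a StreamReader and call readline().
reader = codecs.getreader("utf-16")(io.BytesIO(data))
old_lines = [reader.readline(), reader.readline()]

# io style: TextIOWrapper is what open(..., encoding=...) builds in Python 3.
wrapper = io.TextIOWrapper(io.BytesIO(data), encoding="utf-16")
new_lines = wrapper.readlines()

print(old_lines == new_lines)  # True
```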

Re: catch UnicodeDecodeError

2012-07-26 Thread wxjmfauth
On Thursday, July 26, 2012 9:46:27 AM UTC+2, Jaroslav Dobrek wrote: > On Jul 25, 8:50 pm, Dave Angel wrote: > > On 07/25/2012 08:09 AM, jaroslav.dob...@gmail.com wrote: > > > > > > > > > > > > > > > > > > > > > On Wednesday, July 25, 2012 1:35:09 PM UTC+2, Philipp Hagemeister > w
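A common pattern for this situation is a strict decode first, with a fallback only when it raises `UnicodeDecodeError`. A hedged sketch; the helper name and the choice of cp1252 as the fallback are assumptions, not something proposed in the thread.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Return `data` decoded with the first encoding that succeeds.

    If every candidate fails, decode with the first one and let U+FFFD
    replace the undecodable bytes instead of raising.
    """
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    return data.decode(encodings[0], errors="replace")

print(decode_with_fallback(b"caf\xc3\xa9"))  # café (valid UTF-8)
print(decode_with_fallback(b"caf\xe9"))      # café (UTF-8 fails, cp1252 works)
```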

Re: OT: Text editors (was Re: Search and replace text in XML file?)

2012-07-28 Thread wxjmfauth
On Saturday, July 28, 2012 5:51:48 PM UTC+2, Chris Angelico wrote: ... and has a few limitations (eg it only really supports > > UTF-8), ?! It has been my daily plain-text editor (Windows) since ? (I don't remember when). And I use it for utf-8, utf-16 and cp1252 (my favorite coding) without problems.

Re: OT: Text editors (was Re: Search and replace text in XML file?)

2012-07-28 Thread wxjmfauth
On Saturday, July 28, 2012 7:47:24 PM UTC+2, Chris Angelico wrote: > On Sun, Jul 29, 2012 at 3:43 AM, wrote: > > > On Saturday, July 28, 2012 5:51:48 PM UTC+2, Chris Angelico wrote: > > > > > > ... and has a few limitations (eg it only really supports > > >> > > >> UTF-8), > > > > > > ?! >

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread wxjmfauth
On Friday, August 17, 2012 01:59:31 UTC+2, Terry Reedy wrote: > a = '…' > > print(ord(a)) > > >>> > > 8230 > > Most things with unicode are easier in 3.x, and some are even better in > > 3.3. The current beta is good enough for most informal work. 3.3.0 will > > be out in a month. > > >

Re: How do I display unicode value stored in a string variable using ord()

2012-08-17 Thread wxjmfauth
On Friday, August 17, 2012 20:21:34 UTC+2, Jerry Hill wrote: > On Fri, Aug 17, 2012 at 1:49 PM, wrote: > > > The character '…', Unicode name 'HORIZONTAL ELLIPSIS', > > > is one of these characters existing in the cp1252, mac-roman > > > coding schemes and not in iso-8859-1 (latin-1) and obvio
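The claim about U+2026 is easy to verify with the standard codecs. A short sketch; `mac_roman` is the Python codec name for Mac OS Roman.

```python
ch = "\u2026"  # HORIZONTAL ELLIPSIS
print(ord(ch), hex(ord(ch)))  # 8230 0x2026

results = {}
for codec in ("cp1252", "mac_roman", "latin-1"):
    try:
        results[codec] = ch.encode(codec)
    except UnicodeEncodeError:
        results[codec] = None  # not in this character set
    print(codec, results[codec])
```

This prints `b'\x85'` for cp1252, `b'\xc9'` for Mac OS Roman, and `None` for latin-1.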

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
>>> sys.version
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
>>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
37.32762490493721
>>> timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
0.8158757139801764
>>> sys.version
'3.3.0b2 (v3.3.0b2:4972a8f1b2aa, Aug 12 2012, 15:02:36)

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, August 18, 2012 14:27:23 UTC+2, Steven D'Aprano wrote: > [...] > The problem with UCS-4 is that every character requires four bytes. > [...] I'm aware of this (and all the blah blah blah you are explaining). It is always the same song: memory. Let me ask: is Python an "American" product

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
Sorry guys, I'm not stupid (I think). I can open IDLE with Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is always slower. Period. Now, the reason. I think it is due to the "flexible representation". The deeper reason: the "boss" does not wish to hear about a (pure) ucs-4/utf-32 "engine" (this h

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, August 18, 2012 19:28:26 UTC+2, Mark Lawrence wrote: > > Proof that is acceptable to everybody please, not just yourself. > > I can't; I'm only facing the fact that it runs slower on my Windows platform. As I understand (I think) the underlying mechanism, I can only say it is not a surpr

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, August 18, 2012 19:59:18 UTC+2, Steven D'Aprano wrote: > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > > > > > On Saturday, August 18, 2012 14:27:23 UTC+2, Steven D'Aprano wrote: > > >> [...] > > >> The problem wi

Re: How do I display unicode value stored in a string variable using ord()

2012-08-18 Thread wxjmfauth
On Saturday, August 18, 2012 20:40:23 UTC+2, rusi wrote: > On Aug 18, 10:59 pm, Steven D'Aprano > +comp.lang.pyt...@pearwood.info> wrote: > > > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote: > > > > Is there any reason why non ascii users are somehow p

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
About the examples contested by Steven, e.g.: timeit.timeit("('ab…' * 10).replace('…', 'œ…')") It is good enough to show the problem. Period. The rest (you have to do this, you should not do this, why are you using these characters: an amazing and stupid question) does not count. The real pro

Re: New internal string format in 3.3, was Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 10:56:36 UTC+2, Steven D'Aprano wrote: > > internal implementation, and strings which fit exactly in Latin-1 will > And this is the crucial point. latin-1 is an obsolete and unusable coding scheme (esp. for European languages). We come back to the point I mentioned ab

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 11:37:09 UTC+2, Peter Otten wrote: You know, the technical aspect is one thing. Understanding the coding of the characters as a whole is something else. The important point is not the coding per se; the relevant point is the set of characters a coding may represent. Y

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 12:26:44 UTC+2, Chris Angelico wrote: > On Sun, Aug 19, 2012 at 8:19 PM, wrote: > > > This is precisely the weak point of this flexible > > > representation. It uses latin-1 and latin-1 is for > > > most users simply unusable. > > > > No, it uses Unicode, and as

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 14:29:17 UTC+2, Dave Angel wrote: > On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > > > On Sunday, August 19, 2012 12:26:44 UTC+2, Chris Angelico wrote: > > >> On Sun, Aug 19, 2012 at 8:19 PM, wrote: > > >> > > >>> This is precisely the weak point of this

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 15:46:34 UTC+2, Mark Lawrence wrote: > On 19/08/2012 13:59, wxjmfa...@gmail.com wrote: > > > On Sunday, August 19, 2012 14:29:17 UTC+2, Dave Angel wrote: > > >> On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote: > > >> > > >>> On Sunday, August 19, 2012 12:26:44

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 16:48:48 UTC+2, Mark Lawrence wrote: > On 19/08/2012 15:09, wxjmfa...@gmail.com wrote: > > > > > > > > I cannot give you more numbers than those I gave. > > > As an end user, I noticed and found that my random tests > > > are always slower in Py3.3 than in Py3.2

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 19:03:34 UTC+2, Blind Anagram wrote: > "Steven D'Aprano" wrote in message > > news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com... > > > > On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote: > > > > [...]

Re: New internal string format in 3.3

2012-08-19 Thread wxjmfauth
Just for the record: the day I tested this stuff, five minutes after I closed my interactive interpreter windows, I thought: "Too bad I did not note the extremely bad cases I found; I'm pretty sure this problem will come up." jmf -- http://mail.python.org/mailman/listinfo/python-li

Re: How do I display unicode value stored in a string variable using ord()

2012-08-19 Thread wxjmfauth
On Sunday, August 19, 2012 19:48:06 UTC+2, Paul Rubin wrote: > > > But they are not ascii pages, they are (as stated) MOSTLY ascii. > > E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses > > a much more memory-expensive encoding than UTF-8. > > Imagine a US banking applicat

Re: Abuse of Big Oh notation

2012-08-20 Thread wxjmfauth
By chance and luckily, first attempt.
IDLE, Windows 7.0 Pro 32, Pentium Dual Core 2.6, RAM 2 GB
Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.6939567134893707, 1.672874290786993, 1.6761219212298073]
Py 3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.924
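To reproduce this kind of comparison elsewhere, run the same statement under each interpreter with identical `timeit` parameters and compare minima rather than single runs. A sketch; `number=10_000` is an arbitrary choice to keep the run short (the calls above use the default of one million), so absolute figures will not match those quoted.

```python
import sys
import timeit

STMT = "('€'*100+'€'*100).replace('€', 'œ')"

def bench(stmt: str = STMT, repeat: int = 3, number: int = 10_000) -> list:
    """Time `stmt` executed `number` times, `repeat` times (seconds per batch)."""
    return timeit.repeat(stmt, repeat=repeat, number=number)

times = bench()
print(sys.version.split()[0], "best of", len(times), ":", min(times))
```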

Re: Abuse of subject, was Re: Abuse of Big Oh notation

2012-08-21 Thread wxjmfauth
On Tuesday, August 21, 2012 09:52:09 UTC+2, Peter Otten wrote: > wxjmfa...@gmail.com wrote: > > > > > By chance and luckily, first attempt. > > > > > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€' > > > , 'œ')" > > > 100 loops, best of 3: 1.48 usec per loop > > > c:\python33

Flexible string representation, unicode, typography, ...

2012-08-23 Thread wxjmfauth
This is neither a complaint nor a question, just a comment. In the previous discussion related to the flexible string representation, Roy Smith added this comment: http://groups.google.com/group/comp.lang.python/browse_thread/thread/2645504f459bab50/eda342573381ff42 Not only do I agree with his sen

Re: Flexible string representation, unicode, typography, ...

2012-08-23 Thread wxjmfauth
On Thursday, August 23, 2012 15:57:50 UTC+2, Neil Hodgson wrote: > wxjmfa...@gmail.com: > > > > > Small illustration. Take an A4 page containing 50 lines of 80 ascii > > > characters, add a single 'EM DASH' or a 'BULLET' (code points > 0x2000), > > > and you will see all the optimization efforts
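The A4-page illustration can be measured. A minimal sketch, assuming CPython 3.3+ where `sys.getsizeof` reflects the PEP 393 layout: one code point above U+00FF switches the whole string to 2 bytes per character, while its UTF-8 encoding grows by only 3 bytes.

```python
import sys

line = "x" * 80
page = "\n".join([line] * 50)   # 50 lines of 80 ASCII characters
with_dash = page + "\u2014"     # the same page plus one EM DASH (U+2014)

for label, text in (("ASCII page", page), ("page + EM DASH", with_dash)):
    print(f"{label:15} chars={len(text):5} "
          f"in-memory={sys.getsizeof(text):5} utf-8={len(text.encode('utf-8')):5}")
```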

Re: Flexible string representation, unicode, typography, ...

2012-08-25 Thread wxjmfauth
On Saturday, August 25, 2012 02:24:35 UTC+2, Antoine Pitrou wrote: > Ramchandra Apte gmail.com> writes: > > > > > > The zen of python is simply a guideline > > > > What's more, the Zen guides the language's design, not its implementation. > > People who think CPython is a complicated implement

Re: Flexible string representation, unicode, typography, ...

2012-08-25 Thread wxjmfauth
On Saturday, August 25, 2012 11:46:34 UTC+2, Frank Millman wrote: > On 25/08/2012 10:58, Mark Lawrence wrote: > > > On 25/08/2012 08:27, wxjmfa...@gmail.com wrote: > > >> > > >> Unicode design: a flat table of code points, where all code > > >> points are "equal". > > >> As soon as one attempts

Re: Flexible string representation, unicode, typography, ...

2012-08-26 Thread wxjmfauth
On Sunday, August 26, 2012 00:26:56 UTC+2, Ian wrote: > On Sat, Aug 25, 2012 at 9:47 AM, wrote: > > > For those who do not know, the Go language has introduced > > > the rune type. As far as I know, nobody is complaining, I > > > have not even seen a discussion related to this subject. > >

Re: Flexible string representation, unicode, typography, ...

2012-08-27 Thread wxjmfauth
On Sunday, August 26, 2012 22:45:09 UTC+2, Dan Sommers wrote: > On 2012-08-26 at 20:13:21 +, > > Steven D'Aprano wrote: > > > > > I note that not all 32-bit ints are valid code points. I suppose I can > > > see sense in having rune be a 32-bit integer value limited to those > > > valid

Re: Flexible string representation, unicode, typography, ...

2012-08-27 Thread wxjmfauth
On Monday, August 27, 2012 22:14:07 UTC+2, Ian wrote: > On Mon, Aug 27, 2012 at 1:16 PM, wrote: > > > - Why int32 and not uint32? No idea, I tried to find an > > > answer without asking. > > > > UCS-4 is technically only a 31-bit encoding. The sign bit is not used, > > so the choice of int32
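On the int32 question: the Unicode codespace ends at U+10FFFF, which needs only 21 bits, so a signed 32-bit type holds every code point with room to spare. A small sketch of the range check such a type implies; `is_scalar_value` is an illustrative name following the Unicode definition of a scalar value (any code point except the surrogates U+D800 to U+DFFF).

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines

def is_scalar_value(n: int) -> bool:
    """True if n is a Unicode scalar value: a code point that is not a surrogate."""
    return 0 <= n <= MAX_CODE_POINT and not 0xD800 <= n <= 0xDFFF

print(MAX_CODE_POINT.bit_length())  # 21
print(is_scalar_value(0x1D11E), is_scalar_value(0xD800), is_scalar_value(0x110000))
```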

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread wxjmfauth
On Monday, August 27, 2012 22:37:03 UTC+2, (unknown) wrote: > On Monday, August 27, 2012 22:14:07 UTC+2, Ian wrote: > > > On Mon, Aug 27, 2012 at 1:16 PM, wrote: > > > > > > > - Why int32 and not uint32? No idea, I tried to find an > > > > > > > answer without asking. > > > > > > > > > >

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread wxjmfauth
On Wednesday, August 29, 2012 06:16:05 UTC+2, Ian wrote: > On Tue, Aug 28, 2012 at 8:42 PM, rusi wrote: > > > In summary: > > > 1. The problem is not on jmf's computer > > > 2. It is not windows-only > > > 3. It is not directly related to latin-1 encodable or not > > > > > > The only question

Re: Flexible string representation, unicode, typography, ...

2012-08-29 Thread wxjmfauth
On Wednesday, August 29, 2012 14:01:57 UTC+2, Dave Angel wrote: > On 08/29/2012 07:40 AM, wxjmfa...@gmail.com wrote: > > > > > > > > Forget Python and all these benchmarks. The problem is on another > > > level. Coding schemes, typography, usage of characters, ... For a > > > given coding sch

Re: Flexible string representation, unicode, typography, ...

2012-08-30 Thread wxjmfauth
On Thursday, August 30, 2012 08:55:01 UTC+2, Steven D'Aprano wrote: You are right. But as soon as you artificially introduce a "latin-1" bottleneck, all this machinery just becomes useless. This flexible representation works absurdly. It optimizes the characters you are not using (in one sense)

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
On Thursday, August 30, 2012 17:01:50 UTC+2, Antoine Pitrou wrote: > > > I honestly suggest you shut up until you have a clue. > Sorry, Antoine, I do not have the knowledge to dive into the Python code, but I know what a character is. The coding of characters is a domain per se, independent from th

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
On Sunday, September 2, 2012 11:07:35 UTC+2, Ian wrote: > On Sun, Sep 2, 2012 at 1:36 AM, wrote: > > > I still remember my thoughts when I read the PEP 393 > > > discussion: "this is not logical", "they do not understand > > > typography", "atomic character ???", ... > > > > That would in

Re: Flexible string representation, unicode, typography, ...

2012-09-02 Thread wxjmfauth
On Sunday, September 2, 2012 14:01:18 UTC+2, Serhiy Storchaka wrote: > On 02.09.12 12:52, Peter Otten wrote: > > > Ian Kelly wrote: > > > > > >> Rewriting the example to use locale.strcoll instead: > > > > > > sorted(li, key=functools.cmp_to_key(locale.strcoll)) > > > > > > There is a
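The `locale.strcoll` approach from the quoted message, sketched end to end. Collation depends on the active locale, so no output order is shown; `locale.strxfrm` as a sort key is the usual alternative to wrapping `strcoll` in `functools.cmp_to_key`. The word list is made up for illustration.

```python
import functools
import locale

words = ["cote", "côte", "coté", "côté"]

# "" selects the user's environment locale; fall back to "C" if it is unusable.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))
by_strxfrm = sorted(words, key=locale.strxfrm)  # one transform per item
print(by_strcoll)
print(by_strxfrm)
```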

Re: Least-lossy string.encode to us-ascii?

2012-09-14 Thread wxjmfauth
On Thursday, September 13, 2012 23:25:27 UTC+2, Tim Chase wrote: > I've got a bunch of text in Portuguese and to transmit them, need to > > have them in us-ascii (7-bit). I'd like to keep as much information > > as possible, just stripping accents, cedillas, tildes, etc. So > > "serviço móvil" b
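The usual standard-library approach is Unicode decomposition followed by dropping the combining marks. A hedged sketch: it handles accents, cedillas and tildes, but characters with no decomposition (for example 'œ' or 'ß') are silently dropped, so a hand-made replacement table is still needed for truly least-lossy output.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Approximate `text` in ASCII by dropping combining marks.

    NFKD splits 'ç' into 'c' + U+0327 COMBINING CEDILLA (and similarly for
    other accented letters); encoding with errors="ignore" then drops
    whatever is not ASCII, including characters with no decomposition.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("serviço móvil"))  # servico movil
print(to_ascii("œuvre"))          # uvre  (lossy: 'œ' has no decomposition)
```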

Re: Least-lossy string.encode to us-ascii?

2012-09-15 Thread wxjmfauth
On Friday, September 14, 2012 22:45:05 UTC+2, Terry Reedy wrote: > On 9/14/2012 12:15 PM, wxjmfa...@gmail.com wrote: > > > > > PS Avoid Py3.3 :-) > > > > pps Start using 3.3 as soon as possible. It has Python's first fully > > portable non-buggy Unicode implementation. The second releas

Re: reportlab and python 3

2012-09-18 Thread wxjmfauth
On Monday, September 17, 2012 10:48:30 UTC+2, Laszlo Nagy wrote: > Reportlab is on the wall of shame. http://python3wos.appspot.com/ > > > > Is there other ways to create PDF files from python 3? There is pyPdf. I > > haven't tried it yet, but it seem that it is a low level library. It > > d

Re: reportlab and python 3

2012-09-18 Thread wxjmfauth
On Tuesday, September 18, 2012 11:04:19 UTC+2, Laszlo Nagy wrote: > > A big yes and it is very easy. I assume you know how > > > to write a plain text file with Python :-). > > > > > > Use your Python to generate a .tex file and let it compile > > > with one of the pdf TeX engines. > > > > > >
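A minimal sketch of the suggested route: generate a .tex file from Python, then hand it to a TeX engine. It assumes a TeX distribution providing `xelatex` (used here for `fontspec` and UTF-8 input) is installed; the function names are illustrative, and `body` must already have LaTeX special characters escaped. The demo only writes the file; it does not compile it.

```python
import subprocess
import tempfile
from pathlib import Path

TEMPLATE = r"""\documentclass{article}
\usepackage{fontspec}
\begin{document}
%s
\end{document}
"""

def write_tex(body: str, path: Path) -> Path:
    """Write a minimal UTF-8 LaTeX document wrapping `body` to `path`."""
    path.write_text(TEMPLATE % body, encoding="utf-8")
    return path

def compile_pdf(tex_path: Path, engine: str = "xelatex") -> None:
    """Run a TeX engine on `tex_path`; assumes `engine` is installed and on PATH."""
    subprocess.run([engine, "-interaction=nonstopmode", tex_path.name],
                   cwd=tex_path.parent, check=True)

# Demo: write (but do not compile) a document in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_text = write_tex("Un café crème sucré", Path(tmp) / "demo.tex").read_text(encoding="utf-8")
print(demo_text)
```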

Re: reportlab and python 3

2012-09-18 Thread wxjmfauth
On Tuesday, September 18, 2012 15:31:52 UTC+2, Laszlo Nagy wrote: > > I understood, you have Python on a platform and starting > > > from this you wish to create pdf files. > > > Obviously, embedding "TeX" is practically a non-solution, > > > although distributing a portable standalone TeX distribu

Re: For Counter Variable

2012-09-25 Thread wxjmfauth
I wrote my first program on a PDP-8. I discovered Python at release 1.5.? Now, years later... I find Python more and more unusable. As an example related to this topic, which summarizes the situation a little: I just opened my interactive interpreter and produced this: >>> for i in range(len(

Re: Article on the future of Python

2012-09-25 Thread wxjmfauth
On Wednesday, September 26, 2012 01:34:01 UTC+2, 8 Dihedral wrote: > On Wednesday, September 26, 2012 2:25:31 AM UTC+8, Grant Edwards wrote: > > > On 2012-09-25, Martin P. Hellwig wrote: > > > > > > > On Tuesday, 25 September 2012 09:14:27 UTC+1, Mark Lawrence wrote: > > > > > > >> Hi all, > > > > > > >> >

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 09:23:47 UTC+2, Steven D'Aprano wrote: > On Tue, 25 Sep 2012 23:35:39 -0700, wxjmfauth wrote: > > > > > Py 3.3 succeeded to somehow kill unicode and it has been transformed > > > into an "American" product for "Ame

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 10:35:04 UTC+2, Mark Lawrence wrote: > On 26/09/2012 07:35, wxjmfa...@gmail.com wrote: > > > > > > Py 3.3 succeeded to somehow kill unicode and it has > > > been transformed into an "American" product for > > > "American" users. > > > jmf > > > > > > > Why

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 10:13:58 UTC+2, Terry Reedy wrote: > On 9/26/2012 2:35 AM, wxjmfa...@gmail.com wrote: > > > > > Py 3.3 succeeded to somehow kill unicode and it has > > > been transformed into an "American" product for > > > "American" users. > > > > Python 3.3 is the first

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 11:55:16 UTC+2, Chris Angelico wrote: > On Wed, Sep 26, 2012 at 7:31 PM, wrote: > > > you are correct. But the price you pay for this is extremely > > > high. Now, practically all characters are affected, especially > > > those *in* the Basic *** Multilingual**

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
I should add that I don't have the knowledge to dive into the Python code. But I "see" what has been done. As I have a very good understanding of all this character-encoding stuff, I can just pick, in fact select, characters or combinations of characters which I suspect to be problematic and I s

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 16:56:55 UTC+2, Chris Angelico wrote: > On Thu, Sep 27, 2012 at 12:50 AM, wrote: > > > I just see the results and the facts. For an end > > > user, this is the only thing that counts. > > > > Then what counts is that Python 3.2 (like Javascript) exhibits > >

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
Sorry guys, I'm "only" able to see this (with the Python versions an end user can download):

>>> timeit.repeat("('你'*1).replace('你', 'a')")
[31.44532887821319, 31.409585124813844, 31.40705548932476]
>>> timeit.repeat("('你'*1).replace('你', 'a')")
[323.56687741054805, 323.1660997337247, 325
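The figures above come from `timeit.repeat` called with only a statement. With no `number` argument, `timeit.repeat` runs the statement 1,000,000 times per repetition, so each value is a total for a million calls, not a per-call time. A minimal sketch of the same kind of measurement with the counts made explicit follows; the helper name `bench_replace` and the values of `n`, `repeat` and `number` are illustrative assumptions, not taken from the post (whose string multiplier is not recoverable from the archive). To compare interpreters, run the same script under each version.

```python
import timeit

def bench_replace(ch, n=1000, repeat=3, number=1000):
    """Time (ch * n).replace(ch, 'a'), mirroring the statement in the post.

    n, repeat and number are illustrative choices, not the post's values.
    """
    stmt = "(%r * %d).replace(%r, 'a')" % (ch, n, ch)
    # Each list entry is the total time, in seconds, for `number` runs of stmt.
    return timeit.repeat(stmt, repeat=repeat, number=number)

NUMBER = 1000
for ch in ("a", "\u00e9", "\u4f60"):   # ASCII, Latin-1, CJK
    timings = bench_replace(ch, number=NUMBER)
    print(repr(ch), min(timings) / NUMBER)   # best per-call time in seconds
```

The `timeit` documentation suggests reading the minimum of the repeats rather than the mean, since higher values are usually caused by other processes interfering with the timing.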

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 17:54:04 UTC+2, Ian wrote: > On Wed, Sep 26, 2012 at 1:23 AM, Steven D'Aprano > > wrote: > > > On Tue, 25 Sep 2012 23:35:39 -0700, wxjmfauth wrote: > > > > > >> Py 3.3 succeeded to somehow kill unicode and it ha

Re: Article on the future of Python

2012-09-26 Thread wxjmfauth
On Wednesday, September 26, 2012 18:52:44 UTC+2, Paul Rubin wrote: > Chris Angelico writes: > > > When you compare against a wide build, semantics of 3.2 and 3.3 are > > > identical, and then - and ONLY then - can you sanely compare > > > performance. And 3.3 stacks up much better. > > > > I
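The "wide build" point can be made concrete. On a narrow 3.2 build (the kind shipped for Windows), strings were stored as UTF-16, so `len()` counted a character above U+FFFF as two surrogate code units; wide builds and every 3.3+ interpreter count code points. A small sketch, run on Python 3.3 or later, that reproduces both counts; the helper `utf16_code_units` is our own illustrative name, not a standard function.

```python
def utf16_code_units(s):
    """Number of UTF-16 code units in s, i.e. what len() gave on a narrow 3.2 build."""
    return len(s.encode("utf-16-le")) // 2

s = "a\u00e9\u4f60\U0001F600"   # ASCII, Latin-1, CJK (BMP) and an emoji (astral)
print(len(s))                 # 4: one per code point, as on a wide build or 3.3+
print(utf16_code_units(s))    # 5: the emoji becomes a surrogate pair in UTF-16
```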

Re: new-style class or old-style class?

2012-09-26 Thread wxjmfauth
On Tuesday, September 25, 2012 16:44:05 UTC+2, Jayden wrote: > In learning Python, I found there are two types of classes? Which one is > widely used in new Python code? Is the new-style much better than old-style? > Thanks!! Use Python 3 and classes. --- The interesting point or my ques
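Behind "Use Python 3 and classes": in Python 2 a class that does not inherit, directly or indirectly, from `object` is old-style, while Python 3 has no old-style classes at all, so every class is new-style whether or not `object` is listed. A short sketch, to be run under Python 3, that checks this; the class names are arbitrary.

```python
class Plain:              # no base listed
    pass

class Explicit(object):   # Python 2 spelling of a new-style class
    pass

# In Python 3 both are new-style: each inherits from object and is created by type.
for cls in (Plain, Explicit):
    print(cls.__name__, cls.__mro__[-1] is object, type(cls) is type)
```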

Re: Article on the future of Python

2012-09-27 Thread wxjmfauth
This flexible string representation is wrong by design. Expecting to divide "Unicode" into chunks and gain something is an illusion. It was created by a computer scientist who thinks in "bytes", when in this field one has to think about "bytes" and the usage of the characters at the same time. The latin-1
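The "chunks" being criticised are the storage widths of the flexible string representation (PEP 393, CPython 3.3+): each string is stored at 1, 2 or 4 bytes per character depending on its widest character. A CPython-specific sketch that measures this with `sys.getsizeof`; the helper name `bytes_per_char` is ours, and the approach (comparing two lengths so the fixed object header cancels out) is an assumption about how to observe the layout, not an official API for it.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate bytes per character for a string made only of `ch`.

    CPython-specific: compares two lengths so the fixed header cancels out.
    """
    small = ch * n
    large = ch * (2 * n)
    return (sys.getsizeof(large) - sys.getsizeof(small)) // n

for ch in ("a", "\u00e9", "\u4f60", "\U0001F600"):
    print("U+%04X" % ord(ch), bytes_per_char(ch))
```

On CPython 3.3+ this reports 1 byte for ASCII and Latin-1 characters, 2 for other BMP characters such as U+4F60, and 4 for characters above U+FFFF. Because the width is chosen per string, a single astral character makes every character in that string take 4 bytes.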

Re: Coexistence of Python 2.x and 3.x on same OS

2012-10-06 Thread wxjmfauth
Using Python on Windows is a dream. Python uses and needs the system, but the system does not use Python. Every Python version is installed in its own isolated space, site-packages included, without any environment variable being defined. Every Python can be seen as a separate application. Knowing
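One way to check the "isolated space" claim on a given machine is to have each interpreter report its own location and package directory. A minimal sketch using only `sys` and `sysconfig`; the function name `interpreter_layout` is hypothetical, and the script avoids f-strings and annotations so it also runs on 2.7 and 3.2. Save it to a file and run it with each installed interpreter (for example `c:\python32\python` and a 2.7 install); each should report a different prefix and site-packages.

```python
import sys
import sysconfig

def interpreter_layout():
    """Report where the running interpreter lives and where it installs packages."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        "executable": sys.executable,
        "prefix": sys.prefix,
        "site_packages": sysconfig.get_paths()["purelib"],
    }

for key, value in sorted(interpreter_layout().items()):
    print("%-14s %s" % (key, value))
```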

Re: for-loop on cmd-line

2012-10-11 Thread wxjmfauth
On Thursday, October 11, 2012 15:16:33 UTC+2, Ramchandra Apte wrote:

PS C:\> $cmd="import sys;"
PS C:\> $cmd+="print('\n'.join(sys.path))"
PS C:\> $cmd
import sys;print('\n'.join(sys.path))
PS C:\> c:\python32\python -c $cmd
C:\Windows\system32\python32.zip
c:\python32\DLLs
c:\python32\lib
c:\pytho
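The `';'.join` trick works because the whole program is simple statements. A `for` loop cannot follow `;` on the same line (`import sys; for p in sys.path: print(p)` is a SyntaxError), so a real loop needs a newline inside the `-c` argument, which is awkward to type in a shell. A sketch that sidesteps shell quoting by launching the child interpreter from Python with the list form of `subprocess.run`; it needs Python 3.7+ for `capture_output`, and the program string is our own example.

```python
import subprocess
import sys

# A genuine multi-line program containing a for-loop, passed as a single -c argument.
code = "import sys\nfor p in sys.path:\n    print(p)"

result = subprocess.run(
    [sys.executable, "-c", code],  # list form: no hand-written shell quoting
    capture_output=True,
    text=True,
    check=True,                    # raise CalledProcessError on a non-zero exit
)
print(result.stdout)
```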

Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

2012-10-17 Thread wxjmfauth
On Wednesday, October 17, 2012 17:00:46 UTC+2, Dave Angel wrote: > On 10/17/2012 10:31 AM, nwaits wrote: > > > I'm very impressed with python's wordlist script for plain text. Is there > > a script for finding words that do NOT have certain diacritic marks, like > > acute or grave accents (utf-

Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

2012-10-17 Thread wxjmfauth
On Wednesday, October 17, 2012 19:07:43 UTC+2, Ian wrote: > On Wed, Oct 17, 2012 at 9:32 AM, wrote: > > import unicodedata > > def HasDiacritics(w): > > > ... w_decomposed = unicodedata.normalize('NFKD', w) > > > ... return 'no' if len(w) == len(w_decomposed) else 'yes' > >
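The length-comparison `HasDiacritics` quoted above flags any word that changes under NFKD. That includes grave accents, diaereses, and compatibility characters such as the ligature "ﬁ", which NFKD expands to "fi". To match the original question (vowels with an acute accent) more narrowly, one option is to decompose with NFD and look for U+0301 COMBINING ACUTE ACCENT attached to a vowel. A minimal sketch; the function name and the vowel set are assumptions to adjust for the language at hand (add U+0300 if grave accents should count too).

```python
import unicodedata

ACUTE = "\u0301"               # COMBINING ACUTE ACCENT
VOWELS = set("aeiouyAEIOUY")   # assumption: which base letters count as vowels

def has_acute_vowel(word):
    """True if any vowel in `word` carries an acute accent, e.g. 'é', 'Á', 'ǘ'."""
    base = ""
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # A combining mark belongs to the most recent base character.
            if ch == ACUTE and base in VOWELS:
                return True
        else:
            base = ch
    return False

words = ["cafe", "café", "naïve", "Ángel", "über"]
print([w for w in words if not has_acute_vowel(w)])  # ['cafe', 'naïve', 'über']
```

Tracking the last base character, rather than only the character just before the accent, also catches letters with stacked marks such as "ǘ", which NFD decomposes into u, a diaeresis, and then the acute.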
