On Thursday, 20 June 2013 13:43:28 UTC+2, MRAB wrote:
> On 20/06/2013 07:26, Steven D'Aprano wrote:
>
> > On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:
>
> >
>
> >> On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:
>
> >>
>
> >>> Gah! That's twice I've screwed that u
On Thursday, 20 June 2013 19:17:12 UTC+2, MRAB wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
>
> > On Fri, Jun 21, 2013 at 2:27 AM, wrote:
>
> >> And all these coding schemes have something in common,
>
> >> they work all with a unique set of code points, more
>
> >> precisely a unique s
On Sunday, 23 June 2013 18:30:40 UTC+2, Steven D'Aprano wrote:
> On Sun, 23 Jun 2013 08:51:41 -0700, wxjmfauth wrote:
>
>
>
> > utf-8: how many bytes to hold an "a" in memory? one byte.
>
> >
>
> > flexible string representation: how m
On Tuesday, 25 June 2013 06:18:44 UTC+2, jyou...@kc.rr.com wrote:
> Would like to get your opinion on this. Currently to get the metadata out of
> a pdf file, I loop through the guts of the file. I know it's not the
> greatest idea to do this, but I'm trying to avoid extra modules, etc.
>
>
>
On Tuesday, 9 July 2013 09:00:02 UTC+2, Steven D'Aprano wrote:
> On Mon, 08 Jul 2013 10:53:18 -0700, ferdy.blatsco wrote:
>
>
>
> > Not using python 3, for me (a programmer which was present at the
>
> > beginning of computer science, badly interacting with many languages
>
> > from assembl
For those who are interested. The official proposal request
for the encoding of the Latin uppercase letter Sharp S in
ISO/IEC 10646; DIN (The German Institute for Standardization)
proposal is available on the web. A pdf with the rationale.
I do not remember from where I got it, probably from a Germ
On Monday, 8 July 2013 19:52:17 UTC+2, Chris Angelico wrote:
> On Tue, Jul 9, 2013 at 3:31 AM, wrote:
>
> > Unfortunately (as probably I told you before) I will never pass to
>
> > Python 3... Guido should not always listen only to gurus like him...
>
> > I don't like Python as before...s
On Thursday, 11 July 2013 20:42:26 UTC+2, wxjm...@gmail.com wrote:
> On Thursday, 11 July 2013 15:32:00 UTC+2, Chris Angelico wrote:
>
> > On Thu, Jul 11, 2013 at 11:18 PM, wrote:
>
> >
>
> > > Just to stick with this funny character ẞ, a ucs-2 char
>
> >
>
> > > in the Flexible String
On Thursday, 11 July 2013 15:32:00 UTC+2, Chris Angelico wrote:
> On Thu, Jul 11, 2013 at 11:18 PM, wrote:
>
> > Just to stick with this funny character ẞ, a ucs-2 char
>
> > in the Flexible String Representation nomenclature.
>
> >
>
> > It seems to me that, when one needs more than ten by
On Friday, 12 July 2013 01:44:05 UTC+2, Devyn Collier Johnson wrote:
> I recently saw an email in this mailing list about the RE module being
>
> made slower. I no longer have that email. However, I have viewed the
>
> source for the RE module, but I did not see any code that would slow
>
On Friday, 12 July 2013 05:18:44 UTC+2, Steven D'Aprano wrote:
> On Thu, 11 Jul 2013 11:42:26 -0700, wxjmfauth wrote:
>
>
> Now all your strings will be just as heavy, every single variable name
>
> and attribute name will use four times as much memory. Happy n
On Friday, 12 July 2013 04:16:21 UTC+2, Chris Angelico wrote:
> On Fri, Jul 12, 2013 at 4:42 AM, wrote:
>
> > BTW, since
>
> > when does a serious coding scheme need an external marker?
>
> >
>
>
>
> All of them.
>
>
>
> Content-type: text/plain; charset=UTF-8
>
>
>
> ChrisA
--
On Saturday, 13 July 2013 11:49:10 UTC+2, Steven D'Aprano wrote:
> On Sat, 13 Jul 2013 00:56:52 -0700, wxjmfauth wrote:
>
>
>
> > You are confusing the knowledge of a coding scheme and the intrinsic
>
> > information a "coding scheme" *may* have, i
On Saturday, 13 July 2013 21:02:24 UTC+2, Dave Angel wrote:
> On 07/13/2013 10:37 AM, wxjmfa...@gmail.com wrote:
>
>
>
>
>
> Fortunately for us, Python (in version 3.3 and later) and Pike did it
>
> right. Some day the others may decide to do similarly.
>
>
>
---
Possible but
On Sunday, 14 July 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>
>
>
> > For a very simple reason, the latin-1 block: considered and accepted
>
> > today as being a Unicode design mistake.
>
>
On Tuesday, 16 July 2013 08:55:58 UTC+2, Mohan L wrote:
> Dear All,
>
>
>
> Here is my script :
>
>
>
> #!/usr/bin/python
>
>
> import re
>
>
>
>
> # A string.
> logs = "date=2012-11-28 time=21:14:59"
>
>
>
> # Match with named groups.
> m =
> re.match("(?P(date=(?P[^\s]+))\s+(ti
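The group names in the quoted pattern were lost in the archive and cannot be recovered here. As a minimal sketch of the same idea, assuming two hypothetical group names `date` and `time`, matching that log line with named groups could look like this:

```python
import re

# Same sample string as in the quoted script.
logs = "date=2012-11-28 time=21:14:59"

# The group names "date" and "time" are assumptions for illustration;
# the names used in the original pattern are not recoverable.
pattern = re.compile(r"date=(?P<date>\S+)\s+time=(?P<time>\S+)")

m = pattern.match(logs)
if m:
    print(m.group("date"))   # value captured by the "date" group
    print(m.groupdict())     # all named groups as a dict
```

`\S+` is used here in place of `[^\s]+`; the two are equivalent.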
On Wednesday, 17 July 2013 09:46:46 UTC+2, Joshua Landau wrote:
> On 17 July 2013 07:15, wrote:
>
> > Not sure I'm correct. I took your precise string to
>
> > refresh my memory.
>
>
>
> I'm glad to see you doing something else, but I don't think you
>
> understood his problem. Note tha
On Wednesday, 10 July 2013 11:00:23 UTC+2, Steven D'Aprano wrote:
> On Wed, 10 Jul 2013 07:55:05 +, Mats Peterson wrote:
>
>
>
> > A moderator who calls himself “animuson” on Stack Overflow doesn’t want
>
> > to face the truth. He has deleted all my postings regarding Python
>
> > regu
I cannot find the thread where a Python core dev spoke
about French, so I'm putting this here.
This stupid Flexible String Representation splits Unicode
in chunks and one of these chunks is latin-1 (iso-8859-1).
If we consider that latin-1 is unusable for 17 (seventeen)
European languages based on th
On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
> On 07/12/2013 09:59 AM, Joshua Landau wrote:
>
> > If you're interested, the basic of it is that strings now use a
>
> > variable number of bytes to encode their values depending on whether
>
> > values outside of the ASCII ran
On Wednesday, 24 July 2013 16:47:36 UTC+2, Michael Torrie wrote:
> On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote:
>
> > Sorry, you are not understanding Unicode. What is a Unicode
>
> > Transformation Format (UTF), what is the goal of a UTF and
>
> > why it is important for an implementa
On Thursday, 25 July 2013 12:14:46 UTC+2, Chris Angelico wrote:
> On Thu, Jul 25, 2013 at 7:27 PM, wrote:
>
> > A coding scheme works with a unique set of characters (the repertoire),
>
> > and the implementation (the programming) works with a unique set
>
> > of encoded code points. The cri
On Thursday, 25 July 2013 22:45:38 UTC+2, Ian wrote:
> On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
>
> wrote:
>
> > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
>
> >
>
> >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
>
> >> wrote:
>
> >>> On Thu, 25 Jul 2013 14:36
On Friday, 26 July 2013 05:09:34 UTC+2, Michael Torrie wrote:
> On 07/25/2013 11:18 AM, Steven D'Aprano wrote:
>
> > JMF has explained that it is impossible, impossible I say!, to write an
>
> > editor using a flexible string representation. Since Emacs uses such a
>
> > flexible string
On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote:
> On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano
>
> wrote:
>
> > UTF-8 uses a flexible representation on a character-by-character basis.
>
> > When parsing UTF-8, one needs to look at EVERY character to decide how
>
> > many bytes yo
On Saturday, 27 July 2013 04:05:03 UTC+2, Michael Torrie wrote:
> On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote:
>
> sys.getsizeof('––') - sys.getsizeof('–')
>
> >
>
> > I have already explained / commented this.
>
>
>
> Maybe it got lost in translation, but I don't understand yo
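The `sys.getsizeof` difference quoted above can be reproduced with a small sketch. It assumes CPython 3.3 or later, where strings use the flexible representation; the helper name `bytes_per_char` is hypothetical and other implementations may report different numbers.

```python
import sys

def bytes_per_char(ch, n=10):
    """Estimate per-character storage by comparing two strings built at runtime.

    The fixed object header cancels out in the subtraction, leaving only
    the cost of the n extra characters.
    """
    short = ch * n
    longer = ch * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

samples = [
    ("ASCII 'a'", "a"),
    ("Latin-1 e-acute", "\u00e9"),
    ("BMP EN DASH", "\u2013"),
    ("astral U+1D11E", "\U0001d11e"),
]
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such a build the EN DASH case is the one behind the quoted `sys.getsizeof('––') - sys.getsizeof('–')` expression.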
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
> On Sat, Jul 27, 2013 at 12:21 PM, wrote:
>
> > Back to utf. utfs are not only elements of a unique set of encoded
>
> > code points. They have an interesting feature. Each "utf chunk"
>
> > holds intrinsically the character (in fact th
On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
> On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote:
>
> > Good point. FSR, nice tool for those who wish to teach
>
> > Unicode. It is not every day, one has such an opportunity.
>
>
>
> I had a long e-mail composed, but deci
On Sunday, 28 July 2013 21:04:56 UTC+2, MRAB wrote:
> On 28/07/2013 19:13, wxjmfa...@gmail.com wrote:
>
> > On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
>
> >> On Sat, Jul 27, 2013 at 12:21 PM, wrote:
>
> >>
>
> >> > Back to utf. utfs are not only elements of a unique set
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
> On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:
>
>
>
> > Do not forget that à la "FSR" mechanism for a non-ascii user is
>
> > *irrelevant*.
>
>
>
> You have
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote:
> On Mon, Jul 29, 2013 at 12:43 PM, wrote:
>
> > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
>
> > 3.2
>
> timeit.timeit("r = dir(list)")
>
> > 22.300465007102908
>
> >
>
> > 3.3
>
> timei
On Sunday, 28 July 2013 19:36:00 UTC+2, Terry Reedy wrote:
> On 7/28/2013 11:52 AM, Michael Torrie wrote:
>
> >
>
> > 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
>
> > slicing a string would be very very slow,
>
>
>
> Not necessarily so. See below.
>
>
>
On Monday, 29 July 2013 16:49:34 UTC+2, Chris Angelico wrote:
> On Mon, Jul 29, 2013 at 3:20 PM, wrote:
>
> >>c:\python32\pythonw -u "timitmod.py"
>
> > 15.258061416225663
>
> >>Exit code: 0
>
> >>c:\Python33\pythonw -u "timitmod.py"
>
> > 17.052203122286194
>
> >>Exit code: 0
>
>
>
Mutable, immutable, copying + xxx, buffering, O(n)
Yes, but conceptually the re-encoding happens sometime, somewhere.
The internal "ucs-2" will never automagically be transformed
into "ucs-4" (eg).
>>> timeit.timeit("'a'*1 +'€'")
7.087220684719967
>>> timeit.timeit("'a'*1 +'z'")
1.568521
FSR:
===
The 'a' in 'a€' and 'a\U0001d11e':
>>> ['{:#010b}'.format(c) for c in 'a€'.encode('utf-16-be')]
['0b00000000', '0b01100001', '0b00100000', '0b10101100']
>>> ['{:#010b}'.format(c) for c in 'a\U0001d11e'.encode('utf-32-be')]
['0b00000000', '0b00000000', '0b00000000', '0b01100001',
'0b00
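To make the comparison above easier to repeat, here is a small sketch that prints the bit pattern of the same character under several encodings. The helper name `bits` is hypothetical and only wraps the formatting expression already used in the session.

```python
def bits(text, encoding):
    """Return each encoded byte of text as a zero-padded binary literal."""
    return ['{:#010b}'.format(b) for b in text.encode(encoding)]

# The letter 'a' occupies 1, 2 or 4 bytes depending on the encoding form.
for enc in ('utf-8', 'utf-16-be', 'utf-32-be'):
    print(enc, bits('a', enc))
```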
# -*- coding: cp1252 -*-
# café.py
import sys
print(sys.version)
sys.path.append('d:\\crème')
import crème
import sucré
s = ' '.join(['un', 'café', crème.tag, sucré.tag])
print(s)
input(':')
#--
# .\sucré.py:
# -*- coding: cp1252 -*-
#tag = 'sucré'
#--
# d:\crème\crème.py
# -*- coding
On Thursday, June 28, 2012 7:47:24 AM UTC+2, Stefan Behnel wrote:
> Serhiy Storchaka, 28.06.2012 07:36:
> > On 28.06.12 00:14, Terry Reedy wrote:
> >> Another prediction: people who code Python without reading the manual,
> >> at least not for new features, will learn about 'u' somehow (such as by
On Wednesday, July 25, 2012 11:02:01 AM UTC+2, Walter Dörwald wrote:
> On 25.07.12 08:09, Ulrich Eckhardt wrote:
>
> > Am 24.07.2012 17:01, schrieb cpppw...@gmail.com:
> >> reader = codecs.getreader(encoding)
> >> lines = []
> >> with open(filename, 'rb') as f:
> >> lines
On Thursday, July 26, 2012 9:46:27 AM UTC+2, Jaroslav Dobrek wrote:
> On Jul 25, 8:50 pm, Dave Angel wrote:
> > On 07/25/2012 08:09 AM, jaroslav.dob...@gmail.com wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > > On Wednesday, July 25, 2012 1:35:09 PM UTC+2, Philipp Hagemeister
> w
On Saturday, July 28, 2012 5:51:48 PM UTC+2, Chris Angelico wrote:
... and has a few limitations (eg it only really supports
>
> UTF-8),
?!
It's my daily plain text editor (Windows) since ? (I don't remember).
And I'm using it for utf-8, utf-16 and cp1252 (my favorite coding)
without problems.
On Saturday, July 28, 2012 7:47:24 PM UTC+2, Chris Angelico wrote:
> On Sun, Jul 29, 2012 at 3:43 AM, wrote:
>
> > On Saturday, July 28, 2012 5:51:48 PM UTC+2, Chris Angelico wrote:
>
> >
>
> > ... and has a few limitations (eg it only really supports
>
> >>
>
> >> UTF-8),
>
> >
>
> > ?!
>
On Friday, 17 August 2012 01:59:31 UTC+2, Terry Reedy wrote:
> a = '…'
>
> print(ord(a))
>
> >>>
>
> 8230
>
> Most things with unicode are easier in 3.x, and some are even better in
>
> 3.3. The current beta is good enough for most informal work. 3.3.0 will
>
> be out in a month.
>
>
>
On Friday, 17 August 2012 20:21:34 UTC+2, Jerry Hill wrote:
> On Fri, Aug 17, 2012 at 1:49 PM, wrote:
>
> > The character '…', Unicode name 'HORIZONTAL ELLIPSIS',
>
> > is one of these characters existing in the cp1252, mac-roman
>
> > coding schemes and not in iso-8859-1 (latin-1) and obvio
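The claim about HORIZONTAL ELLIPSIS can be checked with the standard codecs. This is a minimal sketch; the variable names are illustrative only.

```python
import unicodedata

ch = '\u2026'  # the character discussed above
print(ord(ch), unicodedata.name(ch))

# cp1252 and mac_roman both have a slot for this character.
print(ch.encode('cp1252'))
print(ch.encode('mac_roman'))

# latin-1 (iso-8859-1) only covers U+0000..U+00FF, so encoding fails.
latin1_ok = True
try:
    ch.encode('latin-1')
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('latin-1 cannot encode it:', exc.reason)
```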
>>> sys.version
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
>>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
37.32762490493721
>>> timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
0.8158757139801764
>>> sys.version
'3.3.0b2 (v3.3.0b2:4972a8f1b2aa, Aug 12 2012, 15:02:36)
On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
> [...]
> The problem with UCS-4 is that every character requires four bytes.
> [...]
I'm aware of this (and all the blah blah blah you are
explaining). This is always the same song. Memory.
Let me ask. Is Python an "American" product
Sorry guys, I'm not stupid (I think). I can open IDLE with
Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
always slower. Period.
Now, the reason. I think it is due to the "flexible representation".
Deeper reason. The "boss" does not wish to hear about a (pure)
ucs-4/utf-32 "engine" (this h
On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
>
> Proof that is acceptable to everybody please, not just yourself.
>
>
I can't, I'm only facing the fact that it works slower on my
Windows platform.
As I understand (I think) the underlying mechanism, I
can only say, it is not a surpr
On Saturday, 18 August 2012 19:59:18 UTC+2, Steven D'Aprano wrote:
> On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
>
>
>
> > On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>
> >> [...]
>
> >> The problem wi
On Saturday, 18 August 2012 20:40:23 UTC+2, rusi wrote:
> On Aug 18, 10:59 pm, Steven D'Aprano
> +comp.lang.pyt...@pearwood.info> wrote:
>
> > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
>
> > > Is there any reason why non ascii users are somehow p
About the examples contested by Steven:
eg: timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
And it is good enough to show the problem. Period. The
rest (you have to do this, you should not do this, why
are you using these characters - amazing and stupid
question -) does not count.
The real pro
On Sunday, 19 August 2012 10:56:36 UTC+2, Steven D'Aprano wrote:
>
> internal implementation, and strings which fit exactly in Latin-1 will
>
And this is the crucial point. latin-1 is an obsolete and unusable
coding scheme (esp. for European languages).
We fall on the point I mentioned ab
On Sunday, 19 August 2012 11:37:09 UTC+2, Peter Otten wrote:
You know, the technical aspect is one thing. Understanding
the coding of the characters as a whole is something
else. The important point is not the coding per se, the
relevant point is the set of characters a coding may
represent.
Y
On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
> On Sun, Aug 19, 2012 at 8:19 PM, wrote:
>
> > This is precisely the weak point of this flexible
>
> > representation. It uses latin-1 and latin-1 is for
>
> > most users simply unusable.
>
>
>
> No, it uses Unicode, and as
On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
> On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote:
>
> > On Sunday, 19 August 2012 12:26:44 UTC+2, Chris Angelico wrote:
>
> >> On Sun, Aug 19, 2012 at 8:19 PM, wrote:
>
> >>
>
> >>> This is precisely the weak point of this
On Sunday, 19 August 2012 15:46:34 UTC+2, Mark Lawrence wrote:
> On 19/08/2012 13:59, wxjmfa...@gmail.com wrote:
>
> > On Sunday, 19 August 2012 14:29:17 UTC+2, Dave Angel wrote:
>
> >> On 08/19/2012 08:14 AM, wxjmfa...@gmail.com wrote:
>
> >>
>
> >>> On Sunday, 19 August 2012 12:26:44
On Sunday, 19 August 2012 16:48:48 UTC+2, Mark Lawrence wrote:
> On 19/08/2012 15:09, wxjmfa...@gmail.com wrote:
>
>
>
> >
>
> > I can not give you more numbers than those I gave.
>
> > As a end user, I noticed and experimented my random tests
>
> > are always slower in Py3.3 than in Py3.2
On Sunday, 19 August 2012 19:03:34 UTC+2, Blind Anagram wrote:
> "Steven D'Aprano" wrote in message
>
> news:502f8a2a$0$29978$c3e8da3$54964...@news.astraweb.com...
>
>
>
> On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:
>
>
>
> [...]
Just for the story.
Five minutes after I closed my interactive interpreter windows,
the day I tested this stuff, I thought:
"Too bad I did not note the extremely bad cases I found, I'm pretty
sure this problem will come up again".
jmf
On Sunday, 19 August 2012 19:48:06 UTC+2, Paul Rubin wrote:
>
>
> But they are not ascii pages, they are (as stated) MOSTLY ascii.
>
> E.g. the characters are 99% ascii but 1% non-ascii, so 393 chooses
>
> a much more memory-expensive encoding than UTF-8.
>
>
Imagine a US banking applicat
By chance and luckily, first attempt.
IDLE, Windows 7.0 Pro 32, Pentium Dual Core 2.6, RAM 2 GB
Py 3.2.3
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[1.6939567134893707, 1.672874290786993, 1.6761219212298073]
Py 3.3.0b2
>>> timeit.repeat("('€'*100+'€'*100).replace('€', 'œ')")
[7.924
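For anyone wanting to repeat this kind of measurement on both interpreters, the following is a hedged sketch of a slightly more controlled run. The `number` and `repeat` values are arbitrary choices made so the script finishes quickly; they are not the settings behind the figures above.

```python
import timeit

# Same statement as in the session above.
stmt = "('€'*100+'€'*100).replace('€', 'œ')"

# number is reduced from timeit's default of 1,000,000 loops;
# repeat gives several samples so the minimum can be taken.
loops = 10000
timings = timeit.repeat(stmt, repeat=5, number=loops)

# The minimum is usually the most meaningful figure: higher values
# mostly reflect interference from other processes.
best = min(timings)
print('best of 5: {:.3f} usec per loop'.format(best / loops * 1e6))
```

Running the same file under each Python version and comparing the printed figure keeps the two measurements on equal terms.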
On Tuesday, 21 August 2012 09:52:09 UTC+2, Peter Otten wrote:
> wxjmfa...@gmail.com wrote:
>
>
>
> > By chance and luckily, first attempt.
>
>
>
> > c:\python32\python -m timeit "('€'*100+'€'*100).replace('€'
>
> > , 'œ')"
>
> > 100 loops, best of 3: 1.48 usec per loop
>
> > c:\python33
This is neither a complaint nor a question, just a comment.
In the previous discussion related to the flexible
string representation, Roy Smith added this comment:
http://groups.google.com/group/comp.lang.python/browse_thread/thread/2645504f459bab50/eda342573381ff42
Not only I agree with his sen
On Thursday, 23 August 2012 15:57:50 UTC+2, Neil Hodgson wrote:
> wxjmfa...@gmail.com:
>
>
>
> > Small illustration. Take an a4 page containing 50 lines of 80 ascii
>
> > characters, add a single 'EM DASH' or an 'BULLET' (code points> 0x2000),
>
> > and you will see all the optimization efforts
On Saturday, 25 August 2012 02:24:35 UTC+2, Antoine Pitrou wrote:
> Ramchandra Apte gmail.com> writes:
>
> >
>
> > The zen of python is simply a guideline
>
>
>
> What's more, the Zen guides the language's design, not its implementation.
>
> People who think CPython is a complicated implement
On Saturday, 25 August 2012 11:46:34 UTC+2, Frank Millman wrote:
> On 25/08/2012 10:58, Mark Lawrence wrote:
>
> > On 25/08/2012 08:27, wxjmfa...@gmail.com wrote:
>
> >>
>
> >> Unicode design: a flat table of code points, where all code
>
> >> points are "equals".
>
> >> As soon as one attempts
On Sunday, 26 August 2012 00:26:56 UTC+2, Ian wrote:
> On Sat, Aug 25, 2012 at 9:47 AM, wrote:
>
> > For those who do not know, the Go language has introduced
>
> > the rune type. As far as I know, nobody is complaining, I
>
> > have not even seen a discussion related to this subject.
>
>
On Sunday, 26 August 2012 22:45:09 UTC+2, Dan Sommers wrote:
> On 2012-08-26 at 20:13:21 +,
>
> Steven D'Aprano wrote:
>
>
>
> > I note that not all 32-bit ints are valid code points. I suppose I can
>
> > see sense in having rune be a 32-bit integer value limited to those
>
> > valid
On Monday, 27 August 2012 22:14:07 UTC+2, Ian wrote:
> On Mon, Aug 27, 2012 at 1:16 PM, wrote:
>
> > - Why int32 and not uint32? No idea, I tried to find an
>
> > answer without asking.
>
>
>
> UCS-4 is technically only a 31-bit encoding. The sign bit is not used,
>
> so the choice of int32
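As a side note to the int32 versus uint32 question, the highest Unicode code point is U+10FFFF, which fits in 21 bits, so either a signed or an unsigned 32-bit integer has room to spare. A minimal check in Python 3.3 or later:

```python
import sys

# Highest code point a Python str can hold (0x10ffff on Python 3.3+).
print(hex(sys.maxunicode))

# Number of bits needed to represent that value.
needed_bits = sys.maxunicode.bit_length()
print(needed_bits)

# A signed 32-bit integer tops out at 2**31 - 1, well above the limit.
fits_in_int32 = sys.maxunicode <= 2**31 - 1
print(fits_in_int32)

# Anything past the limit is rejected by chr().
try:
    chr(sys.maxunicode + 1)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```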
On Monday, 27 August 2012 22:37:03 UTC+2, (unknown) wrote:
> On Monday, 27 August 2012 22:14:07 UTC+2, Ian wrote:
>
> > On Mon, Aug 27, 2012 at 1:16 PM, wrote:
>
> >
>
> > > - Why int32 and not uint32? No idea, I tried to find an
>
> >
>
> > > answer without asking.
>
> >
>
> >
>
> >
>
On Wednesday, 29 August 2012 06:16:05 UTC+2, Ian wrote:
> On Tue, Aug 28, 2012 at 8:42 PM, rusi wrote:
>
> > In summary:
>
> > 1. The problem is not on jmf's computer
>
> > 2. It is not windows-only
>
> > 3. It is not directly related to latin-1 encodable or not
>
> >
>
> > The only question
On Wednesday, 29 August 2012 14:01:57 UTC+2, Dave Angel wrote:
> On 08/29/2012 07:40 AM, wxjmfa...@gmail.com wrote:
>
> >
>
>
>
> > Forget Python and all these benchmarks. The problem is on an other
>
> > level. Coding schemes, typography, usage of characters, ... For a
>
> > given coding sch
On Thursday, 30 August 2012 08:55:01 UTC+2, Steven D'Aprano wrote:
You are right.
But as soon as you artificially introduce a "latin-1"
bottleneck, all this machinery just becomes useless.
This flexible representation works absurdly.
It optimizes the characters you are not using (in one
sense)
On Thursday, 30 August 2012 17:01:50 UTC+2, Antoine Pitrou wrote:
>
>
> I honestly suggest you shut up until you have a clue.
>
Sorry Antoine,
I do not have the knowledge to dive into the Python code,
but I know what a character is.
The coding of characters is a domain per se,
independent from th
On Sunday, 2 September 2012 11:07:35 UTC+2, Ian wrote:
> On Sun, Sep 2, 2012 at 1:36 AM, wrote:
>
> > I still remember my thoughts when I read the PEP 393
>
> > discussion: "this is not logical", "they do not understand
>
> > typography", "atomic character ???", ...
>
>
>
> That would in
On Sunday, 2 September 2012 14:01:18 UTC+2, Serhiy Storchaka wrote:
> On 02.09.12 12:52, Peter Otten wrote:
>
> > Ian Kelly wrote:
>
> >
>
> >> Rewriting the example to use locale.strcoll instead:
>
> >
>
> > sorted(li, key=functools.cmp_to_key(locale.strcoll))
>
> >
>
> > There is a
On Thursday, 13 September 2012 23:25:27 UTC+2, Tim Chase wrote:
> I've got a bunch of text in Portuguese and to transmit them, need to
>
> have them in us-ascii (7-bit). I'd like to keep as much information
>
> as possible, just stripping accents, cedillas, tildes, etc. So
>
> "serviço móvil" b
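One common way to approach the request quoted above is to decompose each character and drop whatever cannot be expressed in ASCII. This is only a sketch of that general technique, not necessarily what was proposed in the thread, and the function name `to_ascii` is hypothetical.

```python
import unicodedata

def to_ascii(text):
    """Strip accents by decomposing characters and discarding non-ASCII parts."""
    # NFKD splits 'ç' into 'c' plus a combining cedilla, 'ó' into 'o'
    # plus a combining acute accent, and so on.
    decomposed = unicodedata.normalize('NFKD', text)
    # Encoding with errors='ignore' drops the combining marks.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii('serviço móvil'))
```

Characters with no decomposition, such as 'ø' or 'ß', are dropped entirely by this approach, so it suits Portuguese better than some other languages.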
On Friday, 14 September 2012 22:45:05 UTC+2, Terry Reedy wrote:
> On 9/14/2012 12:15 PM, wxjmfa...@gmail.com wrote:
>
>
>
> > PS Avoid Py3.3 :-)
>
>
>
> pps Start using 3.3 as soon as possible. It has Python's first fully
>
> portable non-buggy Unicode implementation. The second releas
On Monday, 17 September 2012 10:48:30 UTC+2, Laszlo Nagy wrote:
> Reportlab is on the wall of shame. http://python3wos.appspot.com/
>
>
>
> Is there other ways to create PDF files from python 3? There is pyPdf. I
>
> haven't tried it yet, but it seem that it is a low level library. It
>
> d
On Tuesday, 18 September 2012 11:04:19 UTC+2, Laszlo Nagy wrote:
> > A big yes and it is very easy. I assume you know how
>
> > to write a plain text file with Python :-).
>
> >
>
> > Use your Python to generate a .tex file and let it compile
>
> > with one of the pdf TeX engines.
>
> >
>
> >
On Tuesday, 18 September 2012 15:31:52 UTC+2, Laszlo Nagy wrote:
> > I understood, you have Python on a platform and starting
>
> > from this you wish to create pdf files.
>
> > Obviously, embedding "TeX" is practically a no solution,
>
> > although distributing a portable standalone TeX distribu
I wrote my first program on a PDP-8. I discovered Python
at release 1.5.?
Now, years later... I find Python more and more unusable.
As an example related to this topic, which summarizes
the situation a little. I just opened my interactive
interpreter and produced this:
>>> for i in range(len(
On Wednesday, 26 September 2012 01:34:01 UTC+2, 8 Dihedral wrote:
> On Wednesday, 26 September 2012 02:25:31 UTC+8, Grant Edwards wrote:
>
> > On 2012-09-25, Martin P. Hellwig wrote:
>
> >
>
> > > On Tuesday, 25 September 2012 09:14:27 UTC+1, Mark Lawrence wrote:
>
> >
>
> > >> Hi all,
>
> >
>
> > >>
>
On Wednesday, 26 September 2012 09:23:47 UTC+2, Steven D'Aprano wrote:
> On Tue, 25 Sep 2012 23:35:39 -0700, wxjmfauth wrote:
>
>
>
> > Py 3.3 succeeded to somehow kill unicode and it has been transformed
>
> > into an "American" product for "Ame
On Wednesday, 26 September 2012 10:35:04 UTC+2, Mark Lawrence wrote:
> On 26/09/2012 07:35, wxjmfa...@gmail.com wrote:
>
> >
>
> > Py 3.3 succeeded to somehow kill unicode and it has
>
> > been transformed into an "American" product for
>
> > "American" users.
>
> > jmf
>
> >
>
>
>
> Why
On Wednesday, 26 September 2012 10:13:58 UTC+2, Terry Reedy wrote:
> On 9/26/2012 2:35 AM, wxjmfa...@gmail.com wrote:
>
>
>
> > Py 3.3 succeeded to somehow kill unicode and it has
>
> > been transformed into an "American" product for
>
> > "American" users.
>
>
>
> Python 3.3 is the first
On Wednesday, 26 September 2012 11:55:16 UTC+2, Chris Angelico wrote:
> On Wed, Sep 26, 2012 at 7:31 PM, wrote:
>
> > you are correct. But the price you pay for this is extremely
>
> > high. Now, practically all characters are affected, especially
>
> > those *in* the Basic *** Multilingual**
I should add that I do not have the knowledge to dive
into the Python code. But I "see" what has been done.
As I have a very good understanding of all this
coding of characters stuff, I can just pick up
- in fact select characters or combinations
of characters - which I suspect to be problematic
and I s
On Wednesday, 26 September 2012 16:56:55 UTC+2, Chris Angelico wrote:
> On Thu, Sep 27, 2012 at 12:50 AM, wrote:
>
> > I just see the results and the facts. For an end
>
> > user, this is the only thing that counts.
>
>
>
> Then what counts is that Python 3.2 (like Javascript) exhibits
>
>
Sorry guys, I'm "only" able to see this
(with the Python versions an end user can
download):
>>> timeit.repeat("('你'*1).replace('你', 'a')")
[31.44532887821319, 31.409585124813844, 31.40705548932476]
>>> timeit.repeat("('你'*1).replace('你', 'a')")
[323.56687741054805, 323.1660997337247, 325
On Wednesday, 26 September 2012 17:54:04 UTC+2, Ian wrote:
> On Wed, Sep 26, 2012 at 1:23 AM, Steven D'Aprano
>
> wrote:
>
> > On Tue, 25 Sep 2012 23:35:39 -0700, wxjmfauth wrote:
>
> >
>
> >> Py 3.3 succeeded to somehow kill unicode and it ha
On Wednesday, 26 September 2012 18:52:44 UTC+2, Paul Rubin wrote:
> Chris Angelico writes:
>
> > When you compare against a wide build, semantics of 3.2 and 3.3 are
>
> > identical, and then - and ONLY then - can you sanely compare
>
> > performance. And 3.3 stacks up much better.
>
>
>
> I
On Tuesday, 25 September 2012 16:44:05 UTC+2, Jayden wrote:
> In learning Python, I found there are two types of classes? Which one are
> widely used in new Python code? Is the new-style much better than old-style?
> Thanks!!
Use Python 3 and classes.
---
The interesting point or my ques
This flexible string representation is wrong by design.
Expecting to divide "Unicode" in chunks and to gain something
is an illusion.
It has been created by a computer scientist who thinks "bytes"
when in that field one has to think "bytes" and usage of the
characters at the same time.
The latin-1
Using Python on Windows is a dream.
Python uses and needs the system, but the system does
not use Python.
Every Python version is installed in its own isolated
space, site-packages included and without any defined
environment variable. Every Python can be seen as a
different application.
Knowing
On Thursday, 11 October 2012 15:16:33 UTC+2, Ramchandra Apte wrote:
PS C:\> $cmd="import sys;"
PS C:\> $cmd+="print('\n'.join(sys.path))"
PS C:\> $cmd
import sys;print('\n'.join(sys.path))
PS C:\> c:\python32\python -c $cmd
C:\Windows\system32\python32.zip
c:\python32\DLLs
c:\python32\lib
c:\pytho
On Wednesday, 17 October 2012 17:00:46 UTC+2, Dave Angel wrote:
> On 10/17/2012 10:31 AM, nwaits wrote:
>
> > I'm very impressed with python's wordlist script for plain text. Is there
> > a script for finding words that do NOT have certain diacritic marks, like
> > acute or grave accents (utf-
On Wednesday, 17 October 2012 19:07:43 UTC+2, Ian wrote:
> On Wed, Oct 17, 2012 at 9:32 AM, wrote:
>
> import unicodedata
>
> def HasDiacritics(w):
>
> > ... w_decomposed = unicodedata.normalize('NFKD', w)
>
> > ... return 'no' if len(w) == len(w_decomposed) else 'yes'
>
>
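The quoted `HasDiacritics` compares string lengths before and after NFKD normalization, which also flags compatibility characters such as ligatures. As an alternative sketch (an assumption on my part, not something proposed in the thread), one can look for combining marks after canonical decomposition. The name `has_diacritics` is hypothetical.

```python
import unicodedata

def has_diacritics(word):
    """Return True if any character of word carries a combining mark."""
    # NFD performs canonical decomposition only, so ligatures and other
    # compatibility characters are left alone.
    decomposed = unicodedata.normalize('NFD', word)
    # combining() returns a non-zero class for combining marks.
    return any(unicodedata.combining(ch) for ch in decomposed)

for w in ['café', 'cafe', 'crème', '\ufb01n']:
    print(repr(w), has_diacritics(w))
```

Here '\ufb01n' starts with the 'fi' ligature: the length comparison under NFKD would report it as changed, while this version does not treat it as a diacritic.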