On Sat, 18 Aug 2012 19:34:50 +0100, MRAB wrote:
> "a" will be stored as 1 byte/codepoint.
>
> Adding "é", it will still be stored as 1 byte/codepoint.
Wrong. It will be 2 bytes, just like it already is in Python 3.2.
I don't know where people are getting this myth that PEP 393 uses Latin-1
int
On Sat, 18 Aug 2012 09:51:37 -0600, Ian Kelly wrote about PEP 393:
> The change does not just benefit ASCII users. It primarily benefits
> anybody using a wide unicode build with strings mostly containing only
> BMP characters.
Just to be clear:
If you have many strings which are *mostly* BMP,
On Sat, 18 Aug 2012 11:05:07 -0700, wxjmfauth wrote:
> As I understand (I think) the underlying mechanism, I can only say, it is
> not a surprise that it happens.
>
> Imagine an editor, I type an "a", internally the text is saved as ascii,
> then I type an "é", the text can only be saved in at lea
On Sat, 18 Aug 2012 11:30:19 -0700, wxjmfauth wrote:
>> > I'm aware of this (and all the blah blah blah you are explaining).
>> > This is always the same song. Memory.
>>
>> Exactly. The reason it is always the same song is because it is an
>> important song.
>>
> No offense here. But t
This is a long post. If you don't feel like reading an essay, skip to the
very bottom and read my last few paragraphs, starting with "To recap".
On Sat, 18 Aug 2012 11:26:21 -0700, Paul Rubin wrote:
> Steven D'Aprano writes:
>> (There is an extension to UCS-2, UTF-16, which encodes non-BMP
>>
On Tuesday, July 17, 2012 12:39:53 PM UTC-7, Mark Lawrence wrote:
> I would like to spend more time on this thread, but unfortunately the 44
> ton artic carrying "Java in a Nutshell Volume 1 Part 1 Chapter 1
> Paragraph 1 Sentence 1" has just arrived outside my abode and needs
> unloading :-)
Chris Angelico writes:
> Generally, I'm working with pure ASCII, but port those same algorithms
> to Python and you'll easily be able to read in a file in some known
> encoding and manipulate it as Unicode.
If it's pure ASCII, you can use the bytes or bytearray type.
> It's not so much 'random
On 7/23/2012 11:18 AM, Albert van der Horst wrote:
In article <5006b48a$0$29978$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
Even with a break, why bother continuing through the body of the function
when you already have the result? When your calculation is done, it's
done, just
On Sun, Aug 19, 2012 at 1:10 PM, Paul Rubin wrote:
> Chris Angelico writes:
>> I don't have a Python example of parsing a huge string, but I've done
>> it in other languages, and when I can depend on indexing being a cheap
>> operation, I'll happily do exactly that.
>
> I'd be interested to know
Chris Angelico writes:
> Sure, four characters isn't a big deal to step through. But it still
> makes indexing and slicing operations O(N) instead of O(1), plus you'd
> have to zark the whole string up to where you want to work.
I know some systems chop the strings into blocks of (say) a few
hund
On 8/18/2012 4:09 PM, Terry Reedy wrote:
print(timeit("c in a", "c = '…'; a = 'a'*1000+c"))
# .6 in 3.2.3, 1.2 in 3.3.0
This does not make sense to me and I will ask about it.
I did ask on the pydev list, and paraphrased responses include:
1. 'My system gives opposite ratios.'
2. 'With a default
On Sun, Aug 19, 2012 at 12:35 PM, Paul Rubin wrote:
> Chris Angelico writes:
> "asdfqwer"[4:]
>> 'qwer'
>>
>> That's a not uncommon operation when parsing strings or manipulating
>> data. You'd need to completely rework your algorithms to maintain a
>> position somewhere.
>
> Scanning 4 chara
Chris Angelico writes:
"asdfqwer"[4:]
> 'qwer'
>
> That's a not uncommon operation when parsing strings or manipulating
> data. You'd need to completely rework your algorithms to maintain a
> position somewhere.
Scanning 4 characters (or a few dozen, say) to peel off a token in
parsing a UTF
On Saturday, August 18, 2012 5:14:05 PM UTC-5, MRAB wrote:
> On 18/08/2012 21:29, Aaron Brady wrote:
> > On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> >> On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady wrote:
> >> > Is there a problem with hacking on the Beta?
On Sun, Aug 19, 2012 at 12:11 PM, Paul Rubin wrote:
> Chris Angelico writes:
>> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a
>> few thousand bytes, how do you locate the 273rd character?
>
> How often do you need to do that, as opposed to traversing the string by
> iteratio
Chris Angelico writes:
> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a
> few thousand bytes, how do you locate the 273rd character?
How often do you need to do that, as opposed to traversing the string by
iteration? Anyway, you could use a rope-like implementation, or an
i
My compliments to John and Chris and to any others who contributed to the
new xlsx capability. This is a most welcome development. Thank you.
Brent
--
http://mail.python.org/mailman/listinfo/python-list
On Sun, Aug 19, 2012 at 4:26 AM, Paul Rubin wrote:
> Can you explain the issue of "breaking surrogate pairs apart" a little
> more? Switching between encodings based on the string contents seems
> silly at first glance. Strings are immutable so I don't understand why
> not use UTF-8 or UTF-16 fo
On 18/08/2012 21:29, Aaron Brady wrote:
On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady wrote:
> Is there a problem with hacking on the Beta?
Nope. Hack on the beta, then when the release arrives, rebase your
work onto it. I d
On 18/08/2012 21:22, wxjmfa...@gmail.com wrote:
On Saturday, 18 August 2012 20:40:23 UTC+2, rusi wrote:
On Aug 18, 10:59 pm, Steven D'Aprano wrote:
On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
Is there any reason why non ascii users are somehow penalized compared
to ascii users?
On Sat, Aug 18, 2012 at 11:19 AM, kj wrote:
>
> Basically, I'm looking for a read-only variable (or variables)
> initialized by Python at the start of execution, and from which
> the initial working directory may be read or computed.
>
This will work for Linux and Mac OS X (and maybe Cygwin, but
On Friday, August 17, 2012 4:57:41 PM UTC-5, Chris Angelico wrote:
> On Sat, Aug 18, 2012 at 4:37 AM, Aaron Brady wrote:
> > Is there a problem with hacking on the Beta?
>
> Nope. Hack on the beta, then when the release arrives, rebase your
> work onto it. I doubt that anything of thi
On Saturday, 18 August 2012 20:40:23 UTC+2, rusi wrote:
> On Aug 18, 10:59 pm, Steven D'Aprano wrote:
> > On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
> > > Is there any reason why non ascii users are somehow penalized compared
> > > to ascii user
On Aug 18, 12:22 pm, Jussi Piitulainen wrote:
> Frank Koshti writes:
> > not always placed in HTML, and even in HTML, they may appear in
> > strange places, such as Hello. My specific issue
> > is I need to match, process and replace $foo(x=3), knowing that
> > (x=3) is optional, and the token mig
On 8/18/2012 12:38 PM, wxjmfa...@gmail.com wrote:
Sorry guys, I'm not stupid (I think). I can open IDLE with
Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
always slower. Period.
You have not tried enough tests ;-).
On my Win7-64 system:
from timeit import timeit
print(timeit("
On 18/08/2012 19:40, rusi wrote:
On Aug 18, 10:59 pm, Steven D'Aprano wrote:
On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
Is there any reason why non ascii users are somehow penalized compared
to ascii users?
Of course there is a reason.
If you want to represent 1114111 different ch
On 18/08/2012 19:30, wxjmfa...@gmail.com wrote:
On Saturday, 18 August 2012 19:59:18 UTC+2, Steven D'Aprano wrote:
On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
[...]
The problem with UCS-4 is that every character re
On 18/08/2012 19:26, Paul Rubin wrote:
Steven D'Aprano writes:
(There is an extension to UCS-2, UTF-16, which encodes non-BMP characters
using two code points. This is fragile and doesn't work very well,
because string-handling methods can break the surrogate pairs apart,
leaving you with inval
On Aug 18, 10:59 pm, Steven D'Aprano wrote:
> On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
> > Is there any reason why non ascii users are somehow penalized compared
> > to ascii users?
>
> Of course there is a reason.
>
> If you want to represent 1114111 different characters in a string,
On Saturday, 18 August 2012 19:59:18 UTC+2, Steven D'Aprano wrote:
> On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
> > On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
> >> [...]
> >> The problem with UCS-4 is that every character requires four bytes.
> >> [..
On 18/08/2012 19:05, wxjmfa...@gmail.com wrote:
On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
Proof that is acceptable to everybody please, not just yourself.
I can't, I'm only facing the fact it works slower on my
Windows platform.
As I understand (I think) the underlying
Steven D'Aprano writes:
> (There is an extension to UCS-2, UTF-16, which encodes non-BMP characters
> using two code points. This is fragile and doesn't work very well,
> because string-handling methods can break the surrogate pairs apart,
> leaving you with invalid unicode string. Not good.)
.
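For readers following the surrogate-pair point quoted above, here is a minimal sketch of what "breaking a pair apart" looks like. It assumes Python 3.3 or later; the sample character U+1F600 and the variable names are chosen purely for illustration and do not come from the thread.

```python
# A code point outside the BMP (illustrative choice, not from the thread).
ch = "\U0001F600"

# In UTF-16 this single code point is stored as two 16-bit code units.
units = ch.encode("utf-16-le")
code_units = [
    int.from_bytes(units[i:i + 2], "little")
    for i in range(0, len(units), 2)
]
# The first unit is a high surrogate (0xD800-0xDBFF),
# the second a low surrogate (0xDC00-0xDFFF).
high, low = code_units

# Cutting between the two code units leaves a lone high surrogate,
# which is not valid UTF-16 on its own.
broken = units[:2]
try:
    broken.decode("utf-16-le")
    lone_surrogate_rejected = False
except UnicodeDecodeError:
    lone_surrogate_rejected = True

print([hex(u) for u in code_units], lone_surrogate_rejected)
```

A string type that indexes by 16-bit code unit can produce such a half pair with an ordinary slice, which is the fragility the quoted paragraph describes.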
On Saturday, 18 August 2012 19:28:26 UTC+2, Mark Lawrence wrote:
> Proof that is acceptable to everybody please, not just yourself.
>
I can't, I'm only facing the fact it works slower on my
Windows platform.
As I understand (I think) the underlying mechanism, I
can only say, it is not a surpr
Thank you very much,
I have found a DLL which is designed exactly for us and I use it through
ctypes.
Vojta
On 18.8.2012 15:44, Ramchandra Apte wrote:
> A simple workaround is to use:
> speak = subprocess.Popen("espeak",stdin = subprocess.PIPE)
> speak.stdin.write("Hello world!")
> time.sleep(1)
>
On Sat, 18 Aug 2012 08:07:05 -0700, wxjmfauth wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are explaining). This is
> always the
Dmitry Arsentiev, 15.08.2012 14:49:
> Has anybody already met a problem like this?
> AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
>
> When I run scrapy, I get
>
> File "/usr/local/lib/python2.7/site-packages/scrapy/selector/factories.py",
> line 14, in
> libxm
On 8/18/2012 2:18 AM, zmagi...@gmail.com wrote:
Open using File>Open on the Shell
The important question, as I said in my previous post, is *exactly* what
you do in the OpenFile dialog. Some things work, others do not.
And we (Python) have no control.
--
Terry Jan Reedy
On Aug 18, 8:34 pm, Grant Edwards wrote:
> On 2012-08-17, rusi wrote:
>
> > I was in a corporate environment for a while. And carried my
> > 'trim&interleave' habits there. And got gently scolded for seeming to
> > hide things!!
>
> I have, rarely, gotten the opposite reaction from "corporate e-m
On 18/08/2012 17:38, wxjmfa...@gmail.com wrote:
Sorry guys, I'm not stupid (I think). I can open IDLE with
Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
always slower. Period.
Proof that is acceptable to everybody please, not just yourself.
Now, the reason. I think it is due
On Sun, Aug 19, 2012 at 2:38 AM, wrote:
> Sorry guys, I'm not stupid (I think). I can open IDLE with
> Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
> always slower. Period.
Ah, but what about all those other operations that use strings under
the covers? As mentioned, namespace l
Sorry guys, I'm not stupid (I think). I can open IDLE with
Py 3.2 or Py 3.3 and compare string manipulations. Py 3.3 is
always slower. Period.
Now, the reason. I think it is due to the "flexible representation".
Deeper reason. The "boss" does not wish to hear of a (pure)
ucs-4/utf-32 "engine" (this h
Steven,
Well done!!!
Regards,
Malcolm
Frank Koshti writes:
> not always placed in HTML, and even in HTML, they may appear in
> strange places, such as Hello. My specific issue
> is I need to match, process and replace $foo(x=3), knowing that
> (x=3) is optional, and the token might appear simply as $foo.
>
> To do this, I decided to
On Aug 18, 11:48 am, Peter Otten <__pete...@web.de> wrote:
> Frank Koshti wrote:
> > I need to match, process and replace $foo(x=3), knowing that (x=3) is
> > optional, and the token might appear simply as $foo.
>
> > To do this, I decided to use:
>
> > re.compile('\$\w*\(?.*?\)').findall(mystring)
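The quoted pattern requires a closing parenthesis, so it cannot match a bare $foo. One possible alternative, offered here only as a sketch and not necessarily what anyone in the thread proposed, makes the whole parenthesised part an optional group. The names TOKEN, text and expand are hypothetical.

```python
import re

# "$" followed by a name, then an optional "(...)" argument list.
# Group 1 is the name; group 2 is the text inside the parentheses,
# or None when the token appears without them.
TOKEN = re.compile(r"\$(\w+)(?:\(([^)]*)\))?")

text = "Hello $foo(x=3) and $bar, bye"

# Collect (name, arguments) pairs for every token found.
matches = [(m.group(1), m.group(2)) for m in TOKEN.finditer(text)]


def expand(match):
    """Illustrative replacement: wrap the token name in angle brackets."""
    return "<%s>" % match.group(1)


replaced = TOKEN.sub(expand, text)
print(matches)
print(replaced)
```

Using [^)]* instead of .*? keeps the argument match from running past the first closing parenthesis, at the cost of not supporting nested parentheses.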
2012/8/18 Frank Koshti :
> Hey Steven,
>
> Thank you for the detailed (and well-written) tutorial on this very
> issue. I actually learned a few things! Though, I still have
> unresolved questions.
>
> The reason I don't want to use an XML parser is because the tokens are
> not always placed in HTM
On Sat, Aug 18, 2012 at 9:07 AM, wrote:
> On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
>> [...]
>> The problem with UCS-4 is that every character requires four bytes.
>> [...]
>
> I'm aware of this (and all the blah blah blah you are
> explaining). This is always the same song. M
Frank Koshti wrote:
> I need to match, process and replace $foo(x=3), knowing that (x=3) is
> optional, and the token might appear simply as $foo.
>
> To do this, I decided to use:
>
> re.compile('\$\w*\(?.*?\)').findall(mystring)
>
> the issue with this is it doesn't match $foo by itself, and
On Sun, Aug 19, 2012 at 1:07 AM, wrote:
> I'm aware of this (and all the blah blah blah you are
> explaining). This is always the same song. Memory.
>
> Let me ask. Is Python an "american" product for us-users
> or is it a tool for everybody [*]?
> Is there any reason why non ascii users are somehow
On 2012-08-17, rusi wrote:
> I was in a corporate environment for a while. And carried my
> 'trim&interleave' habits there. And got gently scolded for seeming to
> hide things!!
I have, rarely, gotten the opposite reaction from "corporate e-mailers"
used to top posting. I got one comment someth
On 18/08/2012 16:07, wxjmfa...@gmail.com wrote:
On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
[...]
The problem with UCS-4 is that every character requires four bytes.
[...]
I'm aware of this (and all the blah blah blah you are
explaining). This is always the same song. Memory.
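Since the memory argument keeps coming up, readers can measure the per-character cost on their own interpreter. The following is a minimal sketch that assumes CPython 3.3 or later (the flexible string representation of PEP 393); the helper name bytes_per_char and the sample characters are hypothetical and not taken from the thread.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character for strings made only of `ch`.

    Two fresh strings of length n and 2*n are compared, so the fixed
    object header cancels out of the difference.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# One sample per storage width; the choices are illustrative.
samples = {
    "ASCII 'a'": "a",
    "Latin-1 e-acute": "\u00e9",
    "BMP euro sign": "\u20ac",
    "non-BMP U+1F600": "\U0001F600",
}

for label, ch in samples.items():
    print(label, bytes_per_char(ch))
```

On such a build the figures should come out as 1, 1, 2 and 4 bytes respectively, because the width is chosen from the largest code point in each string. Other implementations, and CPython 3.2 and earlier, will report different numbers.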
What's the most reliable way for "module code" to determine the
absolute path of the working directory at the start of execution?
(By "module code" I mean code that lives in a file that is not
meant to be run as a script, but rather it is meant to be loaded
as the result of some import statement
(Resending this to the list because I previously sent it only to
Steven by mistake. Also showing off a case where top-posting is
reasonable, since this bit requires no context. :-)
On Sat, Aug 18, 2012 at 1:41 AM, Ian Kelly wrote:
>
> On Aug 17, 2012 10:17 PM, "Steven D'Aprano" wrote:
>>
>> U
On Saturday, 18 August 2012 14:27:23 UTC+2, Steven D'Aprano wrote:
> [...]
> The problem with UCS-4 is that every character requires four bytes.
> [...]
I'm aware of this (and all the blah blah blah you are
explaining). This is always the same song. Memory.
Let me ask. Is Python an "american" product
Hey Steven,
Thank you for the detailed (and well-written) tutorial on this very
issue. I actually learned a few things! Though, I still have
unresolved questions.
The reason I don't want to use an XML parser is because the tokens are
not always placed in HTML, and even in HTML, they may appear in
On Fri, 17 Aug 2012 21:41:07 -0700, Frank Koshti wrote:
> Hi,
>
> I'm new to regular expressions. I want to be able to match for tokens
> with all their properties in the following examples. I would appreciate
> some direction on how to proceed.
Others have already given you excellent advice to
I think the point was missed. I don't want to use an XML parser. The
point is to pick up those tokens, and yes I've done my share of RTFM.
This is what I've come up with:
'\$\w*\(?.*?\)'
Which doesn't work well on the above example, which is partly why I
reached out to the group. Can anyone help
On 17 August 2012 18:23, Hans Mulder wrote:
> On 16/08/12 23:34:25, Walter Hurry wrote:
> > On Thu, 16 Aug 2012 17:20:29 -0400, Terry Reedy wrote:
> >
> >> On 8/16/2012 11:40 AM, Ramchandra Apte wrote:
> >>
> >>> Look you are the only person complaining about top-posting.
> >>
> >> No he is not.
Not really. Try modifying ast.literal_eval. This will be quite secure.
On 17 August 2012 19:36, Chris Angelico wrote:
> On Fri, Aug 17, 2012 at 11:28 PM, Eric Frederich wrote:
> > Within the debugging console, after importing all of the bindings, there
> > would be no reason to import anythin
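A simple workaround is to use:
import subprocess
import time
speak = subprocess.Popen("espeak", stdin=subprocess.PIPE)
speak.stdin.write(b"Hello world!\n")  # bytes, so it also works on Python 3
speak.stdin.flush()  # make sure espeak receives the text
time.sleep(1)
speak.terminate()  # end the speaking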
On 17 August 2012 21:49, Vojtěch Polášek wrote:
> Hi,
> I am developing an audiogame for visually impaired users and I want
Please don't use all caps.
On 17 August 2012 18:16, coldfire wrote:
> I would like to know where a Python script can be stored online so that
> it keeps running and can be called at any time over the
> internet.
> I have used the mechanize module, which creates a web browser instance to
I am aware of this. I'm just too lazy to use Google Groups! "Come on
Ramchandra, you can switch to Google Groups."
On 17 August 2012 13:09, rusi wrote:
> On Aug 17, 3:36 am, Chris Angelico wrote:
> > On Fri, Aug 17, 2012 at 1:40 AM, Ramchandra Apte wrote:
> > > On 16 August 2012 21:00, Mark L
In article
<385e732e-1c02-4dd0-ab12-b92890bbe...@o3g2000yqp.googlegroups.com>,
Frank Koshti wrote:
> I'm new to regular expressions. I want to be able to match for tokens
> with all their properties in the following examples. I would
> appreciate some direction on how to proceed.
>
>
> @foo1
On Sat, 18 Aug 2012 01:09:26 -0700, wxjmfauth wrote:
> >>> sys.version
> '3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
> >>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
> 37.32762490493721
> >>> timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
> 0.8158757139801764
>
sys
On 18/08/2012 06:42, Chris Angelico wrote:
On Sat, Aug 18, 2012 at 2:41 PM, Frank Koshti wrote:
Hi,
I'm new to regular expressions. I want to be able to match for tokens
with all their properties in the following examples. I would
appreciate some direction on how to proceed.
@foo1
@foo2()
@f
On 18/08/2012 02:44, Steven D'Aprano wrote:
Makes you think that Google is interested in fixing the bugs in their
crappy web apps? They have become as arrogant and as obnoxious as
Microsoft used to be.
Charging off topic again, but I borrowed a book from the local library a
couple of months
>>> sys.version
'3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]'
>>> timeit.timeit("('ab…' * 1000).replace('…', '……')")
37.32762490493721
>>> timeit.timeit("('ab…' * 10).replace('…', 'œ…')")
0.8158757139801764
>>> sys.version
'3.3.0b2 (v3.3.0b2:4972a8f1b2aa, Aug 12 2012, 15:02:36)