>On 03/06/2011 03:58, Chris Torek wrote:
>>> -
>> This is a bit surprising, since both "s1 in s2" and re.search()
>> could use a Boyer-Moore-based algorithm for a sufficiently-long
>> fixed string, and the time required should be proportional to that
On Jun 8, 7:38 pm, "ru...@yahoo.com" wrote:
> On 06/07/2011 06:30 PM, Roy Smith wrote:
>
>
>
> > On 06/06/2011 08:33 AM, rusi wrote:
> >>> Evidently for syntactic, implementation and cultural reasons, Perl
> >>> programmers are likely to get (and then overuse) regexes faster than
> >>> python prog
On 06/07/2011 06:30 PM, Roy Smith wrote:
> On 06/06/2011 08:33 AM, rusi wrote:
>>> Evidently for syntactic, implementation and cultural reasons, Perl
>>> programmers are likely to get (and then overuse) regexes faster than
>>> python programmers.
>
> "ru...@yahoo.com" wrote:
>> I don't see how the
On 06/08/2011 03:01 AM, Duncan Booth wrote:
> "ru...@yahoo.com" wrote:
>> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>>> Yes, but you have to pay the cost of loading the re engine, even if
>>> it is a one off cost, it's still a cost,
[...]
> At least part of the reason that there's no differen
"ru...@yahoo.com" wrote:
> On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
>> Yes, but you have to pay the cost of loading the re engine, even if
>> it is a one off cost, it's still a cost,
>
> ~$ time python -c 'pass'
> real 0m0.015s
> user 0m0.011s
> sys 0m0.003s
>
> ~$ time python
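For comparison, the same one-off cost can be measured from inside Python; a rough sketch (numbers vary by machine, and process startup dominates both measurements):

```python
import subprocess
import sys
import time

# Compare bare interpreter startup with startup plus "import re";
# the difference approximates the one-off cost of loading the re engine.
timings = {}
for cmd in ('pass', 'import re'):
    t0 = time.perf_counter()
    subprocess.run([sys.executable, '-c', cmd], check=True)
    timings[cmd] = time.perf_counter() - t0
for cmd, elapsed in timings.items():
    print('%-10s %.3fs' % (cmd, elapsed))
```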
On Jun 7, 11:37 pm, "ru...@yahoo.com" wrote:
> On 06/06/2011 08:33 AM, rusi wrote:
>
> > For any significant language feature (take recursion for example)
> > there are these issues:
>
> > 1. Ease of reading/skimming (other's) code
> > 2. Ease of writing/designing one's own
> > 3. Learning curve
>
On 06/06/2011 08:33 AM, rusi wrote:
>> Evidently for syntactic, implementation and cultural reasons, Perl
>> programmers are likely to get (and then overuse) regexes faster than
>> python programmers.
"ru...@yahoo.com" wrote:
> I don't see how the different Perl and Python cultures themselves
>
On 06/06/2011 08:33 AM, rusi wrote:
> For any significant language feature (take recursion for example)
> there are these issues:
>
> 1. Ease of reading/skimming (other's) code
> 2. Ease of writing/designing one's own
> 3. Learning curve
> 4. Costs/payoffs (eg efficiency, succinctness) of use
> 5.
On 06/06/2011 09:29 AM, Steven D'Aprano wrote:
> On Sun, 05 Jun 2011 23:03:39 -0700, ru...@yahoo.com wrote:
[...]
> I would argue that the first, non-regex solution is superior, as it
> clearly distinguishes the multiple steps of the solution:
>
> * filter lines that start with "CUSTOMER"
> * extra
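As a sketch, the two steps listed above (the second bullet is cut off in the archive; splitting into fields is my guess at the "extract" step):

```python
lines = [
    "CUSTOMER J Smith 12345 rest of record",
    "ORDER 987 widget",
    "CUSTOMER A Jones 67890 another record",
]

# Step 1: filter lines that start with "CUSTOMER"
customers = [line for line in lines if line.startswith("CUSTOMER ")]

# Step 2 (assumed): extract the fields from each matching line
records = [line.split(None, 4) for line in customers]
print(records)
```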
On 03/06/2011 03:58, Chris Torek wrote:
-
This is a bit surprising, since both "s1 in s2" and re.search()
could use a Boyer-Moore-based algorithm for a sufficiently-long
fixed string, and the time required should be proportional to that
needed to
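The point can be illustrated with the simplified Horspool variant of Boyer-Moore (a sketch only: CPython's actual substring search uses its own "fastsearch" code, not this). Using a dict for the shift table also sidesteps the 65536-entry concern raised elsewhere in the thread, since only characters actually present in the needle get entries:

```python
def horspool_find(haystack, needle):
    """Return the index of needle in haystack, or -1 (Boyer-Moore-Horspool)."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # Bad-character table: for each char in the needle (except the last),
    # how far it sits from the needle's end.
    shift = {ch: m - i - 1 for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Shift by the table entry for the char under the window's last
        # position, or by the full needle length if it never occurs.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```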
On 2011-06-06, Ian Kelly wrote:
> Fair enough, although if you ask me the + 1 is just as magical
> as the + 7 (it's still the length of the string that you're
> searching for). Also, re-finding the opening ' still repeats
> information.
Heh, true. It doesn't really repeat information, though, as i
On Mon, Jun 6, 2011 at 11:48 AM, Ethan Furman wrote:
> I like the readability of this version, but isn't generating an exception on
> every other line going to kill performance?
I timed it on the example data before I posted and found that it was
still 10 times as fast as the regex version. I di
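The non-regex loop under discussion is not shown in full in the archive; a sketch of the exception-per-line shape (the TABLE='...' handling is my assumption, based on the OP's data):

```python
lines = [
    "// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ      '",
    "//ACCT EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,",
]

fixed_lines = []
for line in lines:
    try:
        # Raises ValueError on lines without TABLE='...': the
        # "exception on every other line" being discussed.
        start = line.index("TABLE='") + len("TABLE='")
        end = line.index("'", start)
        line = line[:start] + line[start:end].rstrip() + line[end:]
    except ValueError:
        pass  # no TABLE='...' on this line; leave it alone
    fixed_lines.append(line)
print(fixed_lines)
```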
On Mon, Jun 6, 2011 at 11:17 AM, Neil Cerutti wrote:
> I wrestled with using addition like that, and decided against it.
> The 7 is a magic number and repeats/hides information. I wanted
> something like:
>
> prefix = "TABLE='"
> start = line.index(prefix) + len(prefix)
>
> But decided I searc
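Completed, the no-magic-number idea reads like this (the sample line is taken from the OP's data; the surrounding loop is omitted):

```python
line = "// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ      '"

prefix = "TABLE='"
# len(prefix) replaces the magic "+ 7": change the prefix and the
# offset stays correct automatically.
start = line.index(prefix) + len(prefix)
end = line.index("'", start)
fixed = line[:start] + line[start:end].rstrip() + line[end:]
print(fixed)
```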
Ian Kelly wrote:
On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti wrote:
import re

print("re solution")
with open("data.txt") as f:
    for line in f:
        fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
        print(fixed, end='')

print("non-re solution")
with open("data.txt") as f:
    for l
On 2011-06-06, Ian Kelly wrote:
> On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti wrote:
>> import re
>>
>> print("re solution")
>> with open("data.txt") as f:
>>     for line in f:
>>         fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
>>         print(fixed, end='')
>>
>> print("non-re solutio
On Mon, Jun 6, 2011 at 10:08 AM, Neil Cerutti wrote:
> import re
>
> print("re solution")
> with open("data.txt") as f:
>     for line in f:
>         fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
>         print(fixed, end='')
>
> print("non-re solution")
> with open("data.txt") as f:
>     for l
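For reference, the effect of that re.sub on one of the OP's lines (the trailing blanks inside the quotes are the point of the exercise):

```python
import re

line = "// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ      '"
# \1 keeps TABLE=' plus the non-space value; the run of blanks before
# the closing quote is dropped.  A line with no trailing blanks
# (e.g. TABLE='ACCT') fails the \s+ and passes through unchanged.
fixed = re.sub(r"(TABLE='\S+)\s+'", r"\1'", line)
print(fixed)
```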
On 2011-06-06, ru...@yahoo.com wrote:
> On 06/03/2011 02:49 PM, Neil Cerutti wrote:
> Can you find an example or invent one? I simply don't remember
> such problems coming up, but I admit it's possible.
>
> Sure, the response to the OP of this thread.
Here's a recap, along with two candidate solu
On Mon, Jun 6, 2011 at 9:29 AM, Steven D'Aprano
wrote:
> [...]
>> I would expect
>> any regex processor to compile the regex into an FSM.
>
> Flying Spaghetti Monster?
>
> I have been Touched by His Noodly Appendage!!!
Finite State Machine.
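Spelling the acronym out: the claim is that a pattern such as ab*c can be compiled into a state machine and matched in one left-to-right pass. A hand-built, table-driven sketch (illustrative only; CPython's re module actually uses a backtracking matcher rather than a pure FSM):

```python
# Transition table for the regex ab*c:
# state 0 --a--> state 1 --b--> state 1 --c--> state 2 (accepting)
DELTA = {(0, 'a'): 1, (1, 'b'): 1, (1, 'c'): 2}

def accepts(s):
    state = 0
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:   # no transition defined: reject
            return False
    return state == 2        # accept only if we end in the final state
```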
On Sun, 05 Jun 2011 23:03:39 -0700, ru...@yahoo.com wrote:
> Thus what starts as
> if line.startswith ('CUSTOMER '):
>     try:
>         kw, first_initial, last_name, code, rest = line.split(None, 4)
>         ...
> often turns into (sometimes before it is written) something like
> m = re.match
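Side by side, the two styles being contrasted (the record layout here is an invented example; the thread does not show the full regex version):

```python
import re

line = "CUSTOMER J Smith 12345 balance=0.00 status=active"

# String-method version: fixed prefix test, then a bounded split.
if line.startswith('CUSTOMER '):
    kw, first_initial, last_name, code, rest = line.split(None, 4)

# The regex rewrite it "often turns into":
m = re.match(r'CUSTOMER (\w) (\w+) (\d+) (.*)', line)
if m:
    first_initial, last_name, code, rest = m.groups()
```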
For any significant language feature (take recursion for example)
there are these issues:
1. Ease of reading/skimming (other's) code
2. Ease of writing/designing one's own
3. Learning curve
4. Costs/payoffs (eg efficiency, succinctness) of use
5. Debug-ability
I'll start with 3.
When someone of K
On Mon, Jun 6, 2011 at 6:51 PM, Octavian Rasnita wrote:
> It is not so hard to decide whether using RE is a good thing or not.
>
> When the speed is important and every millisecond counts, RE should be used
> only when there is no other faster way, because usually RE is slower
> than using ot
From: "Chris Torek"
Newsgroups: comp.lang.python
To:
Sent: Monday, June 06, 2011 10:11 AM
Subject: Re: how to avoid leading white spaces
In article
ru...@yahoo.com wrote (in part):
[mass snippage]
What I mean is that I see regexes as being an extremely small,
highly restricted, domain speci
In article
ru...@yahoo.com wrote (in part):
[mass snippage]
>What I mean is that I see regexes as being an extremely small,
>highly restricted, domain specific language targeted specifically
>at describing text patterns. Thus they do that job better than
>than trying to describe patterns implici
On 06/03/2011 08:05 PM, Steven D'Aprano wrote:
> On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
>
I often find myself changing, for example, a startswith() to a RE when
I realize that the input can contain mixed case
>>>
>>> Why wouldn't you just normalise the case?
>>
>> Becau
On 06/03/2011 03:45 PM, Chris Torek wrote:
>>On 2011-06-03, ru...@yahoo.com wrote:
> [prefers]
>>> re.split ('[ ,]', source)
>
> This is probably not what you want in dealing with
> human-created text:
>
> >>> re.split('[ ,]', 'foo  bar, spam,maps')
> ['foo', '', 'bar', '', 'spam', 'map
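The empty strings appear because each single space or comma is its own split point; letting the character class repeat merges delimiter runs (note the two spaces after 'foo' in the sample):

```python
import re

source = 'foo  bar, spam,maps'   # two spaces after "foo"

# One split point per delimiter character: adjacent delimiters
# produce empty strings.
print(re.split('[ ,]', source))   # ['foo', '', 'bar', '', 'spam', 'maps']
# "+" treats each run of delimiters as a single split point.
print(re.split('[ ,]+', source))  # ['foo', 'bar', 'spam', 'maps']
```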
On 06/03/2011 02:49 PM, Neil Cerutti wrote:
> On 2011-06-03, ru...@yahoo.com wrote:
>>>> or that I have to treat commas as well as spaces as
>>>> delimiters.
>>>
>>> source.replace(",", " ").split(" ")
>>
>> Uhgg. create a whole new string just so you can split it on one
On Jun 3, 7:25 pm, Steven D'Aprano wrote:
> Regarding their syntax, I'd like to point out that even Larry Wall is
> dissatisfied with regex culture in the Perl community:
>
> http://www.perl.com/pub/2002/06/04/apo5.html
This is a very good link.
And it can be a starting point for python to leapf
On Sat, 04 Jun 2011 21:02:32 +0100, Nobody wrote:
> On Sat, 04 Jun 2011 05:14:56 +, Steven D'Aprano wrote:
>
>> This fails to support non-ASCII letters, and you know quite well that
>> having to spell out by hand regexes in both upper and lower (or mixed)
>> case is not support for case-insen
On Sat, 04 Jun 2011 09:39:24 -0400, Roy Smith wrote:
> To be sure, if you explore the edges of the regex syntax space, you can
> write non-portable expressions. You don't even have to get very far out
> to the edge. But, as you say, if you limit yourself to a subset, you
> can write portable one
On Sat, 04 Jun 2011 05:14:56 +, Steven D'Aprano wrote:
> This fails to support non-ASCII letters, and you know quite well that
> having to spell out by hand regexes in both upper and lower (or mixed)
> case is not support for case-insensitive matching. That's why Python's re
> has a case in
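The case-insensitive flag being referred to, next to the string-method alternative (str.casefold() arrived later, in Python 3.3, after this thread):

```python
import re

# One flag instead of hand-spelling [Cc][Uu][Ss]... variants:
assert re.match(r'customer', 'CUSTOMER J Smith', re.IGNORECASE)

# On the string-method side, casefold() handles non-ASCII case
# mappings that lower() does not, e.g. the German sharp s:
assert 'Straße'.casefold() == 'STRASSE'.casefold()
assert 'Straße'.lower() != 'STRASSE'.lower()
```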
On Sat, 04 Jun 2011 13:41:33 +1200, Gregory Ewing wrote:
>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries
>
> But is there any need for the Boyer-Moore algorithm to
> operate on characters?
>
> Seem
The efficiency argument is specious. [This is a python list not a C
or assembly list]
The real issue is that complex regexes are hard to get right -- even
if one is experienced.
This is analogous to the fact that knotty programs can be hard to get
right even for experienced programmers.
The anal
I wrote:
>> Another nice thing about regexes (as compared to string methods) is
>> that they're both portable and serializable. You can use the same
>> regex in Perl, Python, Ruby, PHP, etc.
In article <4de9bf50$0$29996$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
> Regexes a
On Sat, Jun 4, 2011 at 12:30 PM, Roy Smith wrote:
> Another nice thing about regexes (as compared to string methods) is that
> they're both portable and serializable. You can use the same regex in
> Perl, Python, Ruby, PHP, etc. You can transmit them over a network
> connection to a cooperating
On Fri, 03 Jun 2011 22:30:59 -0400, Roy Smith wrote:
> In article <4de992d7$0$29996$c3e8da3$54964...@news.astraweb.com>,
> Steven D'Aprano wrote:
>
>> Of course, if you include both case-sensitive and insensitive tests in
>> the same calculation, that's a good candidate for a regex... or at
>>
On Sat, 04 Jun 2011 03:24:50 +0100, MRAB wrote:
> [snip]
> Some regex implementations support scoped case sensitivity. :-)
Yes, you should link to your regex library :)
Have you considered the suggested Perl 6 syntax? Much of it looks good to
me.
> I have at times thought that it would be use
In article <4de992d7$0$29996$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
> Of course, if you include both case-sensitive and insensitive tests in
> the same calculation, that's a good candidate for a regex... or at least
> it would be if regexes supported that :)
Of course the
On 04/06/2011 03:05, Steven D'Aprano wrote:
On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
I often find myself changing, for example, a startswith() to a RE when
I realize that the input can contain mixed case
Why wouldn't you just normalise the case?
Because some of the text may
On Fri, 03 Jun 2011 12:29:52 -0700, ru...@yahoo.com wrote:
>>> I often find myself changing, for example, a startswith() to a RE when
>>> I realize that the input can contain mixed case
>>
>> Why wouldn't you just normalise the case?
>
> Because some of the text may be case-sensitive.
Perhaps you
Chris Torek wrote:
Python might be penalized by its use of Unicode here, since a
Boyer-Moore table for a full 16-bit Unicode string would need
65536 entries
But is there any need for the Boyer-Moore algorithm to
operate on characters?
Seems to me you could just as well chop the UTF-16 up
into
On 03/06/2011 23:11, Ethan Furman wrote:
Chris Torek wrote:
On 2011-06-03, ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
>>> re.split('[ ,]', 'foo  bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']
I
Chris Torek wrote:
On 2011-06-03, ru...@yahoo.com wrote:
[prefers]
re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
>>> re.split('[ ,]', 'foo  bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']
I think you've got a typo in th
>On 2011-06-03, ru...@yahoo.com wrote:
[prefers]
>> re.split ('[ ,]', source)
This is probably not what you want in dealing with
human-created text:
>>> re.split('[ ,]', 'foo  bar, spam,maps')
['foo', '', 'bar', '', 'spam', 'maps']
Instead, you probably want "a comma followed by zero
On 2011-06-03, ru...@yahoo.com wrote:
>>> or that I have to treat commas as well as spaces as
>>> delimiters.
>>
>> source.replace(",", " ").split(" ")
>
> Uhgg. create a whole new string just so you can split it on one
> rather than two characters? Sorry, but I find
>
> re.split ('[ ,]', sou
On 06/03/2011 08:25 AM, Steven D'Aprano wrote:
> On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:
>
>> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>
>>> Python's str methods, when they're sufficient, are usually more
>>> efficient.
>>
>> Unfortunately, except for the very simplest case
On 06/03/2011 07:17 AM, Neil Cerutti wrote:
> On 2011-06-03, ru...@yahoo.com wrote:
>> The other tradeoff, applying both to Perl and Python is with
>> maintenance. As mentioned above, even when today's
>> requirements can be solved with some code involving several
>> string functions, indexes, an
On 03 Jun 2011 14:25:53 GMT
Steven D'Aprano wrote:
> source.replace(",", " ").split(" ")
I would do;
source.replace(",", " ").split()
> [steve@sylar ~]$ python -m timeit -s "source = 'a b c,d,e,f,g h i j k'"
What if the string is 'a b c, d, e,f,g h i j k'?
>>> source.replace(",", " ").spli
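Concretely, on exactly that string:

```python
source = 'a b c, d, e,f,g h i j k'

# split(" ") keeps an empty string wherever ", " became two spaces.
with_empties = source.replace(",", " ").split(" ")
# split() with no argument collapses whitespace runs instead.
clean = source.replace(",", " ").split()
print(with_empties)
print(clean)
```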
On Fri, 03 Jun 2011 05:51:18 -0700, ru...@yahoo.com wrote:
> On 06/02/2011 07:21 AM, Neil Cerutti wrote:
>> Python's str methods, when they're sufficient, are usually more
>> efficient.
>
> Unfortunately, except for the very simplest cases, they are often not
> sufficient.
Maybe so, but the
On Fri, 03 Jun 2011 02:58:24 +, Chris Torek wrote:
> Python might be penalized by its use of Unicode here, since a
> Boyer-Moore table for a full 16-bit Unicode string would need
> 65536 entries (one per possible ord() value). However, if the
> string being sought is all single-byte values, a
On 2011-06-03, ru...@yahoo.com wrote:
> The other tradeoff, applying both to Perl and Python is with
> maintenance. As mentioned above, even when today's
> requirements can be solved with some code involving several
> string functions, indexes, and conditionals, when those
> requirements change,
On Fri, 03 Jun 2011 04:30:46 +, Chris Torek wrote:
>>I'm not sure what you mean by "full 16-bit Unicode string"? Isn't
>>unicode inherently 32 bit?
>
> Well, not exactly. As I understand it, Python is normally built
> with a 16-bit "unicode character" type though
It's normally 32-bit on p
On 06/02/2011 07:21 AM, Neil Cerutti wrote:
> On 2011-06-01, ru...@yahoo.com wrote:
>> For some odd reason (perhaps because they are used a lot in
>> Perl), this group seems to have a great aversion to regular
>> expressions. Too bad because this is a typical problem where
>> their
* Roy Smith (Thu, 02 Jun 2011 21:57:16 -0400)
> In article <94ph22frh...@mid.individual.net>,
> Neil Cerutti wrote:
> > On 2011-06-01, ru...@yahoo.com wrote:
> > > For some odd reason (perhaps because they are used a lot in
> > > Perl), this group seems to have a great aversion to regular
> > >
>In article ,
> Chris Torek wrote:
>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries (one per possible ord() value).
In article
Roy Smith wrote:
>I'm not sure what you mean by "full 16-bit Unicode
On Fri, Jun 3, 2011 at 1:52 PM, Chris Angelico wrote:
> However, Unicode planes 0-2 have all
> the defined printable characters
PS. I'm fully aware that there are ranges defined in plane 14 / E.
They're non-printing characters, and unlikely to be part of a text
string, although it is possible. So
On Fri, Jun 3, 2011 at 1:44 PM, Roy Smith wrote:
> In article ,
> Chris Torek wrote:
>
>> Python might be penalized by its use of Unicode here, since a
>> Boyer-Moore table for a full 16-bit Unicode string would need
>> 65536 entries (one per possible ord() value).
>
> I'm not sure what you mean
In article ,
Chris Torek wrote:
> Python might be penalized by its use of Unicode here, since a
> Boyer-Moore table for a full 16-bit Unicode string would need
> 65536 entries (one per possible ord() value).
I'm not sure what you mean by "full 16-bit Unicode string"? Isn't
unicode inherently
>In article <94ph22frh...@mid.individual.net>
> Neil Cerutti wrote:
>> Python's str methods, when they're sufficient, are usually more
>> efficient.
In article
Roy Smith replied:
>I was all set to say, "prove it!" when I decided to try an experiment.
>Much to my surprise, for at least one com
On 03/06/2011 02:57, Roy Smith wrote:
In article<94ph22frh...@mid.individual.net>,
Neil Cerutti wrote:
On 2011-06-01, ru...@yahoo.com wrote:
For some odd reason (perhaps because they are used a lot in
Perl), this group seems to have a great aversion to regular
expressions. Too bad because
In article <94ph22frh...@mid.individual.net>,
Neil Cerutti wrote:
> On 2011-06-01, ru...@yahoo.com wrote:
> > For some odd reason (perhaps because they are used a lot in
> > Perl), this group seems to have a great aversion to regular
> > expressions. Too bad because this is a typical problem w
On 2011-06-01, ru...@yahoo.com wrote:
> For some odd reason (perhaps because they are used a lot in
> Perl), this group seems to have a great aversion to regular
> expressions. Too bad because this is a typical problem where
> their use is the best solution.
Python's str methods, when they're su
On 06/01/2011 09:39 PM, ru...@yahoo.com wrote:
On Jun 1, 11:11 am, Chris Rebert wrote:
On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ
On Jun 1, 11:11 am, Chris Rebert wrote:
> On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
> > Hi
> >
> > i have a file which contains data
> >
> > //ACCDJ EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
> > // UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
> > //ACCT
On Wed, Jun 1, 2011 at 12:31 AM, rakesh kumar
wrote:
>
> Hi
>
> i have a file which contains data
>
> //ACCDJ EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
> // UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
> //ACCT EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
>
Hi
i have a file which contains data
//ACCDJ EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCDJ '
//ACCT EXEC DB2UNLDC,DFLID=&DFLID,PARMLIB=&PARMLIB,
// UNLDSYST=&UNLDSYST,DATABAS=MBQV1D0A,TABLE='ACCT'
//