On Sat, 2003-08-30 at 02:21, Jacob Anawalt wrote:
> Ron Johnson wrote:
>
> >But, of course, that's not an issue in The Clearly Superior Language,
> >is it?
> >
>
> Ok, if this thread has accomplished little else, it seems to have gotten
> a couple people, including myself to play around with Python.
>
> I have a simple little perl program at work. It parses a mailbox format
> file and extracts email addresses from it. The mailbox file is a bunch
> of bounced email notifications. The program uses a couple regular
> expressions to extract the data and a hash to hold a unique list of the
> emails. It then connects to a database to see if those people have
> already been flagged as having bad emails.
>
> I made a copy of this script, removed the database stuff, and stripped
> it down to the basic process of read the file, find emails excluding
> some admin type addresses and then print out the unique bounced
> addresses. Then I wrote a Python version. It didn't take very long to
> code, about the same time to write the perl code if I take out the time
> reading howto's and the language reference.
[snip]
> real 0m0.056s
> user 0m0.051s
> sys 0m0.004s
[snip]
> real 0m0.839s
> user 0m0.818s
> sys 0m0.008s
[snip]
> What did I do wrong to make the python code take over ten times as long?

Sure, use perl's #1 optimized, built-in feature for your test case ;-)

I think that in this case it is probably safe to say "who cares?" Do you really care about .06 seconds vs. .8 seconds? Is your real data large enough for the difference to matter? (If it is, btw, you might want to use a while loop and 'line = fi.readline()' instead of putting the whole file in memory.)

No one has said (that I know of) that python is the fastest language ever. The only thing I have heard is that it is "fast enough" and that the benefits outweigh the additional (.8 seconds of) run time. In some cases they might not: you might not like it, or you might have code that already works just fine. Right tool for the right job and all that.

In terms of a very simple optimization, though, try adding this just before the "email_mo = email_pattern.search(line)" line:

    if not '@' in line: continue

This keeps the regex from being run on lines where there is no chance of it matching. It also cut the runtime for my test case (a 1.3 meg mailbox) from .380 to .117 seconds.

Or, if you want to really cheat, replace your file open and main for loop with this:

"""
import os  # needed for os.popen

# let egrep pre-filter the mailbox, then run the regex over its output
lines = os.popen(r"egrep '[EMAIL PROTECTED]' bademail.mai").read()
matches = email_pattern.findall(lines)
for match in matches:
    if not ignore_pattern.search(match):
        badHash[match.lower()] = 'bad'
"""

Yes, it _is_ evil, but it works ;-)

real 0m0.069s
user 0m0.060s
sys 0m0.010s

-Mark
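
P.S. For anyone who wants to see the two non-evil suggestions together (the '@' check before the regex, and reading one line at a time instead of slurping the file), here is a rough sketch. The names email_pattern, ignore_pattern and badHash come from the script under discussion, but the actual regexes were snipped from the thread, so the two patterns below are made-up stand-ins, and find_bad_addresses is a hypothetical helper name. Treat it as an illustration, not as the original program.

```python
import re

# Hypothetical stand-ins: the real patterns were snipped from the thread.
email_pattern = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ignore_pattern = re.compile(r"^(?:postmaster|mailer-daemon)@", re.IGNORECASE)


def find_bad_addresses(lines):
    """Collect the unique, lowercased addresses found in an iterable of lines."""
    badHash = {}
    for line in lines:
        # Cheap substring test first: skip the regex when it cannot match.
        if '@' not in line:
            continue
        email_mo = email_pattern.search(line)
        if email_mo is None:
            continue
        address = email_mo.group(0)
        # Drop admin-type addresses, keep everything else exactly once.
        if not ignore_pattern.search(address):
            badHash[address.lower()] = 'bad'
    return badHash


# Tiny in-memory sample standing in for a mailbox file.
sample = [
    "From: MAILER-DAEMON@mail.example.net\n",
    "Subject: Undelivered mail\n",
    "Final-Recipient: rfc822; Bob@Example.com\n",
    "<alice@example.org>: host not found\n",
    "<bob@example.com>: user unknown\n",
]

for address in sorted(find_bad_addresses(sample)):
    print(address)
```

Since the function only loops over whatever it is given, you can pass it an open file object in place of the sample list. Looping over a file object hands back one line at a time, which gets you the same memory behaviour as the readline() loop without writing the while loop yourself.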