On Thu, 24 Jul 2008 17:19:41 -0400, Wei Hao <[EMAIL PROTECTED]> wrote:
Hi:
I'm pretty new to Python and I have some optimization issues. I'll show you
the piece of code that's causing them, with pseudo-code and comments in
place of the surrounding details. I'm accessing a gigantic table (around 15
million rows) in SQL.
d is some dictionary; r is a precompiled regex object.
Big loop: I search through the table in chunks given by delta.
SQL query ("select * from table where rowID >= n and rowID < (n +
delta)"); the result of the query is stored in a. Each individual row is
a[n1], and the columns of a row are a[n1][n2].
[snip]
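In outline, the surrounding loop is roughly this (a simplified sketch with
made-up names; sqlite3 stands in for whatever database module I'm actually
using, and the snipped snippet goes where the comment marks):

    import re
    import sqlite3
    import time

    conn = sqlite3.connect("mydb.db")    # made-up database name
    cur = conn.cursor()
    d = {}                               # some dictionary, built elsewhere
    r = re.compile("pattern")            # precompiled regex (placeholder)

    delta = 2500
    n = 0
    while n < 15000000:                  # ~15 million rows
        start = time.time()
        cur.execute("select * from mytable where rowID >= ? and rowID < ?",
                    (n, n + delta))
        a = cur.fetchall()               # a[n1] is a row; a[n1][n2] a column
        # ... the snipped snippet processes a here, using d and r ...
        n += delta
        print("%d %f" % (n, time.time() - start))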
I am 100% sure it's this code snippet that's causing my problems.
Here's what I can tell you: each chunk of rows that I grab is essentially
equal in size (rowID skips over values, but fairly arbitrarily), and the
time it takes to execute the SQL query doesn't change. But as the program
progresses, this snippet gets slower. Here's the output (n, then the time
in seconds spent on that chunk):
2500 0.441551299341
5000 1.26162739664
7500 2.35092688403
10000 3.48417469666
12500 4.59031305491
15000 5.78972588775
17500 6.28305527139
20000 6.73344570903
22500 8.31732146487
25000 9.65322872159
27500 8.98186042757
30000 11.8042818095
32500 12.1965593712
35000 13.2735763291
37500 14.0282617344
What is it in this code snippet that slows down as n increases? Is there
something about the way low-level Python functions work that I don't
understand and that is slowing me down?
Perhaps you need an index on rowID.
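Without one, a range condition like "rowID >= n and rowID < (n + delta)"
can force the database to scan rows to find the matches; with an index on
rowID, the range lookup is cheap. A minimal sketch, assuming sqlite3, with
mydb.db and mytable standing in for your real names:

    import sqlite3

    conn = sqlite3.connect("mydb.db")    # hypothetical database file
    cur = conn.cursor()
    # One-time setup: index the column used in the range condition.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_rowid ON mytable (rowID)")
    conn.commit()

You can check whether the index is actually used with EXPLAIN (EXPLAIN
QUERY PLAN in sqlite).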
Jean-Paul
--
http://mail.python.org/mailman/listinfo/python-list