Re: how not to run out of memory in cursor.execute

2006-06-05 Thread amberite

[EMAIL PROTECTED] wrote:
> I am using cx_Oracle and MySQLdb to pull a lot of data from some tables
> and I find that the cursor.execute method uses a lot of memory that
> never gets garbage collected. Using fetchmany instead of fetchall does
> not seem to make any difference, since it's the execute that uses
> memory. Breaking the query down to build lots of small tables doesn't
> help, since execute doesn't give its memory back, after reading enough
> small tables execute returns a memory error. What is the trick to get
> memory back from execute in cx_Oracle and MySQLdb?

cx_Oracle and MySQLdb must be handled differently, because MySQL does
not actually have cursors (MySQLdb fakes them for you).

To handle large resultsets efficiently in cx_Oracle simply use the
cursor iteration idiom:

for row in cursor:
    # do stuff with the row

cx_Oracle takes care of the fetching for you, and your memory usage
should remain fairly constant when using this idiom.
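For completeness, here is a fuller sketch of that idiom (the connect
string is a placeholder, and the table/column names are just the ones
used later in this post):

import cx_Oracle

connection = cx_Oracle.connect('scott/tiger@orcl')
cursor = connection.cursor()
cursor.arraysize = 500  # rows fetched per round trip; tune as needed
cursor.execute('SELECT foo FROM Bar')

for row in cursor:
    pass  # do stuff with the row; only arraysize rows are buffered at a time

cursor.close()
connection.close()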

To handle large resultsets in MySQLdb, you have to resort to multiple
queries:

limit = 1000
offset = 0

while True:
    # MySQLdb uses %s placeholders for all parameter types
    cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s',
                   (limit, offset))
    rows = cursor.fetchall()
    if not rows:
        break
    # process the fetched rows
    offset += limit

cursor.close()

As you can see, the MySQLdb version is more involved, due to the lack
of real cursor support in the MySQL database.  Any database with good
cursor support will likely have good cursor iteration support in the
corresponding DBAPI driver.
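If you want the MySQLdb code to read like the cx_Oracle loop, the
paging can be wrapped in a generator (just a sketch, reusing the same
placeholder query as above):

def paged_rows(cursor, page_size=1000):
    # Issue successive LIMIT/OFFSET queries and yield one row at a time,
    # so the caller never holds more than one page in memory.
    offset = 0
    while True:
        cursor.execute('SELECT foo FROM Bar LIMIT %s OFFSET %s',
                       (page_size, offset))
        rows = cursor.fetchall()
        if not rows:
            break
        for row in rows:
            yield row
        offset += page_size

for row in paged_rows(cursor):
    pass  # do stuff with the row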

Hope this helps,

L. Daniel Burr



Re: how not to run out of memory in cursor.execute

2006-06-06 Thread amberite

Steve Holden wrote:
>
> The MySQLdb solution you give is way more complicated than it needs to
> be, thereby skewing your opinion towards cx_Oracle unnecessarily.
>

I respectfully disagree with your assertion here.  The code I presented
for MySQLdb is what you *have* to do to avoid using up too much
memory.  This thread is about how to avoid running out of memory when
handling large resultsets, and the approach I outlined is really the
only way to do that with MySQLdb's DBAPI support.

> Look up the .fetchmany() method of cursors in the DB API. There is only
> any need to execute a single query no matter how large the result set:
> you simply need to keep calling .fetchmany(N) (where N is whatever
> you've decided by testing is your optimum chunk size) until it returns
> less than N rows, at which point you have exhausted the query.
>

.fetchmany() buys you nothing in the case of large resultsets.  The
memory is consumed by execute() itself: MySQLdb's default cursor pulls
the entire result set into client memory up front, and fetchmany() only
hands back slices of rows that are already there.  This isn't the fault
of MySQLdb; it is just that MySQL doesn't support cursors, so MySQLdb
has to fake it.
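To illustrate: even a chunked loop like the one below (placeholder
query again) does not lower peak memory, because by the time execute()
returns, the default MySQLdb cursor has already fetched every row into
the client.

cursor.execute('SELECT foo FROM Bar')

while True:
    rows = cursor.fetchmany(1000)
    if not rows:
        break
    for row in rows:
        pass  # process row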

> It's very little more effort to wrap this all up as a generator that
> effectively allows you to use the same solution as you quote for cx_Oracle.
>

Again, I respectfully disagree.  Your proposed solution, while
resulting in shorter code, will continue to eat memory until the entire
resultset has been delivered.

L. Daniel Burr
