sure.  i'll make a patch soon...

thanks for the input!

cfh

On 10/20/12 13:29 , Massimo Di Pierro wrote:
I meant to skip count.

On Saturday, 20 October 2012 15:28:56 UTC-5, Massimo Di Pierro wrote:

How about adding a gae only parameter to the gae adapter_args that tells
it to skip fetch?

On Saturday, 20 October 2012 11:25:51 UTC-5, howesc wrote:

It appears that the most efficient way to delete on app engine is to:
  - build a query object, like we are doing now
  - call run with keys_only=True, which returns an iterator
    (https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_run)
  - pass that iterator to the datastore delete method
    (https://developers.google.com/appengine/docs/python/datastore/functions#delete)

this avoids the cost of loading the rows into memory, decreases the
likelihood of timeout, and has the cost of 1 datastore small operation per
row.  but it does prevent us from getting a count of rows deleted.
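
a minimal sketch of the keys-only approach described above. the function name delete_by_keys is made up for illustration and is not part of web2py. the query and delete function are passed in as arguments (assumed to behave like google.appengine.ext.db.Query and google.appengine.ext.db.delete) so the sketch does not need the App Engine SDK to be importable:

```python
def delete_by_keys(query, delete_fn):
    """Delete everything `query` matches without loading the entities.

    Assumptions: `query` behaves like google.appengine.ext.db.Query and
    `delete_fn` like google.appengine.ext.db.delete. `delete_by_keys` is a
    hypothetical name, not an existing web2py or GAE function.
    """
    # keys_only=True makes run() yield entity keys instead of full entities,
    # so no rows are loaded into memory.
    keys = query.run(keys_only=True)
    # The iterator is handed straight to the delete call, so no count of
    # deleted rows is available afterwards.
    delete_fn(keys)
```

on GAE the call would look roughly like delete_by_keys(some_query, db.delete). it returns nothing, which is exactly the trade-off in question.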

the way we do it now:
  - run count() on the query.  this has a cost (time and money) of
iterating over all the rows that match the query on GAE (1 datastore small
operation per row)
  - run fetch(limit=1000) and call delete() successively until no more
rows.  this has the cost of running a full query (at least 1 datastore read
operation per row) and loading the result set into memory and then deleting
the results.

in my case i'm timing out on the count() call so i don't even start the
delete.  from an efficiency standpoint i'd rather have more rows deleted
for less cost than get a count....but this may not be acceptable for all.
at a minimum i think we should switch to use keys_only=True for the fetch,
skip the leading count() call, and just sum the number of keys returned by
each fetch.  we may also consider catching the datastore timeout error and
trying to handle a partial delete more gracefully (or continue to let the
user catch the error).
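
a rough sketch of that minimum change, under the same assumptions as before: query behaves like google.appengine.ext.db.Query and delete_fn like google.appengine.ext.db.delete. the name delete_in_batches, the timeout_errors parameter and the (count, completed) return value are all hypothetical, just one way a partial delete could be reported:

```python
def delete_in_batches(query, delete_fn, batch_size=1000, timeout_errors=()):
    """Delete in keys-only batches and return (deleted_count, completed).

    Hypothetical helper, not web2py's actual implementation.
    `timeout_errors` is a tuple of exception classes to treat as a
    datastore timeout; which classes to pass is left to the caller.
    """
    deleted = 0
    try:
        while True:
            # Fetch only the keys of the next batch; no leading count() call.
            keys = query.fetch(limit=batch_size, keys_only=True)
            if not keys:
                return deleted, True
            delete_fn(keys)
            # The count is the running total of keys deleted so far.
            deleted += len(keys)
    except timeout_errors:
        # Partial delete: report how many rows were removed before the timeout.
        return deleted, False
```

with the default timeout_errors=() nothing is caught, so the error still reaches the user as it does now.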

what is the "right" approach for web2py?  if the approach with count is
correct, could i propose a gae bulk_delete method that does not return
count but uses my first method?

thanks for the input!

cfh

On Saturday, October 20, 2012 7:58:56 AM UTC-7, Massimo Di Pierro wrote:

Delete should return the number of deleted records. What is your
proposal?

On Wednesday, 17 October 2012 17:30:22 UTC-5, howesc wrote:

Hi all,

I'm trying to clean up old expired sessions.....but i waited a long
time to get to this and now my GAE delete is just timing out.  Reading the
GAE docs, there appear to be some improvements that we can make to the
query delete method on GAE that will make it faster and cheaper.  what we
lose then is the count of the number of rows deleted.

my question is, does having a db(db.table.something==True).delete()
that does not return a count break the web2py API contract, or break
anyone's applications?

thanks,

christian



