2009/12/15 Tracy Reed <tr...@ultraviolet.org>:
>
> I have code which looks basically like this:
>
> now        = datetime.today()
> beginning  = datetime.fromtimestamp(0)
> end        = now - timedelta(days=settings.DAYSTOKEEP)
>
> def purgedb():
>    """Delete archivedEmail objects from the beginning of time until
>    daystokeep days in the past."""
>    queryset   = archivedEmail.objects.all()
>    purgeset   = queryset.filter(received__range=(beginning, end))

You don't need both queries (although they are lazy). You could just say:

purgeset = archivedEmail.objects.filter(received__range=(beginning, end))


>    for email in purgeset:
>        print email
>        try:
>            os.unlink(settings.REAVER_CACHE+"texts/%s"     % email.cacheID)
>            os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>            os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>        except OSError:
>            pass
>    purgeset.delete()
>
> if __name__ == '__main__':
>    purgedb()
>
> The idea is that we are stuffing a bunch of emails in a database for
> customer service purposes. I want to clear out anything older than
> DAYSTOKEEP. The model looks like this:
>
> class archivedEmail(models.Model):
>    subject     = models.CharField(blank=True, max_length=512, null=True)
>    toAddress   = models.CharField(blank=True, max_length=128, db_index=True)
>    fromAddress = models.CharField(blank=True, max_length=128, db_index=True)
>    date        = models.DateTimeField()
>    received    = models.DateTimeField(db_index=True)
>    crmScore    = models.FloatField()
>    spamStatus  = models.CharField(max_length=6, choices=spamStatusChoices, 
> db_index=True)
>    cacheHost   = models.CharField(max_length=24)
>    cacheID     = models.CharField(max_length=31, primary_key=True)
>
>    class Meta:
>        ordering = ('-received',)
>
> But when purgedb runs it deletes emails 100 at a time (which takes
> forever) and after running for a couple of hours uses a gig and a half
> of RAM. If I let it continue after a number of hours it runs the
> machine out of RAM/swap.
>
> Am I doing something which is not idiomatic or misusing the ORM
> somehow? My understanding is that it should be lazy so using
> objects.all() on queryset and then narrowing it down with a
> queryset.filter() to make a purgeset should be ok, right? What can I
> do to make this run in reasonable time/memory?
>
> PS: I used to have ordering set to -date in the class Meta but that
> caused the db to always put an ORDER BY date on the select query which
> was unnecessary in this case causing it to take ages sorting a couple
> million rows since there is no index on date (nor did there need to
> be, so I thought, since we never select on it). Changing it to
> received makes no difference to my app but avoids creating another
> index. Django's is the first ORM I have ever used and these sneaky
> performance issues are making me wonder...
>
> --
> Tracy Reed
> http://tracyreed.org
>


When you execute a queryset that operates on millions of rows, you
should remember that the ORM will create a Python object for each row.
This could take a while. I am not sure how it handles memory, though.

You could issue a plain SQL query that returns the results as tuples
(returning only the cache ID, for example). That might be faster if you
don't really need the ORM. Then loop through the tuples and delete the
files on disk. I think this should use less memory and processing time.

Also, I am no SQL guru, but I guess BETWEEN should work on most DB
servers, so you should not have portability issues (but I can't
guarantee this).
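
Here is a rough sketch of the plain-SQL idea, not a drop-in replacement.
To keep it runnable on its own it uses the standard-library sqlite3 module
and an in-memory table; the table name archived_email, the DAYS_TO_KEEP
constant and the helper expired_cache_ids are all made up for
illustration. In a Django project you would get the cursor from
django.db.connection instead, and the placeholder style would be %s
rather than ?.

```python
import sqlite3
from datetime import datetime, timedelta

DAYS_TO_KEEP = 30  # hypothetical stand-in for settings.DAYSTOKEEP


def expired_cache_ids(conn, beginning, end):
    """Return only the cacheID column for rows received in [beginning, end]."""
    cur = conn.cursor()
    # Table name is an assumption; timestamps are stored as ISO strings here.
    cur.execute(
        "SELECT cacheID FROM archived_email WHERE received BETWEEN ? AND ?",
        (beginning.isoformat(" "), end.isoformat(" ")),
    )
    return [row[0] for row in cur.fetchall()]


# Small in-memory demo with three rows, two of them older than the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archived_email (cacheID TEXT PRIMARY KEY, received TEXT)"
)
now = datetime(2009, 12, 15, 12, 0, 0)
sample = [
    ("old-1", now - timedelta(days=90)),
    ("old-2", now - timedelta(days=45)),
    ("new-1", now - timedelta(days=1)),
]
conn.executemany(
    "INSERT INTO archived_email VALUES (?, ?)",
    [(cache_id, ts.isoformat(" ")) for cache_id, ts in sample],
)

beginning = datetime(1970, 1, 1)
end = now - timedelta(days=DAYS_TO_KEEP)

ids = expired_cache_ids(conn, beginning, end)

# After the files for these ids are removed from disk, one statement
# deletes the rows without building any model objects.
conn.execute(
    "DELETE FROM archived_email WHERE received BETWEEN ? AND ?",
    (beginning.isoformat(" "), end.isoformat(" ")),
)
remaining = conn.execute("SELECT COUNT(*) FROM archived_email").fetchone()[0]
```

The point is that only short strings come back from the database, and the
delete itself is a single statement.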

Don't use try ... except to check whether the file exists. Instead, use
an "if" statement with os.path.exists or os.path.isfile, and delete the
file only if it exists. Exceptions are expensive in CPython.

Read this topic about deleting objects in bulk (though I think you
already have, it never hurts to refresh your memory).

http://docs.djangoproject.com/en/dev/topics/db/queries/#deleting-objects

Just my 2 cents

Davor
