On Dec 15, 02:44, Tracy Reed <tr...@ultraviolet.org> wrote:
> I have code which looks basically like this:
>
> now        = datetime.today()
> beginning  = datetime.fromtimestamp(0)
> end        = now - timedelta(days=settings.DAYSTOKEEP)
>
> def purgedb():
>     """Delete archivedEmail objects from the beginning of time until
>     daystokeep days in the past."""
>     queryset   = archivedEmail.objects.all()
>     purgeset   = queryset.filter(received__range=(beginning, end))
>     for email in purgeset:
>         print email
>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s"     % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass
>     purgeset.delete()
>
> if __name__ == '__main__':
>     purgedb()
>
(snip)

> But when purgedb runs it deletes emails 100 at a time (which takes
> forever) and after running for a couple of hours uses a gig and a half
> of RAM. If I let it continue after a number of hours it runs the
> machine out of RAM/swap.

Looks like settings.DEBUG=True to me - with DEBUG on, Django records
every query it executes, and in a long-running script that list grows
without bound.

> Am I doing something which is not idiomatic or misusing the ORM
> somehow? My understanding is that it should be lazy so using
> objects.all() on queryset and then narrowing it down with a
> queryset.filter() to make a purgeset should be ok, right?

No problem here as long as you don't do anything that forces
evaluation of the queryset. But it's still redundant - you might as
well build the appropriate queryset immediately.
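To illustrate the laziness, here's a plain-Python analogy (not actual Django code - a generator pipeline, like a QuerySet, does no work until something consumes it):

```python
log = []

def all_rows():
    """Stand-in for a table scan; the side effect records evaluation."""
    for i in range(5):
        log.append(i)
        yield i

# Building the pipeline - like chaining .all().filter(...) - runs nothing.
pipeline = (i for i in all_rows() if i % 2 == 0)
assert log == []

# Consuming it is what forces evaluation.
result = list(pipeline)
assert result == [0, 2, 4]
```

The same applies to chained querysets: `objects.all().filter(...)` and `objects.filter(...)` describe the same query, so the extra `.all()` step buys you nothing.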

> What can I
> do to make this run in reasonable time/memory?

Others have already commented on checking whether you have
settings.DEBUG set to True - the usual suspect when it comes to RAM
issues with django's ORM.

wrt/ the other problem mentioned - building whole model instances for
each row - you can obviously save a lot of work here by using a
values_list queryset - tuples are very cheap.
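To get a rough sense of the difference, here's a plain-Python comparison (FakeModel is a hypothetical stand-in - a real model instance carries even more baggage than an instance plus its `__dict__`):

```python
import sys

class FakeModel:
    """Hypothetical stand-in for a model instance, for size comparison only."""
    def __init__(self, pk, cacheID, received):
        self.pk = pk
        self.cacheID = cacheID
        self.received = received

row_obj = FakeModel(1, "abc123", "2010-12-15")
row_tup = (1, "abc123", "2010-12-15")

# An instance pays for the object header plus its attribute dict;
# a tuple is a single flat allocation.
obj_cost = sys.getsizeof(row_obj) + sys.getsizeof(row_obj.__dict__)
tup_cost = sys.getsizeof(row_tup)
assert tup_cost < obj_cost
```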

Oh, and yes: I/O and filesystem operations aren't free either. This
doesn't solve your problem with the script eating all the RAM, but it
surely impacts overall performance.


Now for something different - here are a couple other python
optimisation tricks:

>     for email in purgeset:
>         print email

Remove this. I/O is not free. Really.

>         try:
>             os.unlink(settings.REAVER_CACHE+"texts/%s"     % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_good/%s" % email.cacheID)
>             os.unlink(settings.REAVER_CACHE+"prob_spam/%s" % email.cacheID)
>         except OSError:
>             pass

Move the redundant attribute lookups (os.unlink and
settings.REAVER_CACHE) and string concatenations out of the loop.
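The effect is easy to measure in plain Python (the settings class and cache path below are hypothetical stand-ins, just to make the snippet self-contained):

```python
import timeit

class settings:
    # hypothetical stand-in for django.conf.settings
    REAVER_CACHE = "/var/cache/reaver/"

ids = ["id%04d" % i for i in range(1000)]

def naive(ids):
    # attribute lookup and concatenation redone on every pass
    return [settings.REAVER_CACHE + "texts/%s" % i for i in ids]

def hoisted(ids):
    # lookup and concatenation done once, before the loop
    template = settings.REAVER_CACHE + "texts/%s"
    return [template % i for i in ids]

assert naive(ids) == hoisted(ids)  # same paths either way
print("naive  :", timeit.timeit(lambda: naive(ids), number=200))
print("hoisted:", timeit.timeit(lambda: hoisted(ids), number=200))
```

Per-iteration the saving is tiny, but it's free to take, and it adds up over hundreds of thousands of rows.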


def purgedb():
    """Delete archivedEmail objects from the beginning of time until
    daystokeep days in the past.
    """
    # hoist the attribute lookups and concatenations out of the loop
    text_cache = settings.REAVER_CACHE + "texts/%s"
    prob_good_cache = settings.REAVER_CACHE + "prob_good/%s"
    prob_spam_cache = settings.REAVER_CACHE + "prob_spam/%s"
    unlink = os.unlink

    # no reason to put this outside the function.
    now = datetime.today()
    beginning = datetime.fromtimestamp(0)
    end = now - timedelta(days=settings.DAYSTOKEEP)
    qs = archivedEmail.objects.filter(received__range=(beginning, end))

    # values_list with flat=True yields the bare cacheID values,
    # skipping model instantiation entirely
    for cacheID in qs.values_list('cacheID', flat=True):
        try:
            unlink(text_cache % cacheID)
            unlink(prob_good_cache % cacheID)
            unlink(prob_spam_cache % cacheID)
        except OSError:
            pass

    qs.delete()

Oh, and yes, one last point: how do you run this script, exactly?

HTH
