Re: Django Transactions Performance increase

2013-10-29 Thread Apostolos Bessas
Hi Robin,

As far as I can tell, using one transaction should increase the
performance of the process. The reason is that you issue just one
COMMIT for the whole process instead of one per UPDATE. As an added
benefit, it helps with data integrity.

There are two main ways I know of to improve the performance:
- Use executemany
(http://initd.org/psycopg/docs/cursor.html#cursor.executemany), which
still issues O(n) queries but does the query planning only once; there
is a sketch right after this list. One app that uses executemany is
django-bulk: https://github.com/KMahoney/django-bulk (see
https://github.com/transifex/django-bulk for some updated code).
- Or use COPY (for PostgreSQL), which uses three queries but has other
kinds of overhead. I would expect other RDBMSs to provide similar
functionality.
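
For instance, something along these lines (an untested sketch: the
table name myapp_item and the columns foo/baa are placeholders taken
from your example, and I am assuming the pre-1.6 transaction API you
are already using):

from django.db import connection, transaction

def bulk_update_items(updatedata):
    # updatedata: iterable of (id, foo, baa) tuples, as in your example.
    # One UPDATE per row is still sent, but everything happens inside a
    # single transaction, so only one COMMIT is issued.
    with transaction.commit_on_success():
        cursor = connection.cursor()
        cursor.executemany(
            "UPDATE myapp_item SET foo = %s, baa = %s WHERE id = %s",
            [(row[1], row[2], row[0]) for row in updatedata],
        )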

The COPY method works as follows: you create a new table (probably a
temporary one), COPY all the entries into it and then do the update in
one query. You can find an implementation of COPY for Django in
https://github.com/mpessas/django-pg-extensions/blob/master/djangopg/copy.py
(that's mine).
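
Roughly, the approach looks something like this (a simplified,
untested sketch, not the actual code from that repository; it assumes
psycopg2's copy_from(), Python 2, placeholder table/column names, and
values that contain no tabs or newlines):

from cStringIO import StringIO  # use io.StringIO on Python 3

from django.db import connection, transaction

def copy_update_items(updatedata):
    with transaction.commit_on_success():
        cursor = connection.cursor()
        # 1. A temporary table that disappears when the transaction commits.
        cursor.execute(
            "CREATE TEMP TABLE item_updates "
            "(id integer, foo text, baa text) ON COMMIT DROP"
        )
        # 2. COPY all the new values into it in one go.
        buf = StringIO()
        for row in updatedata:
            buf.write("%s\t%s\t%s\n" % (row[0], row[1], row[2]))
        buf.seek(0)
        # Depending on the Django version you may need the raw psycopg2
        # cursor (cursor.cursor) for copy_from.
        cursor.copy_from(buf, "item_updates", columns=("id", "foo", "baa"))
        # 3. Update the real table from the temporary one in a single query.
        cursor.execute(
            "UPDATE myapp_item AS i SET foo = u.foo, baa = u.baa "
            "FROM item_updates AS u WHERE i.id = u.id"
        )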

Which one is preferable depends on how many updates you have to make.

Hope this helps a bit,
Apostolis

On Tue, Oct 29, 2013 at 4:39 PM, Robin Fordham  wrote:
> Hi,
>
> I have been doing some reading and am looking to increase the performance of
> updating existing database entries via incoming datafeeds.
>
> I am finding conflicting opinions on whether wrapping updates in a transaction
> context manager helps improve performance. Some sources say it does; others
> say it simply provides data integrity across the queryset within the
> transaction, with no performance improvement; and others have said that the
> transaction management overhead actually degrades performance.
>
> For instance:
>
> with transaction.commit_on_success():
>     for row in updatedata:
>         i = item.objects.get(id=row[0])
>         i.foo = row[1]
>         i.baa = row[2]
>         i.save()
>
> for row in updatedata:
>     i = item.objects.get(id=row[0])
>     i.foo = row[1]
>     i.baa = row[2]
>     i.save()
>
> Some clarification on this matter would be greatly appreciated, as would any
> pointers to improve my updating efficiency (although I know I cannot do a
> filter and a .update() on the queryset, as each row's update data is
> distinct).
>
> Thanks.
>
> Regards,
>
> Robin.
>



Re: Complex query reduction

2013-11-04 Thread Apostolos Bessas
On Sat, Nov 2, 2013 at 4:50 PM, Daniele Procida  wrote:
>
> But, the real killer is the combination of ordering (in the queryset or on 
> the model, it doesn't matter) with the distinct() - as soon as one is removed 
> from the equation, the execution time drops to around 250ms.
>
> That's for 55000 BibliographicRecords created by that last operation (before 
> distinct() is applied; distinct() reduces them to 28000).


Do you happen to use PostgreSQL? This could be a case of a
"non-optimal" configuration that makes PostgreSQL use the disk to do
the sorting. Take a look at the work_mem setting:
http://www.postgresql.org/docs/current/static/runtime-config-resource.html#GUC-WORK-MEM.
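
If you want to quickly test whether work_mem is the culprit, you can
raise it for the current session only before running the slow query.
An untested sketch from a Django shell (64MB is an arbitrary value,
tune it to your data set and available RAM):

from django.db import connection

cursor = connection.cursor()
# Applies to this database session only; it does not change the
# server-wide configuration.
cursor.execute("SET work_mem = '64MB'")
# ... now evaluate the slow distinct()/ordered queryset on the same
# connection and compare the timings.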

Apostolis
