from:"Robin St . Clair"

RE: Complex query reduction

2013-11-03 Thread Robin St . Clair

Hi
You are running into typical production Django problems - works fine in 
testing, terrible in production.
Firstly
If you use attributed many-to-manys, create the table outside of Django and use 
raw SQL.Create a primary key on the two things being related (usually larger 
table ﬁrst).
Multi-column indicesTwo choicesA single index that includes both (do this 
outside Django)Two indices, one for each columnThe size will be roughly 
equivalent between the two
Single composite indexsmaller than two indexes.faster to update.accelerates a 
more restricted range of queries.An index on (A, B) does nothing for queries 
just using BBut on the queries it does accelerate, it will be faster.
Two indiceslargerslower to updatewill help on queries on both columns, but not 
as much as a single composite indexwill help accelerate queries on either of 
the columnscan be expressed directly in Django
Helpful Toolpg_stat_activity shows a wealth of useful information - including
which indexes are actually being used
If an index is not being used (or not being used very often) drop itif you are 
surprised that it is not being used, ﬁnd out why
Basic Performance Rules1. Do not iterate over QuerySets2. If you think you have 
to iterate over a QuerySet, see rule #13. If you are absolutely, certainly, 
100% positive that the only possible solution to your problem is iterating over 
a QuerySet, see rule #3
Iteration - A Health Warning• Ignores Django’s lazy-evaluation mechanism and 
copies everything into local memory• Copies data from the database just to 
process it locally• Does ﬁltration or summation in the application that is 
processed more efﬁciently in the database.• Databases are good at this sort of 
stuff let the DB do it
Alternatives to Iteration• QuerySet.update()• cursor.execute(“UPDATE 
reader_hours ...”)• Stored procedures
Take in the code below
How much many objects are in memory at point A?
qs = Orders.objects.all() # There are about 2,500,000 rows in “orders”
for order in qs:order.age_in_days += 1 # POINT A order.save()
Answer  - 2,500,000
Why Is This?
Django does lazy evaluation… (everyone tells me so!)The Django code carefully 
asks for a slice of 100 objectswhich trickles down through lots of really 
convoluted Python to psycopg2which dutifully asks for 100 rows from 
Postgreswhich sends all 2,500,000 over the wire
Solution - Named CursorsThe protocol between the Postgres client and server 
only does partial sends when using named cursorspsycopg2 fully supports named 
cursorsDjango doesn’t use named cursorsTherefore, the ﬁrst time you ask for any 
object in a QuerySet, you get all of themThis is a very good reason not to ask 
for large result sets
INif using Django avoid the IN operation at all costs
If there are potentially more than 15 items in the list, rework the IN as a 
JOIN against whatever the source of the keys is

At some stage you are going to have to get to grips with SQL, might as well be 
sooner than later
R+C
PS some time ago one of my clients was only able to process 4 transactons a 
second against their very complicated DB. They needed to process 70+ 
transactions per second. After 3 days of code analysis we had the transaction 
rate up to 240 per second and so could get rid of the 6 week backlog that had 
developed. Front end coders didn't consider the DBMS to be importantDate: Sun, 
3 Nov 2013 02:37:09 -0800
From: akaar...@gmail.com
To: django-users@googlegroups.com
Subject: Re: Complex query reduction

You should rewrite the query into a form that doesn't require distinct. In 
general, when you see a query that has joins and DISTINCT, that should be an 
alarm bell that something isn't written correctly in the query. Unfortunately 
Django's ORM generates such queries, and that isn't easy to fix as there are 
databases that like DISTINCT more than rewriting the query to use subqueries.

In any case, you should check how to write the query without the need of 
distinct. Something like this should work:

self.items = BibliographicRecord.objects.listable_objects().filter(

authored__researcher__in=researchers,

)

self.items = BibliographicRecord.objects.listable_objects().filter(

pk__in=self.items.values_list('pk')

)

But maybe you can push the __in to deeper into the authored__researches 
lookup...

 - Anssi

On Saturday, November 2, 2013 4:50:42 PM UTC+2, Daniele Procida wrote:On Fri, 
Nov 1, 2013, Javier Guerra Giraldez  wrote:

>have you tried eliminating the second IN relationship?  something like

>

>entities = entity.get_descendants()

>

>items = BibliographicRecord.objects.filter

>(authored__researcher__person__member_of__entity__in=entities).distinct()

Indeed I have, but in that form it takes around 1770ms, compared to around 
1540ms in the original form. What I actually do is:

# breaking apart the queries allows the use of values_lists

entities = self.entity.get_descendants(

include_self=True

).values_list('id', f

Re: Complex query reduction

2013-11-07 Thread Robin St . Clair

Anssi

The last time I checked the use of IN, all the records from the database
in the query were brought back to the workstation, rather than being
processed on the backend and only the results returned to the workstation.

Have there been changes that carry out the entire query on the backend?
What has changed to cause you to prefer the use of the IN statement?

R+C
On 08/11/2013 05:55, akaariai wrote:

On Sunday, November 3, 2013 1:48:07 PM UTC+2, Robin St.Clair wrote:

*IN*

* if using Django avoid the IN operation at all costs

If there are potentially more than 15 items in the list, rework
the IN as a JOIN against whatever the source of the keys is

I don't necessarily agree with everything else said in the post, but
this one is just plain wrong. It is completely OK to use
__in=queryset. In fact, it is recommended in cases where the alternate
is using join + distinct.

- Anssi
--
You received this message because you are subscribed to the Google
Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to django-users+unsubscr...@googlegroups.com.

To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/6b5f9e3a-2e41-47b5-b0d6-94473fe323b8%40googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "Django
users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-users/BLU0-SMTP357C2324CC6BEC3F975EB4AE2F20%40phx.gbl.
For more options, visit https://groups.google.com/groups/opt_out.

RE: Complex query reduction

Re: Complex query reduction

2 matches

Site Navigation

Mail list logo

Footer information