Sorry for the double-post, but I just searched the Django Developers group, and found this: "Using EXISTS instead of IN for subqueries" https://groups.google.com/forum/#!searchin/django-developers/WHERE$20EXISTS/django-developers/OG1unUV-MOU/VzkepyuaUDUJ and this: "Support server-side cursors for queryset iteration in database backends" https://code.djangoproject.com/ticket/16614
I don't think this answers my specific question, because mine is more about trying to explicitly invoke the WHERE EXISTS clause instead of automatically replacing IN with WHERE EXISTS, but I thought I should put them here for anyone interested. Thanks. John On Wednesday, February 3, 2016 at 10:03:31 AM UTC-6, John P. wrote: > > Greetings! > > I'm trying to refactor a query to avoid using QuerySet.extra (as per the > recommendation here: > https://docs.djangoproject.com/en/1.9/ref/models/querysets/#extra) > > Here's a simplified version of my code: > > # testapp.models > > class Parent(models.Model): > pass > > class Child(models.Model): > parent = models.ForeignKey('testapp.Parent') > > This is the query I am attempting to replace: > > Parent.objects.extra(where=[""" > NOT EXISTS (SELECT 1 FROM testapp_child WHERE testapp_child.parent_id > = testapp_parent.id) > """]) > > This is extremely similar to, but not the same as > > Parent.objects.exclude(id__in=Child.objects.all().values('parent_id')) > > Both queries should return the same rows, but the biggest difference is > that PostgreSQL's query planner produces completely different plans. The > performance difference can be huge, even for a relatively modest data set. > (I believe my data set has something like 270k "parent" instances and 80k > "child" instances.) > > I tried searching this group as well as the ticket system, and couldn't > find a solution to this exact problem (other than using the alternative > query). > > Thanks for taking the time to read through all of this! I am more than > happy to open a ticket, but I thought I should post here first. > > > More Technical Stuff > > Here's the output of EXPLAIN ANALYZE on my actual models/data. I had to > apply a LIMIT 100 because if I try to return the entire result, the second > one completely hangs my computer. I bolded some relevant bits > > Limit (cost=0.71..9.70 rows=100 width=111) (actual time=0.074..0.483 > rows=100 loops=1) > -> *Merge Anti Join* (cost=0.71..22435.69 rows=249613 width=111) > (actual time=0.072..0.465 rows=100 loops=1) > Merge Cond: (catalogue_product.id = > catalogue_productcategory.product_id)" > -> *Index Scan* using catalogue_product_pkey on > catalogue_product (cost=0.42..16986.70 rows=292339 width=111) (actual > time=0.020..0.285 rows=150 loops=1) > -> *Index Only Scan* using catalogue_productcategory_9bea82de on > catalogue_productcategory (cost=0.29..3861.27 rows=81671 width=4) (actual > time=0.037..0.070 rows=101 loops=1) > Heap Fetches: 0 > Total runtime: > > *0.568 ms*Here's the plan for the second query: > Limit (cost=0.00..234229.95 rows=100 width=111) (actual > time=31.087..1165.022 rows=100 loops=1) > -> *Seq Scan* on catalogue_product (cost=0.00..342373919.34 > rows=146170 width=111) (actual time=31.087..1164.992 rows=100 loops=1) > Filter: (NOT (SubPlan 1)) > Rows Removed by Filter: 5 > SubPlan 1 > -> *Materialize* (cost=0.00..2138.07 rows=81671 width=4) > (actual time=0.001..5.539 rows=80818 loops=105) > -> *Seq Scan* on catalogue_productcategory u0 > (cost=0.00..1409.71 rows=81671 width=4) (actual time=0.005..10.574 > rows=81671 loops=1) > Total runtime: > *1165.255 ms* > As you can see the former is about ~2000 times faster than the latter. > > -- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com. To post to this group, send email to django-users@googlegroups.com. Visit this group at https://groups.google.com/group/django-users. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/8301ed07-7c3a-451d-a6df-133317dce5be%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.