I posted a ticket here: https://code.djangoproject.com/ticket/26194 which was marked as a duplicate of https://code.djangoproject.com/ticket/25789#comment:2
It was suggested that I try Parent.objects.filter(child=None).distinct() (but that it might perform worse than the WHERE NOT EXISTS query) Here's the plan generated by that query: HashAggregate (cost=216.11..216.54 rows=43 width=103) (actual time=1.553..1.562 rows=25 loops=1) -> *Nested Loop Anti Join* (cost=0.42..215.15 rows=43 width=103) (actual time=0.015..1.511 rows=25 loops=1) -> *Seq Scan* on catalogue_category (cost=0.00..104.45 rows=145 width=103) (actual time=0.006..0.154 rows=146 loops=1) -> *Index Only* Scan using catalogue_productcategory_b583a629 on catalogue_productcategory (cost=0.42..37.27 rows=801 width=4) (actual time=0.009..0.009 rows=1 loops=146) Index Cond: (category_id = catalogue_category.id) Heap Fetches: 87 Total runtime: *1.619 ms* This is only 3 times as slow as the carefully optimized query, which might be an acceptable performance trade-off in some cases. On Wednesday, February 3, 2016 at 10:03:31 AM UTC-6, John P. wrote: > > Greetings! > > I'm trying to refactor a query to avoid using QuerySet.extra (as per the > recommendation here: > https://docs.djangoproject.com/en/1.9/ref/models/querysets/#extra) > > Here's a simplified version of my code: > > # testapp.models > > class Parent(models.Model): > pass > > class Child(models.Model): > parent = models.ForeignKey('testapp.Parent') > > This is the query I am attempting to replace: > > Parent.objects.extra(where=[""" > NOT EXISTS (SELECT 1 FROM testapp_child WHERE testapp_child.parent_id > = testapp_parent.id) > """]) > > This is extremely similar to, but not the same as > > Parent.objects.exclude(id__in=Child.objects.all().values('parent_id')) > > Both queries should return the same rows, but the biggest difference is > that PostgreSQL's query planner produces completely different plans. The > performance difference can be huge, even for a relatively modest data set. > (I believe my data set has something like 270k "parent" instances and 80k > "child" instances.) > > I tried searching this group as well as the ticket system, and couldn't > find a solution to this exact problem (other than using the alternative > query). > > Thanks for taking the time to read through all of this! I am more than > happy to open a ticket, but I thought I should post here first. > > > More Technical Stuff > > Here's the output of EXPLAIN ANALYZE on my actual models/data. I had to > apply a LIMIT 100 because if I try to return the entire result, the second > one completely hangs my computer. I bolded some relevant bits > > Limit (cost=0.71..9.70 rows=100 width=111) (actual time=0.074..0.483 > rows=100 loops=1) > -> *Merge Anti Join* (cost=0.71..22435.69 rows=249613 width=111) > (actual time=0.072..0.465 rows=100 loops=1) > Merge Cond: (catalogue_product.id = > catalogue_productcategory.product_id)" > -> *Index Scan* using catalogue_product_pkey on > catalogue_product (cost=0.42..16986.70 rows=292339 width=111) (actual > time=0.020..0.285 rows=150 loops=1) > -> *Index Only Scan* using catalogue_productcategory_9bea82de on > catalogue_productcategory (cost=0.29..3861.27 rows=81671 width=4) (actual > time=0.037..0.070 rows=101 loops=1) > Heap Fetches: 0 > Total runtime: > > *0.568 ms*Here's the plan for the second query: > Limit (cost=0.00..234229.95 rows=100 width=111) (actual > time=31.087..1165.022 rows=100 loops=1) > -> *Seq Scan* on catalogue_product (cost=0.00..342373919.34 > rows=146170 width=111) (actual time=31.087..1164.992 rows=100 loops=1) > Filter: (NOT (SubPlan 1)) > Rows Removed by Filter: 5 > SubPlan 1 > -> *Materialize* (cost=0.00..2138.07 rows=81671 width=4) > (actual time=0.001..5.539 rows=80818 loops=105) > -> *Seq Scan* on catalogue_productcategory u0 > (cost=0.00..1409.71 rows=81671 width=4) (actual time=0.005..10.574 > rows=81671 loops=1) > Total runtime: > *1165.255 ms* > As you can see the former is about ~2000 times faster than the latter. > > -- You received this message because you are subscribed to the Google Groups "Django users" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com. To post to this group, send email to django-users@googlegroups.com. Visit this group at https://groups.google.com/group/django-users. To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/ec4ca1dd-0136-42c4-a949-9db1145b47e4%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.