Unfortunately the problem is a bit deeper in the ORM. In general, when filtering against a multivalued relation (reverse foreign key or m2m relation) Django uses joins instead of subqueries. This can result in duplicating rows if multiple related rows match the filter. The fix recommended in the docs is to apply .distinct() to the query. Using distinct can result in slow query execution speed, and many DBAs see this kind of usage of distinct as a surefire sign that the query is written poorly (and I agree here).
We really should be using IN/EXISTS queries for filters spanning multivalued relations, but for historical reasons we don't. We do that for exclude and negated filters already. There is also a technical problem which needs to be solved before changing how __isnull=False works. If we want to make __isnull=False to do EXISTS/IN query, then the additional icontains filter in the query qs.filter(somefield__isnull=False, somefield__othercol__icontains='a') must be pushed to the same IN/EXISTS clause. This isn't easy to do correctly (and we don't do it correctly for exclude queries currently). In addition, there are cases where subqueries aren't wanted, older versions of MySQL for example choke on subqueries. So, there are a couple of complications for this idea: - Support for pushing multiple filter conditions to the same subquery is needed even if we consider support for only isnull=False filter. - We should consider using IN/EXISTS + subquery for multivalued relations in filters. This is challenging both technically and from backwards compatibility perspective. - Anssi On Fri, 2014-08-08 at 16:48 +1000, Curtis Maloney wrote: > Can you create a PR, with docs and tests? > > > If not, we'll need to find someone brave enough to delve into the ORM > and add this [quite valid, IMHO] optimisation. > > > -- > C > > > > > On 8 August 2014 11:16, David Butler <[email protected]> wrote: > > > On Thursday, August 7, 2014 6:48:22 PM UTC-5, Tim Graham > wrote: > Does .filter(somefield__isnull=False) not work for > what you're trying to do? > > > > If I do .filter(somefield__isnull=False) it tries to do a LEFT > OUTER JOIN on the table for somefield instead of a WHERE > EXISTS (...) and if that table is very large then it is pretty > slow > > > Perhaps the answer is just to modify what __isnull does > instead of making a new lookup > > > > On Thursday, August 7, 2014 7:43:07 PM UTC-4, David > Butler wrote: > It would be nice if there was some database > level optimization for getting objects that > have related objects that exist. > > > This is a slow filter: > > > qs = [obj for obj in qs if > qs.somefield_set.exists()] > > > Could be faster with something like this: > > > qs = qs.filter(somefield__exists=True) > > > Here is some (rough, probably grossly over > simplified but working) code: > > > Code is also available here if it gets > garbled: http://dpaste.com/0825PNP > > > > > query_template = ( > '{not_part} EXISTS (SELECT 1 FROM > "{table1}" WHERE ' > '"{table1}"."{table1_column}" = > "{table2}"."{table2_column}")' > ) > def > filter_by_reverse_feriegn_key_existance(qs, > **kw): > > > for arg, value in kw.items(): > > > assert > arg.endswith('__exists') > > > if value is True: > not_part = '' > elif value is False: > not_part = 'NOT' > > > Model = qs.model > for field in > arg.rstrip('__exists').split('__'): > field = > Model._meta.get_field_by_name(field)[0] > > qs = > qs.extra(where=[query_template.format( > > table1=field.model._meta.db_table, > > table1_column=field.field.db_column, > > table2=Model._meta.db_table, > > table2_column=Model._meta.pk.db_column, > not_part = > not_part > )]) > > > Model = field.model > return qs > > > > > This works on postgres, and is ~100x faster > > > I would be interested in any comments > -- > You received this message because you are subscribed to the > Google Groups "Django developers" group. > To unsubscribe from this group and stop receiving emails from > it, send an email to django-developers > [email protected]. > To post to this group, send email to > [email protected]. > Visit this group at > http://groups.google.com/group/django-developers. > > To view this discussion on the web visit > > https://groups.google.com/d/msgid/django-developers/d4f2d1bc-326d-43d0-a53c-d2b1c25465cd%40googlegroups.com. > > For more options, visit https://groups.google.com/d/optout. > > > -- You received this message because you are subscribed to the Google Groups "Django developers" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/1407482729.11410.105.camel%40TTY32. For more options, visit https://groups.google.com/d/optout.
